It's spoken in Hong Kong and its surrounding areas. Please add it as a separate language. Although Mandarin and Cantonese are both Chinese, they sound so different that speakers of one cannot understand the other. Also, Mandarin uses simplified written Chinese and Cantonese uses traditional written Chinese, though the two are usually mutually intelligible.

I agree completely. It's getting really annoying to see requests for Chinese-->English translations and then find, on further research, that the lyrics are in Cantonese.

Hi there. Thanks for the topic. Mandarin and Cantonese had standalone entries in the past, but they were merged back into a single entry for Chinese because there was no way we could check all the lyrics and translations already marked as Chinese in order to edit them and mark them with the proper dialect.

I agree, but how would users who don't know what's in Mandarin and what's in Cantonese pick the right one when they submit songs or make requests?

We currently have 2,309:

Of this number, how many are in Cantonese and how many in Mandarin? It would be awesome if we could separate them, but unless someone (or several someones) is willing to help, I don't think the admins would really take this into account :(

Or is there a way you could teach us how to differentiate between them? How many of them do you believe might be on this site? I'm really interested in helping this get through! I hope I don't sound sarcastic when I say these things.

Sarcasm is not really your style, Helen. Leave that to me ;)

Now seriously, automatic language identification is technically feasible even with modest resources.

Extremely simple neural networks can do that (it's way less complicated than recognizing kittens), as can various classification / statistical approaches. It's one of the minor blessings of this "deep learning" every man and his dog seems so excited about these days.

You can even pray to our benevolent Google God to do that kind of job from within an automated piece of software.

The results are not guaranteed (it can fail on very short texts or very similar languages like Brazilian/Portuguese), but apparently Chinese and Cantonese are not listed as "dangerously close".

That could reduce the tedious browsing through a couple thousand lyrics to a handful of undecidable cases to be checked by hand.

I have never tested any of these tools, but I have no doubt about the technology.
I just think it might be worth a look from the powers that be.
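
To make the "statistical approach" idea concrete, here is a rough sketch in Python. It is not a neural network and not any existing tool: it only counts characters that tend to show up in written Cantonese versus written Mandarin and refuses to decide when the evidence is thin. The two marker sets, the function name `guess_variety` and the `min_hits` threshold are all assumptions made up for illustration, and nothing here has been tried on real lyrics from the site. One known weak spot: many Cantonese songs are written in standard written Chinese, so a text-only check like this would likely drop them into the "undecided" pile for checking by hand.

```python
# Heuristic sketch: guess whether Chinese lyrics are written in Cantonese or
# Mandarin by counting characters typical of each. The marker sets below are
# illustrative assumptions, not a vetted linguistic resource.

# Characters common in colloquial written Cantonese (assumed list).
CANTONESE_MARKERS = set("嘅咗喺冇嘢哋咁唔佢啲嚟乜睇瞓")
# Function words common in written Mandarin, traditional and simplified
# forms mixed (assumed list).
MANDARIN_MARKERS = set("的了是不他她們们這这那沒没什麼么")


def guess_variety(text, min_hits=3):
    """Return 'cantonese', 'mandarin', or 'undecided' for a lyric text."""
    yue = sum(1 for ch in text if ch in CANTONESE_MARKERS)
    cmn = sum(1 for ch in text if ch in MANDARIN_MARKERS)
    # Enough Cantonese-only characters, and more of them than Mandarin ones.
    if yue >= min_hits and yue > cmn:
        return "cantonese"
    # No Cantonese markers at all and enough Mandarin function words.
    if yue == 0 and cmn >= min_hits:
        return "mandarin"
    # Too little or mixed evidence: leave it for a human to check.
    return "undecided"


if __name__ == "__main__":
    samples = [
        "我哋唔知佢喺邊度，佢冇嚟",
        "我們不知道他在哪裡，他沒有來",
        "月亮",
    ]
    for line in samples:
        print(guess_variety(line), line)
```

Run over the whole batch, something like this would only sort the obvious cases; everything it labels "undecided" still needs a native speaker.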

@petit élève :) Then I must work on it!

I have looked into something like GT and I have looked at small samples of the songs we already have here. I'm going to do some testing tonight by taking a batch of songs on LT and seeing if I can correctly identify them through these resources or through other means. If results are good, I think we might be on to something. I just need to find a native speaker who can identify them afterward to see if there was any success in it.

How correct was I, though? Any native speakers?



