📌 Identifying a language

Modérateur à la retraite Alex the Translator
<a href="/fr/translator/alexander-laskavtsev" class="userpopupinfo" rel="user1248685">Alexander Laskavtsev <div class="author_icon" title="Page author" ></div></a>
Inscrit·e le : 06.06.2015

From time to time you face the problem of identifying the language of a text, especially if the text is written in an unfamiliar script.
I would like to give you a few hints on how to recognize Ukrainian among the other Slavic languages without using Google Translate, which gets it wrong too often... ;)

1) Ukrainian uses the Cyrillic alphabet. https://en.wikipedia.org/wiki/Cyrillic_script

2) Ukrainian uses a few unique letters:

Ї / ї , Є / є , Ґ /ґ

So, if a text written in Cyrillic script contains any of these letters, it is Ukrainian for sure (or at least partly written in Ukrainian).

3) Ukrainian doesn't use these Cyrillic letters:

Ы / ы , Э / э, Ё / ё, Ъ / ъ, Њ, Ђ, Ў, Љ, Џ

So, if a text written in Cyrillic script contains any of these letters, it is definitely not Ukrainian (or it is Ukrainian mixed with another language :) - check hint #2 ;) )
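
For anyone who wants to automate hints 2 and 3, here is a minimal Python sketch. The function name `ukrainian_hint` and its return labels are made up for illustration, and the lowercase forms of Њ, Ђ, Ў, Љ, Џ are an assumption added here, since the list above only gives them in upper case.

```python
# Letters from hint 2 (only in Ukrainian) and hint 3 (never in Ukrainian).
# The lowercase њ, ђ, ў, љ, џ are added as an assumption.
UKRAINIAN_ONLY = set("\u0407\u0457\u0404\u0454\u0490\u0491")  # Ї ї Є є Ґ ґ
NOT_UKRAINIAN = set("ЫыЭэЁёЪъЊњЂђЎўЉљЏџ")


def ukrainian_hint(text: str) -> str:
    """Return a rough label based only on which letters occur in the text."""
    chars = set(text)
    has_ukrainian = bool(chars & UKRAINIAN_ONLY)
    has_foreign = bool(chars & NOT_UKRAINIAN)
    if has_ukrainian and has_foreign:
        return "mixed"          # Ukrainian mixed with another language
    if has_ukrainian:
        return "ukrainian"
    if has_foreign:
        return "not ukrainian"
    return "undecided"          # none of the marker letters showed up


print(ukrainian_hint("Укра\u0457на"))  # the word contains ї
```

A short text may simply contain none of the marker letters, which is why the sketch has an "undecided" result instead of guessing.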

Which unique features does your language have? How can a foreigner tell your language apart from similar ones?

Modérateur à la retraite and Scholar of a Dark Age
<a href="/fr/translator/sciera" class="userpopupinfo" rel="user1077079">Sciera </a>
Inscrit·e le : 16.02.2011

For German it's quite easy: if there is an ß (an s-z), then it must be German. However, not all German texts use that letter. In Switzerland they don't use it at all; they always write ss instead.

So, what else? If you find ä, ö or ü, and no other special characters, then that's also a fairly good sign. However, quite a few other languages use some of those characters too, e.g. Turkish (only ö and ü) or Swedish (only ä and ö).

But the most obvious feature is the capitalization. If a lot of words in the middle of a sentence that don't look like names begin with a capital letter, then it's very likely German.
Online, though, many people also write German in all lower-case, and the rules stated above don't necessarily apply to dialects, as there's no official orthography for those.
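
The three signs above (ß, umlauts, capitalized words inside a sentence) can be collected with a few lines of Python. This is only a sketch: the function name `german_hints`, the dictionary keys and the naive sentence splitting on `.`, `!` and `?` are assumptions made for illustration, and no threshold for "a lot of capitals" is implied.

```python
import re


def german_hints(text: str) -> dict:
    """Collect the three hints: ß, umlauts, and mid-sentence capitals."""
    mid_words = 0
    mid_capitalized = 0
    # Naive sentence split (assumption): break on ., ! or ?
    for sentence in re.split(r"[.!?]+\s*", text):
        words = re.findall(r"[^\W\d_]+", sentence)
        for word in words[1:]:  # skip the sentence-initial word
            mid_words += 1
            if word[0].isupper():
                mid_capitalized += 1
    ratio = mid_capitalized / mid_words if mid_words else 0.0
    return {
        "has_eszett": "ß" in text,
        "has_umlaut": any(ch in "äöüÄÖÜ" for ch in text),
        "mid_sentence_capital_ratio": ratio,
    }


print(german_hints("Der Hund läuft über die Straße."))
```

Names are also capitalized in other languages, so the ratio is only meaningful on longer texts.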

Modérateur à la retraite of the Balkans :)
<a href="/fr/translator/cherrycrush" class="userpopupinfo" rel="user1144880">CherryCrush </a>
Inscrit·e le : 07.12.2012

Bulgarian is quite a tricky one, especially for people who don't use the Cyrillic alphabet often. A useful link for anyone who's not familiar with the letters - https://en.wikipedia.org/wiki/Bulgarian_alphabet

1). We use only the Cyrillic alphabet; we don't have letters from the Latin alphabet, except the ones that look the same in both scripts, like A, O, E etc. (if there are J's - that's Macedonian or Serbian, not Bulgarian)

2). We don't have letters such as ы or э - that's Russian, not Bulgarian.

3). We ALWAYS use the letter ь in combination with o, as "ьо" (and it's not used very often), and never at the beginning or end of a word - so if you see it on its own, that's maybe Russian again.

4) We don't have any diaereses (ё is Russian) or any types of accents (like à, á, â). In really rare cases there can be a stressed è or ò, but most people don't write it; it's only used when a word can be mistaken for one with a different stress (example - ужàсен/ужа`сен and ужасèн/ужасе`н - the first one means terrible, and the second - terrified). It is used mainly in books and it's really rare - the sentence has to be ambiguous to need an extra accent. Plus, if it's for recognizing a song's language, most of the people who write lyrics online are not that good at Bulgarian spelling.
The only exception is actually ѝ on its own (the short dative form of the 3rd person singular feminine personal pronoun, which may be translated as "of hers", though it depends on the context), but nowadays people usually write it as й on its own (to distinguish it from the normal и, which means "and").

5). We don't use any of these special characters - Њ, Ђ, Ў, Љ, Џ (most of them are Serbian, if I'm not mistaken), or the Ukrainian ones Ї / ї , Є / є , Ґ /ґ (credit goes to Alex here).

6). We actually use the letter ъ/Ъ. A lot. Way too many words have it, so if you don't have any of the already mentioned "special characters" and you see lots of ъ's, then most probably it's Bulgarian.

That's what I can think of now, if I remember something else that's specific, I'll mention it :)
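
Hints 2, 3, 5 and 6 above are easy to check mechanically. Below is a small Python sketch; the function name `bulgarian_hints` and the dictionary keys are invented for illustration, and the lowercase forms of the Serbian/Belarusian/Ukrainian letters are added as an assumption.

```python
import re

# Letters that hints 2 and 5 say Bulgarian does not use
# (lowercase variants added as an assumption).
FOREIGN_TO_BULGARIAN = set("ЫыЭэЁё" "ЊњЂђЎўЉљЏџ" "\u0407\u0457\u0404\u0454\u0490\u0491")


def bulgarian_hints(text: str) -> dict:
    lowered = text.lower()
    return {
        # Any letter here speaks against Bulgarian.
        "foreign_letters": sorted(set(text) & FOREIGN_TO_BULGARIAN),
        # Hint 3: a soft sign that is not followed by "о" speaks against Bulgarian.
        "soft_sign_outside_yo": bool(re.search(r"ь(?!о)", lowered)),
        # Hint 6: many hard signs speak for Bulgarian.
        "hard_sign_count": lowered.count("ъ"),
    }


print(bulgarian_hints("шофьор"))
print(bulgarian_hints("большой день"))
```

The sketch only reports the signals; how many ъ count as "lots" is left to the reader.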

Modérateur à la retraite Alex the Translator
<a href="/fr/translator/alexander-laskavtsev" class="userpopupinfo" rel="user1248685">Alexander Laskavtsev <div class="author_icon" title="Page author" ></div></a>
Inscrit·e le : 06.06.2015
CherryCrush wrote:

... we don't have any letters that are the same as in the Latin alphabet...

Actually, you do have a lot of letters of that kind: A, E, C, M, O, T, X - all the same as in the Latin alphabet ;) :lol:

Thanks for your reply! :) It is very useful for me, as a person who has studied Bulgarian.

Modérateur à la retraite of the Balkans :)
<a href="/fr/translator/cherrycrush" class="userpopupinfo" rel="user1144880">CherryCrush </a>
Inscrit·e le : 07.12.2012
Alexander Laskavtsev wrote:
CherryCrush wrote:

... we don't have any letters that are the same as in the Latin alphabet...

Actually, you do have a lot of letters of that kind: A, E, C, M, O, T, X - all the same as in the Latin alphabet ;) :lol:

They actually exist as letters in the Cyrillic alphabet; what I meant is that we don't have extra ones, except the ones that exist in both scripts :D (like the J example)

Modérateur 👨🏻‍🏫🇧🇷✍🏻👨🏻
<a href="/fr/translator/don-juan" class="userpopupinfo" rel="user1110108">Don Juan <div class="moderator_icon" title="Moderator" ></div></a>
Inscrit·e le : 05.04.2012

As far as I know, Portuguese has no distinguishing features when it comes to written language. We do have the ç / Ç variant of the C letter, but it's also found in French and other languages - though, if it comes before a, o, or u it's very likely to be Portuguese (examples: direção/direction, taça/cup, braço/arm, caçador/hunter, peça/play, precaução/precaution, prevenção/prevention, Suíça/Switzerland). You can also find it like -ça, -ço, -çu, -ção or -ções.

But something that may come in handy to recognize texts in Portuguese is to know about where to put a pronoun. I believe we're the sole Romance language that would use something like 'Dar-te-ei' (I'll give thee -> very formal).

This Wikipedia article may also help.

Oh, and we also have diacritics, but they make no big difference as they also appear in several other languages. In Portuguese we have: the cedilla (ç), acute accent (á, é, í, ó, ú), circumflex accent (â, ê, ô), tilde (ã, õ), and grave accent (à).

Modérateur 👨🏻‍🏫🇧🇷✍🏻👨🏻
<a href="/fr/translator/don-juan" class="userpopupinfo" rel="user1110108">Don Juan <div class="moderator_icon" title="Moderator" ></div></a>
Inscrit·e le : 05.04.2012

This is a very interesting topic that may be used as a means of help, so I marked it as 'sticky'. It won't disappear when more people comment here as time passes by ;)

Débutant(e)
<a href="/fr/translator/chris-gogh" class="userpopupinfo" rel="user1269513">Chris Gogh </a>
Inscrit·e le : 11.12.2015

Finnish has something called "vowel harmony", which means that back vowels (a, o, u) and front vowels (ä, ö, y) can't occur in the same word. Hungarian has a similar feature with different letters.

For example, the word "käsittämätön" has only front vowels, and "saamaton" has only back vowels. For this reason they also take different case endings: "käsittämätönTÄ" and "saamatonTA".

The only pitfall is that compound words don't follow this rule: palkkatyö has both a and y, because it is made of two parts, palkka and työ, each of which follows the rule.
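
The vowel harmony rule described above can be written as a one-line check in Python. The function name `obeys_vowel_harmony` is made up for this sketch, and it deliberately does not try to split compound words, so it reports palkkatyö as breaking the rule.

```python
BACK_VOWELS = set("aou")
FRONT_VOWELS = set("äöy")  # e and i are neutral and are ignored here


def obeys_vowel_harmony(word: str) -> bool:
    """True if the word does not mix back and front vowels."""
    letters = set(word.lower())
    return not (letters & BACK_VOWELS and letters & FRONT_VOWELS)


for word in ["käsittämätön", "saamaton", "palkkatyö", "palkka", "työ"]:
    print(word, obeys_vowel_harmony(word))
```

On a longer text, a high share of words that pass this check (together with lots of ä and ö) would fit Finnish, but the same check alone cannot tell Finnish from other languages that rarely use ä, ö and y.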

Modérateur à la retraite and Scholar of a Dark Age
<a href="/fr/translator/sciera" class="userpopupinfo" rel="user1077079">Sciera </a>
Inscrit·e le : 16.02.2011

Turkish, for example, has a very similar vowel harmony, too. The difference is that it additionally has the letter ı (also a back vowel), that e is not neutral but front, and that it also distinguishes vowels with rounded lips from vowels with unrounded lips.
But Finnish can be easily recognized by its large number of vowels per word.

Modérateur à la retraite Alex the Translator
<a href="/fr/translator/alexander-laskavtsev" class="userpopupinfo" rel="user1248685">Alexander Laskavtsev <div class="author_icon" title="Page author" ></div></a>
Inscrit·e le : 06.06.2015

Also typical for Finnish:
- very long words;
- frequent doubling of consonants;
- often more than two vowels in a row

Maître
<a href="/fr/translator/sandring" class="userpopupinfo" rel="user1263066">sandring </a>
Inscrit·e le : 18.10.2015

नमस्ते That's the language to you! Hindi! Up to more than 70 letters and characters altogether! A linguist's delight! 8)

Modérateur à la retraite with wings
<a href="/fr/translator/lylaphoenix" class="userpopupinfo" rel="user1238265">lylaphoenix </a>
Inscrit·e le : 07.03.2015

In Italian most words end with a vowel, and if there's an accent it is (usually) on the last letter of the word.
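
The "most words end with a vowel" observation can be turned into a simple ratio. This Python sketch uses an invented function name, `vowel_final_ratio`, and an assumed vowel set that includes the accented vowels; it does not define what ratio counts as "Italian enough".

```python
import re

# Plain and accented vowels (the exact set is an assumption).
VOWELS = set("aeiouàèéìíòóù")


def vowel_final_ratio(text: str) -> float:
    """Share of words whose last letter is a vowel."""
    words = re.findall(r"[^\W\d_]+", text.lower())
    if not words:
        return 0.0
    return sum(word[-1] in VOWELS for word in words) / len(words)


print(vowel_final_ratio("La vita è bella"))
print(vowel_final_ratio("This is not Italian text"))
```

Note that the word pattern splits at apostrophes, so "c'è" is counted as two words.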

Modérateur à la retraite and Scholar of a Dark Age
<a href="/fr/translator/sciera" class="userpopupinfo" rel="user1077079">Sciera </a>
Inscrit·e le : 16.02.2011
sandring wrote:

नमस्ते That's the language to you! Hindi! Up to more than 70 letters and characters altogether! A linguist's delight! 8)

Actually, when I see that I first assume it to be Sanskrit - at least regarding some of the lyrics of "unknown" language I found here, as those are mostly on religious topics (and the word you quoted would be Sanskrit as well).

Maître
<a href="/fr/translator/sandring" class="userpopupinfo" rel="user1263066">sandring </a>
Inscrit·e le : 18.10.2015

It's just Hello in Hindi. But Sanskrit writing gave rise to Devanagari (Hindi ABC) and many letters are the same. So Sanskrit also looks familiar to me.

Modérateur 🔮​🇧​​🇮​​🇩​​🇽​​🇦​​🇦​❜
<a href="/fr/translator/citl%C4%81licue" class="userpopupinfo" rel="user1109697">citlālicue <div class="moderator_icon" title="Модератор" ></div></a>
Inscrit·e le : 31.03.2012

I'm not entirely sure about Spanish (https://en.wikipedia.org/wiki/Spanish_orthography): we have "ñ/Ñ" or "ü/Ü" and that's about it; even the accents are similar to other Romance languages (áéíóú).

Modérateur der Fragenfinder
<a href="/fr/translator/questionfinder" class="userpopupinfo" rel="user1220274">Questionfinder <div class="moderator_icon" title="Модератор" ></div></a>
Inscrit·e le : 16.09.2014

Maybe English because it has a lot of apostrophes: I'm, don't, won't, isn't, ain't, can't, hadn't, wouldn't, couldn't, shouldn't, you're, they're, we're, she's, he's, and others. Plus the possessive case uses an apostrophe. And the word "I" appears a lot. Also because I understand it. That's a good hint. Also "ph" and "sh" at the beginning and end of words. Maybe that's unique? The "ough" combo is probably unique to English as well.

Plus the fact that I can understand it. That's helpful, right?

Modérateur à la retraite
<a href="/fr/translator/ivank23" class="userpopupinfo" rel="user1102192">ivank23 </a>
Inscrit·e le : 11.01.2012

1) The Macedonian Cyrillic alphabet is most easily told apart from the other Cyrillic alphabets by its three unique letters: Ќќ, Ѓѓ, Ѕѕ. If you see any of these three letters, you can be sure it's Macedonian. Every other letter is also present in other Cyrillic alphabet(s).

2) The usage of the grave accent in the words /сè/ and /нè/ is a unique characteristic of the Macedonian orthographic rules.

Modérateur à la retraite Alex the Translator
<a href="/fr/translator/alexander-laskavtsev" class="userpopupinfo" rel="user1248685">Alexander Laskavtsev <div class="author_icon" title="Page author" ></div></a>
Inscrit·e le : 06.06.2015
ivank23 wrote:

1)...
2)...

Здравей! :)

You've forgotten the most obvious feature of Macedonian.

As CherryCrush has already mentioned:

3) Macedonian uses an additional, Latin-looking letter: J ;)

Maître
<a href="/fr/translator/sandring" class="userpopupinfo" rel="user1263066">sandring </a>
Inscrit·e le : 18.10.2015

I believe, English is easy to identify by "I" looming here and there and proudly looking down on the other letters. :)

Modérateur à la retraite Alex the Translator
<a href="/fr/translator/alexander-laskavtsev" class="userpopupinfo" rel="user1248685">Alexander Laskavtsev <div class="author_icon" title="Page author" ></div></a>
Inscrit·e le : 06.06.2015
sandring wrote:

I believe, English is easy to identify by "I" looming here and there and proudly looking down on the other letters. :)

Из старого анекдота:

"...Вы слишком долго жили в Америке, если пишите Я с большой буквы :)

====================================================
An old Russian joke:

You've been living in the US for too long, if you write "Я" ("I/me") with a capital letter.

Maître
<a href="/fr/translator/sandring" class="userpopupinfo" rel="user1263066">sandring </a>
Inscrit·e le : 18.10.2015

But we're talking about published texts, I guess. Besides, what do you want from an English teacher? ;)

Modérateur à la retraite Alex the Translator
<a href="/fr/translator/alexander-laskavtsev" class="userpopupinfo" rel="user1248685">Alexander Laskavtsev <div class="author_icon" title="Page author" ></div></a>
Inscrit·e le : 06.06.2015
sandring wrote:

But we're talking about published texts, I guess. Besides, what do you want from an English teacher? ;)

Well, I'm waiting for someone who will tell about the Russian language. :lol: (Otherwise I will have to do this... :) )

Maître
<a href="/fr/translator/sandring" class="userpopupinfo" rel="user1263066">sandring </a>
Inscrit·e le : 18.10.2015

You mean the people? :~

Alex: I mean the language (corrected ;))

Gourou Father
<a href="/fr/translator/steve-repa" class="userpopupinfo" rel="user1266417">Steve Repa </a>
Inscrit·e le : 16.11.2015

From a simple English speaker: you guys impress me!

Super membre
<a href="/fr/translator/nyangoro" class="userpopupinfo" rel="user1276016">Nyangoro </a>
Inscrit·e le : 02.02.2016

English also has a vast array of words that look like they shouldn't be in the same language, though that might be a little more difficult to identify for most.

I mean, English is a language where both "beauty" and "wonder" can exist, and it's not weird.

Gourou Father
<a href="/fr/translator/steve-repa" class="userpopupinfo" rel="user1266417">Steve Repa </a>
Inscrit·e le : 16.11.2015

...it is a language that is 'pliable', with Latin, German, French, Greek, Spanish, and some Indian as well as Dutch and American Indian words. Oh, Chinese as well, from the last century. Sometimes it's complicated to look a word up in the dictionary, because you have to know how to spell it to find it! In America, the US, the story goes that it won out by one vote to be the national language - or we would be speaking German here today.

Maître
<a href="/fr/translator/sandring" class="userpopupinfo" rel="user1263066">sandring </a>
Inscrit·e le : 18.10.2015

English has the biggest vocabulary of all! About 1 million entries in dictionaries. Historically, one and the same word might have been borrowed from different languages, which is why we have beautiful-beauteous, regal-royal-king's-queen's etc. It has all mounted up to one million words over time. Thank you, Steve, it's the first time I've heard about that language vote. I'll look it up. :)

Gourou Father
<a href="/fr/translator/steve-repa" class="userpopupinfo" rel="user1266417">Steve Repa </a>
Inscrit·e le : 16.11.2015
Modérateur à la retraite Alex the Translator
<a href="/fr/translator/alexander-laskavtsev" class="userpopupinfo" rel="user1248685">Alexander Laskavtsev <div class="author_icon" title="Page author" ></div></a>
Inscrit·e le : 06.06.2015

Sandring and Steve Repa, the things you are discussing are really interesting, but you are going off-topic now. I would like you to stick to the topic of the thread.

Modérateur à la retraite
<a href="/fr/translator/voldimeris" class="userpopupinfo" rel="user1243895">Voldimeris </a>
Inscrit·e le : 25.04.2015

Ok, I'll try to explain how to tell whether your text is in Romanian.
Of course, it uses the Latin alphabet, but with additional letters - ă î ţ ş â. So, if you have a lot of such letters in your text, it's definitely Romanian.
Warning!
1. Don't mix up Romanian with Aromanian: Romanian has neither the letter "ã" nor a lot of apostrophes!
2. There is a problem, because sometimes texts are in Romanian but don't contain the above letters. It's very popular to write that way on the Internet. So here are my suggestions that could help you determine whether your text is in Romanian or not:
First, if there are a lot of words ending in "ul" or "ului" (scaunUL, scaunULUI), this is Romanian.
Secondly, if you see clusters of letters such as mi-e, ti-e, le-am, m-ai and so on, this is Romanian.
I hope this helps)
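
The three Romanian signals above (special letters, "-ul"/"-ului" endings, short hyphenated clusters) can be counted with a short Python sketch. The function name `romanian_hints` and the dictionary keys are hypothetical, the "one or two letters before the hyphen" pattern is a rough assumption for clusters like mi-e or le-am, and the comma-below forms of ș and ț are included as an assumption next to the cedilla forms listed above.

```python
import re
import unicodedata

# ă î â plus both spellings of ş/ţ (cedilla) and ș/ț (comma below, an assumption).
ROMANIAN_LETTERS = set("\u0103\u00ee\u00e2\u015f\u0163\u0219\u021b")


def romanian_hints(text: str) -> dict:
    lowered = unicodedata.normalize("NFC", text).lower()
    words = re.findall(r"[^\W\d_]+", lowered)
    return {
        "special_letters": sum(ch in ROMANIAN_LETTERS for ch in lowered),
        "ul_endings": sum(w.endswith(("ul", "ului")) for w in words),
        # Rough pattern for clusters such as mi-e, le-am, m-ai.
        "clitic_clusters": len(re.findall(r"\b[^\W\d_]{1,2}-[^\W\d_]+", lowered)),
    }


print(romanian_hints("Mi-e dor de scaunul meu"))
```

The last two counters still work when the writer has dropped the diacritics, which is the situation described in warning 2.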

Maître
<a href="/fr/translator/sandring" class="userpopupinfo" rel="user1263066">sandring </a>
Inscrit·e le : 18.10.2015

हीरे मोती मैं ना चाहूँ Hindi is easily recognisable as one of the Indian languages because each word must be written under a line with spacing between the words. There are 55 letters in the Devanagari ABC plus some extra characters. The sentences are divided by a detached vertical line. There are no capital letters. The problem is that there are other Indian languages that use that writing-under-the-line principle like Bengali, Sanskrit and others. But at least it can help to narrow down the area of search to India.

Modérateur / hippie-abraça-árvore
<a href="/fr/translator/maluca" class="userpopupinfo" rel="user1206376">maluca <div class="moderator_icon" title="Модератор" ></div></a>
Inscrit·e le : 30.04.2014

Portuguese is recognizable because many words end with the suffix "-ção" (plural: "-ções")
In general the combination "ão" is used very often as the ending of a word.
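
Counting the endings mentioned above takes only a few lines of Python. The function name `portuguese_hints` and the key names are invented for this sketch; it only counts, and leaves the judgement of "very often" to the reader.

```python
import re
import unicodedata


def portuguese_hints(text: str) -> dict:
    """Count words ending in -ção/-ções and words ending in -ão."""
    lowered = unicodedata.normalize("NFC", text).lower()
    words = re.findall(r"[^\W\d_]+", lowered)
    return {
        "cao_endings": sum(w.endswith(("ção", "ções")) for w in words),
        "ao_endings": sum(w.endswith("ão") for w in words),
    }


print(portuguese_hints("A direção da nação não muda"))
```

Words in "-ção" are counted under both keys, since they also end in "ão".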

Modérateur à la retraite Alex the Translator
<a href="/fr/translator/alexander-laskavtsev" class="userpopupinfo" rel="user1248685">Alexander Laskavtsev <div class="author_icon" title="Page author" ></div></a>
Inscrit·e le : 06.06.2015

Well, what about the Russian language? It's the trickiest one, because some non-Slavic nations use scripts based on the Russian Cyrillic alphabet. But I'll try to explain.

1) Russian, like Ukrainian, uses the Cyrillic alphabet. https://en.wikipedia.org/wiki/Cyrillic_script

2) Letters that are typical of Russian texts and, as a rule, are markers of the Russian language:

Ы/ы, Э/э, Ё/ё

Warning! Belarusian and some other alphabets have the same letters, so it's easy to get confused.
So, it's much easier to define the markers of non-Russian text. :)

3) Use of the apostrophe is not typical of Russian (unlike Ukrainian or Belarusian). It is used very rarely, when transcribing foreign names, in places where a glottal stop needs to be shown: Скарлет О'Хара. But your chances of encountering such cases in Russian are very low.

4) I/і

There's no such letter in modern Russian. The absence of this letter is the best marker for distinguishing Russian from the closely related Belarusian and Ukrainian.

5) It's also quite hard to distinguish a Russian text from a Bulgarian one. But you should remember that Bulgarian never uses Ы/ы, Э/э, Ё/ё. Bulgarian also uses the letter "Ъ" much more often than Russian.

6) Finally, Russian doesn't use diacritical marks (the only exceptions are the two dots above "Ё" and the breve above "Й"). Any other diacritics in a Cyrillic text are signs of a non-Russian language.
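
Hints 2 to 4 can be combined into a small Python check that rules Russian out before it rules it in. The function name `russian_hint`, the return labels and the order of the checks are assumptions made for this sketch; the Cyrillic letters are written as escapes where they are easy to confuse with Latin look-alikes.

```python
import re

RUSSIAN_MARKERS = set("ыэё")  # hint 2 (shared with Belarusian)
CYRILLIC_I = "\u0456"         # hint 4: Cyrillic і, not used in modern Russian

# Hint 3: an apostrophe between two Cyrillic letters.
_CYR = "а-яё\u0456\u0457\u0454\u0491\u045e"
APOSTROPHE_IN_WORD = re.compile(f"[{_CYR}]['\u2019\u02bc][{_CYR}]")


def russian_hint(text: str) -> str:
    lowered = text.lower()
    chars = set(lowered)
    if CYRILLIC_I in chars:
        return "not russian"
    if APOSTROPHE_IN_WORD.search(lowered):
        return "probably not russian"
    if chars & RUSSIAN_MARKERS:
        return "russian or belarusian"
    return "undecided"


print(russian_hint("Это мой дом"))
```

A transcribed name like О'Хара would trip the apostrophe check, which matches the remark above that such cases are rare rather than impossible.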

Gourou
<a href="/fr/translator/septembrologie" class="userpopupinfo" rel="user1217083">Septembrologie </a>
Inscrit·e le : 15.08.2014

In Turkish we have the letters Ğ, ı, İ (we also have Ç, Ş, Ü, Ö, but there are other languages which have these letters, too ^^ ) and we don't have Q, W, X. Also, we have no words containing -ck, -sh, -ch etc., and we have no words ending in B, C, D, G, so I think it's pretty easy to identify Turkish when you see it ^^
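
These Turkish rules translate almost directly into code. In the Python sketch below the function name `turkish_hints` and the dictionary keys are hypothetical; the checks avoid lowercasing the whole text because İ does not lowercase to a single character in Python.

```python
import re


def turkish_hints(text: str) -> dict:
    words = re.findall(r"[^\W\d_]+", text)
    return {
        # Letters that speak for Turkish.
        "distinctive_letters": sorted(set(text) & set("Ğğıİ")),
        # Signals that speak against Turkish, per the rules above.
        "has_q_w_x": any(ch in "qwxQWX" for ch in text),
        "has_ck_sh_ch": bool(re.search(r"ck|sh|ch", text, re.IGNORECASE)),
        "words_ending_b_c_d_g": [w for w in words if w[-1] in "bcdgBCDG"],
    }


print(turkish_hints("Fırtınanın sesinde"))
print(turkish_hints("The quick fox would check the bag"))
```

Loanwords and names can break any of these rules, so the "against" signals are best read as counts over a longer text rather than as hard exclusions.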

Modérateur à la retraite of the Balkans :)
<a href="/fr/translator/cherrycrush" class="userpopupinfo" rel="user1144880">CherryCrush </a>
Inscrit·e le : 07.12.2012

I decided to add something about Albanian, since I've seen it mixed with so many other languages. I always classify it under the category "has nothing in common with anything else" though... a lot easier task than trying to recognize the Slavic languages, I guess.

1). It uses the Roman alphabet; some specific characters/combinations of letters make it recognizable - ë, ç (no other accents of any type), dh, gj, xh, zh, nj, rr, ll.
In my opinion, it uses single q's, y, x and j (that's a really popular one) more often than other languages.

2) Albanian uses ë a lot. However, this is not a really good indicator on the internet, since some people omit it (or, rarely, write it as a normal e, because a normal e often changes the meaning of words, for example turning some nouns from singular to plural), so if you see many hard-to-pronounce consonants next to each other, it may be Albanian with omitted ë's (especially if you also spot some of the letters I mentioned above).

edit: and Albanian doesn't use the letter W at all.

Modérateur ɹoʇɐɹǝpoW
<a href="/fr/translator/besatnias" class="userpopupinfo" rel="user1120051">Besatnias <div class="moderator_icon" title="Модератор" ></div></a>
Inscrit·e le : 28.07.2012

How do non-Spanish speakers recognize Spanish? What do you think are the special characteristics apart from ñ?

I tell Chinese and Japanese apart through kana (hiragana like たとえば, katakana like キ), because apart from the complex characters found in both languages, Japanese fills in all its grammar with hiragana. So Chinese looks a lot more filled up (悪召使鏡音星屑).

In Esperanto, there are these consonants ŝĵĉĝĥ, and this vowel ŭ, and words tend to end with n, j, jn and apostrophes, like: mi amas virinojn kaj mian aĝ'.

Portuguese has "ão" and "ç", and "I" is "eu". I think that's enough, lol. For example, "ção" in Portuguese is "tion" in English, so "nation" becomes "nação", "adoration" becomes "adoração", etc.

French has lots of é, ê, eu, ai, ou, ç. Also, it has the vowel œ; very characteristic. And lots of l', j' and c' (with apostrophes). Also, lots of words end with e and é, i.e. e, é and consonants are the default word endings. Typical words: pas, c'est, ça, moi/toi/soi, quoi, puis, j'ai, ce, de.

Georgian looks like this: ჰერიო ბიჭებო, ჩიტი-გვრიტი (notice it's all round gibberish)

Thai looks like this: วังวน, โรคประจำตัว (notice the curls)

Sanskrit looks like this: जय राधा-माधव (notice the lines on top)

Vietnamese looks like this: Người ơi từ đây gặp nhau (notice the symbols on the vowels and the đ)

Hebrew looks like this: לכי, נוחי על מדרונות וגבעות (notice that everything has this shape ר and ת, and that they're like empty downwards).

In Italian, most words end with a vowel, and there are also a lot of double consonants (cc ss zz tt ll gg etc.); the words "il", "e", "di" are very typical.

Turkish has ı, like "Fırtınanın sesinde" or "Bir ateşkes ilan etmek için kelime bulamad"; notice that a lot of words end with "ım" and "um", notice the consonants ş, ç and ğ, and the fact that there is a lot of ü (ex: Bütün dürüstlük ve kutsal kelimelerle).

Greek looks like this: για να έρθεις πάλι στην μεριά (very unique, notice it's kinda Latin, kinda Cyrillic; look for θ, β, χ, ς, and vowel combinations like ει, αι and ου; Cyrillic doesn't have any of those, or especially ν)

In Swedish, many conjugated verbs in the present tense end in "ar". Look for ö, ä, combinations with lj, y as a vowel, and the very unique å. Notice that in Swedish there is no ø (that's mostly in Norwegian and Danish).

In Finnish, everything looks double. Double vowels, double consonants, looots of ä (ex. Mä ammun laukauksia pimeään). Lots of diphthongs like au, ai, ia. For example, the word "kaikki" would be typically Finnish, or "peipponen", or "tulevaisuuteen" (diphthong and doubles).
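
For the languages above that have a script of their own (Georgian, Thai, Devanagari, Hebrew, Greek, kana), the "what does it look like" test can be approximated with Python's standard `unicodedata` module, because the first word of a character's Unicode name is usually its script. This is a sketch: the function names `dominant_script` and `has_kana` are made up, and a script name is not the same as a language (Devanagari is shared by Hindi and Sanskrit, Latin by many languages).

```python
import unicodedata
from collections import Counter


def dominant_script(text: str) -> str:
    """Most common first word of the Unicode names of the letters in the text."""
    counts = Counter()
    for ch in text:
        if ch.isalpha():
            name = unicodedata.name(ch, "")
            if name:
                counts[name.split()[0]] += 1
    return counts.most_common(1)[0][0] if counts else "UNKNOWN"


def has_kana(text: str) -> bool:
    """True if the text contains hiragana or katakana (a sign of Japanese)."""
    return any(
        unicodedata.name(ch, "").startswith(("HIRAGANA", "KATAKANA"))
        for ch in text
    )


for sample in ["ჰერიო ბიჭებო", "วังวน", "जय राधा-माधव", "για να έρθεις πάλι"]:
    print(sample, dominant_script(sample))
print(has_kana("たとえばキ"))
```

For Latin-script and Cyrillic-script languages this only narrows things down to the script, and the letter-level hints from the rest of the thread are still needed.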

Modérateur à la retraite Alex the Translator
<a href="/fr/translator/alexander-laskavtsev" class="userpopupinfo" rel="user1248685">Alexander Laskavtsev <div class="author_icon" title="Page author" ></div></a>
Inscrit·e le : 06.06.2015

RataNegra has answered for all at once :lol:

Thank you!

Modérateur ɹoʇɐɹǝpoW
<a href="/fr/translator/besatnias" class="userpopupinfo" rel="user1120051">Besatnias <div class="moderator_icon" title="Модератор" ></div></a>
Inscrit·e le : 28.07.2012
Alexander Laskavtsev wrote:

RataNegra has answered for all at once :lol:

Thank you!

Lol, you're welcome. I take pride in recognizing many languages. :D

Maître
<a href="/fr/translator/kiskakukk" class="userpopupinfo" rel="user1123814">kiskakukk </a>
Inscrit·e le : 06.09.2012

I'll try to introduce my native language, Hungarian ;)

1) About the alphabet:

- Hungarian uses the Roman alphabet plus some diacritics placed over some vowels. The accent mark(s) above a vowel indicate that the vowel is 'long'. Some consonants are digraphs, i.e. they consist of two letters; one consonant (dzs) is a trigraph. Although they are written with more than one character, the digraphs (and the trigraph) each count as an individual letter of the alphabet. (All consonants are pronounced - there are no silent letters.)

a á b c cs d dz dzs e é f g gy h i í j k l ly m n ny o ó ö ő p q r s sz t ty u ú ü ű v w x y z zs

- If you see long words, for example: 'megszentségteleníthetetlenségeskedéseitekért' (this is the longest Hungarian word), it's surely Hungarian ;)
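
To make the point about digraphs and the trigraph concrete, here is a Python sketch that splits a word into Hungarian letters by always taking the longest matching letter first. The function name `hungarian_letters` is invented for illustration, and the greedy matching is an assumption that can go wrong where two separate consonants merely happen to stand next to each other at a word-part boundary.

```python
# Multi-character letters of the alphabet listed above; "dzs" comes first
# so that it wins over "dz".
MULTIGRAPHS = ["dzs", "cs", "dz", "gy", "ly", "ny", "sz", "ty", "zs"]


def hungarian_letters(word: str) -> list[str]:
    """Split a word into letters of the Hungarian alphabet (greedy sketch)."""
    word = word.lower()
    letters = []
    i = 0
    while i < len(word):
        for graph in MULTIGRAPHS:
            if word.startswith(graph, i):
                letters.append(graph)
                i += len(graph)
                break
        else:  # no multigraph starts here: take a single character
            letters.append(word[i])
            i += 1
    return letters


print(hungarian_letters("dzsungel"))
print(hungarian_letters("magyar"))
```

A text where combinations like sz, gy, cs and zs keep turning up, together with ő and ű, fits the alphabet described above.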

2) A few extra notes about Hungarian pronunciation:

- All consonants can be long or short. Long consonants are written as double consonants and are pronounced approximately twice as long as short ones.

- Hungarian vowels are classified according to front vs. back assonance and rounded vs. unrounded. These terms come from describing the tongue position in the mouth and the roundedness of the lips, respectively. The following is the vowel inventory of Hungarian:

Back vowels: a, á, o, ó, u, ú
Front unrounded vowels: e, é, i, í
Front rounded vowels: ö, ő, ü, ű

Vowel harmony rules in Hungarian require that front or back assonance in the vowels of a stem be maintained throughout the entire word, thus for the most part – except for recent loan words – Hungarian words have either only back vowels in them or only front vowels.

Modérateur of the Moana
<a href="/fr/translator/silentrebel83" class="userpopupinfo" rel="user1082168">SilentRebel83 <div class="moderator_icon" title="Modérateur" ></div></a>
Inscrit·e le : 22.04.2011

ooh, what an interesting topic! And thank you for all the yummy knowledge everyone has shared so far.

"Alexander Laskavtsev" wrote:

Which unique features has your language. How can a foreigner identify your language among the similar ones?

With the exception of Tongan and Hawaiian, my knowledge of Oceanic languages is somewhat limited.

Common Identifiers:
Some words used frequently around the world that derive from Oceania:
Taboo - the word was first noted by Captain Cook when he visited Tonga in 1777.
Aloha - the famed greeting when visitors arrive in the Hawaiian Islands.
Haka - the Māori war dance.
Tattoo - first referenced during Cook's voyage to Tahiti and New Zealand in 1769.
Bikini - the designer of the bikini named it after the Oceanic atoll of Bikini in 1947.

These 3 suffixes are commonly found in many Oceanic words:
-lani [sky/divine, chiefly or royal status],
-loa [long in terms of time and measurement; to double or magnify the prefix]
-hine [though Oceanic languages are non-gender specific, "hine" is a rare exception. Which, in this case, denotes the feminine].

Ka'iulani [Hawaiian for 'paradise']
Kalani [Hawaiian for 'heaven/sky']
'Aulani [Hawaiian for 'divine herald']
Kuoloa [Tongan for 'a long time passed']
Kōloa [Hawaiian for 'any sound that is prolonged']
Ta'ahine [Tongan for 'girl']
Wahine [Māori for 'woman'; note that Hawaiian is spelled the same way, but is pronounced 'vahine']
Kaikamahine [Hawaiian for 'girl']

Rongorongo:
It would have been nice if rongorongo were still in use today as a writing system. Imagine using various hieroglyphs of fish, turtle, dolphin, whale, shark, palm tree, waves, sand, coconut, mango, papaya, bird... using all these to form words and meaning.

With that said, every culture in Oceania utilizes the Latin script.

Typology:
In terms of linguistic typology, Oceanic languages follow the VSO [Verb-Subject-Object] order.
[Tongan] Na'e 'alu 'a Maa'imoa ki falekoloa --> went Maa'imoa to the grocery store.
[Hawaiian] Hele ia i ka halekūʻai --> [is] going he/she/it to the grocery store.

The Takuu language uses SOV.

Phonetics:
Tongan is the only language in Oceania that is exclusively phonetic [sounds true to each letter], which makes it easy to learn.

In Tokelauan, the "f"s are pronounced like an "h". So words like fiafia [joy] would sound more like hiahia.

In Māori, words with "wh" will make an "f" sound. So words like whakamautai [Lord] and whitia [to exchange/cross (over)] --> fakamautai and fitia.

In Hawaiian, the "w"s are pronounced like "v"s. So words like Hawai'i and wela [burning/hot] --> Havai'i and vela.

In Sāmoan, the "g"s are pronounced like "ng" from words like 'fishiNG' and 'siNGiNG'. So words like lagi [sky/heaven] and gagana [language] --> langi and ngangana.

Orthography:
Double vowels are common --> aa, ee, ii, oo, uu
[Tongan] Maamani [world/Earth/planet/existence/dimension/reality/terrarium]
ii [a fan]
iivi [physical strength/power]
aa [to awaken]

The glottal stop is fairly common in Oceanic languages, and is shown as an inverse apostrophe mark --> ʻ. As far as I'm aware, only the Māori language lacks this feature.
[Tongan] U'u [to bite]
Ta'ahine [girl]

Macrons are used for your average long vowels --> Ā ā Ē ē Ī ī Ō ō Ū ū
Sāmoa
Māori

Acutes for the more slightly elongated vowels --> aá eé ií oó uú
[Tongan] Poó ni [this night]

In Tongan, acutes and macrons are also used for distinguishing certain stresses that follow the definitive accent rule.
Te ne mohe ia hē [He/she/it will sleep [over] there] --> more specific about where the subject is sleeping at
Te ne móhe hena [He/she/it will sleep there] --> sleeping anywhere there

Graves are exclusively used in Tongan words of foreign origins --> À à È è Ì ì Ò ò Ù ù
Because Oceanic languages follow a strict rule where every syllable must end with a vowel, the grave accent was employed to accommodate words where no vowel is present in any particular syllable. Take for example the German word for Germany.
Deutschland -- neither syllable has a vowel at the end. So to convert this into Tongan, we would have to take into consideration each and every rule of Tongan grammar...

1. Since there is no "D" in the Tongan alphabet, this will become a "T".
2. Since the first syllable [Deutsch-] lacks a vowel at the end, we will apply one with a grave accent. Add in a little law of Tongan euphony, and we have "Toijì-" [Toichee] (Because of the grave accent, the "ee" is almost non-existent when enunciated.)
3. Again, the 2nd syllable [-land] lacks a vowel at the end. We then add another vowel --> lanì [lan-ee] (same reason as stated above.)
4. Put it all together and you have Toijìlanì.

This is all I can think of now.

Modérateur à la retraite and Scholar of a Dark Age
<a href="/fr/translator/sciera" class="userpopupinfo" rel="user1077079">Sciera </a>
Inscrit·e le : 16.02.2011
Habitué(e)
<a href="/fr/translator/thesecaat" class="userpopupinfo" rel="user1190589">thesecaat </a>
Inscrit·e le : 18.10.2013
kiskakukk wrote:

I try to introduce my native language, so the hungarian ;)

1) About the alphabet:

- Hungarian uses the Roman alphabet plus some diacritics placed over some vowels. The accent mark(s) above a vowel indicate that the vowel is 'long'. Some consonants are digraphs, consisting of two letters; one consonant (dzs) is a trigraph. Although they are written with more than one letter, the digraphs (and the trigraph) each count as an individual letter of the alphabet. (All consonants are pronounced – there are no silent letters.)

a á b c cs d dz dzs e é f g gy h i í j k l ly m n ny o ó ö ő p q r s sz t ty u ú ü ű v w x y z zs

- If you see long words, for example 'megszentségteleníthetetlenségeskedéseitekért' (this is the longest Hungarian word), it's surely Hungarian ;)

2) A few extra notes on Hungarian pronunciation:

- All consonants can be long or short. Long consonants are written as double consonants and are pronounced approximately twice as long as short ones.

- Hungarian vowels are classified according to front vs. back assonance and rounded vs. unrounded. These terms come from describing the tongue position in the mouth and the roundedness of the lips, respectively. The following is the vowel inventory of Hungarian:

Back vowels: a, á, o, ó, u, ú
Front unrounded vowels: e, é, i, í
Front rounded vowels: ö, ő, ü, ű

Vowel harmony rules in Hungarian require that front or back assonance in the vowels of a stem be maintained throughout the entire word, thus for the most part – except for recent loan words – Hungarian words have either only back vowels in them or only front vowels.
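
The front/back split above can be sketched in a few lines of Python. This is only a rough illustration that uses the vowel lists exactly as given in the post; the function name `harmony_class` is made up for the example, and real Hungarian has exceptions (loan words, and stems where e, é, i, í sit alongside back vowels), so a "mixed" result does not prove a word is not Hungarian.

```python
import unicodedata

# Vowel classes as listed in the post (front rounded and unrounded merged).
BACK = set("aáoóuú")
FRONT = set("eéiíöőüű")

def harmony_class(word):
    """Return 'back', 'front', 'mixed' or 'none' for a single word."""
    # Normalize so each accented vowel is one precomposed character.
    letters = unicodedata.normalize("NFC", word).lower()
    has_back = any(ch in BACK for ch in letters)
    has_front = any(ch in FRONT for ch in letters)
    if has_back and has_front:
        return "mixed"   # typical of loan words, per the post
    if has_back:
        return "back"
    if has_front:
        return "front"
    return "none"        # no Hungarian vowel found at all

for w in ["ablak", "ember", "telefon"]:
    print(w, harmony_class(w))
```

A text where most words come out as purely "back" or purely "front" fits the pattern described; it is a hint, not a test.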

The first syllable receives the stress in most words. For example, in this video, he starts by saying "FENYvesi GAbi":
https://www.youtube.com/watch?v=k0PzIXlIwOk

And the first word receives the stress in most sentences.

There are a lot of "S" (Sh), "k" and "g" consonants. Vowels are mostly rounded, like o, ö, ó. Even "a" is as rounded as "o".

"R" and "Á" sounds are distinct. Especially "á" is pronounced long, like "eaaa" :)

And vowel harmony, of course.

As a foreigner, these are my opinions for identifying Hungarian.

Retired moderator
<a href="/fr/translator/aldefina" class="userpopupinfo" rel="user1152070">Aldefina </a>
Joined: 16.01.2013

It's very easy to distinguish Polish. If you see the letters ą, ć, ę, ł, ń, ś, ż, ź or ó, you can be sure it's Polish. Only the last letter may be confused with an accented letter, as in Spanish, but it completely changes the pronunciation, because it's a different letter.
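
As a rough illustration of this letter test, here is a small Python sketch. The function names and the `min_distinct` threshold are assumptions made up for the example: since a single letter such as ó also turns up in other languages, the sketch only says "Polish" when at least two different letters from the list appear.

```python
# The nine letters named in the post.
POLISH_LETTERS = set("ąćęłńśżźó")

def polish_letters_found(text):
    """Return the distinct Polish-specific letters that occur in text."""
    return sorted(set(text.lower()) & POLISH_LETTERS)

def looks_polish(text, min_distinct=2):
    """Heuristic: require several different letters, not just one."""
    return len(polish_letters_found(text)) >= min_distinct

print(looks_polish("Zażółć gęślą jaźń"))  # pangram using all nine letters
print(looks_polish("adiós, señor"))       # only ó matches
```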

Retired moderator and Scholar of a Dark Age
<a href="/fr/translator/sciera" class="userpopupinfo" rel="user1077079">Sciera </a>
Joined: 16.02.2011

ś can also be found in transliterated Sanskrit, Hindi and the like.

Retired moderator
<a href="/fr/translator/aldefina" class="userpopupinfo" rel="user1152070">Aldefina </a>
Joined: 16.01.2013

Nevertheless, if you see several of these letters in one text, then it certainly won't be Sanskrit, Hindi or any other language.

And as for ß in German, it seems to be a dying letter. When I read the first book that was written according to the "new rules" (they are not new anymore, but what should I call them?) and I saw dass instead of daß, I nearly fainted. After many years I still cannot accept it.

Retired moderator and Scholar of a Dark Age
<a href="/fr/translator/sciera" class="userpopupinfo" rel="user1077079">Sciera </a>
Joined: 16.02.2011
Aldefina wrote:

And as for ß in German, it seems to be a dying letter. When I read the first book that was written according to the "new rules" (they are not new anymore, but what should I call them?) and I saw dass instead of daß, I nearly fainted. After many years I still cannot accept it.

Nah, just the rules of when to use it were changed in 1996. Now it's used only after long vowels, and in "dass" you have a short vowel.

Retired editor
<a href="/fr/translator/michealt" class="userpopupinfo" rel="user1222532">michealt </a>
Joined: 11.10.2014

It's fairly easy to distinguish the Irish and Scottish Gaelic languages.

In Irish, gc, bp, or dt at the beginning of a word are quite common.
In Scottish, none of gc, bp, and dt can occur at the beginning of a word.

In Irish, the letters á, é, í, ó and ú are all common, but à, è, ì, ò and ù never occur. Verbal nouns ending in ú are quite common, while words ending in adh are quite rare.
In Scottish, the letters à, è, ì, ò and ù are all common; ú and í never occur. In traditional spelling, á occurs only in the preposition á/ás, but é and ó are common. Verbal nouns mostly end in "adh". In the new GOC (designed and defined by a committee far less competent than the one that famously tried to design a horse and ended up with a camel - this one would have achieved a legless camel with 27 humps) á, é and ó are required to be written à (long stressed a), è (long open e) and ò (long open o), even where there are pairs of words with different pronunciations and different meanings that must, according to this silly rule, be spelt identically.

If there are no accented vowels at all in a long text, either it's email or it isn't modern Irish. It could be modern Scottish, because over the last hundred years or so several individuals have chosen to publish books without accent marks on vowels (that strikes me as silly; but one of my favorite books - Cameron's collection of poems/songs written by Tiristich, "Na Baird Thirisdeach" - does that, Cameron claiming they aren't needed because people who speak the language know how to pronounce it even if it's written without accent marks - presumably he also thought they could resolve any resulting ambiguities). But from the mid 80s until the late 90s email generally couldn't cope with anything but 7-bit ASCII, so either we didn't use accents in either language, or we used something like "\" before the vowel to indicate a grave accent and "/" after the vowel to indicate an acute accent, so maybe there are quite a few accent-free texts sculling around.
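
The spelling cues in this post can be turned into a small scoring sketch in Python. Everything here is an assumption for illustration: the function name `guess_gaelic` and the scoring are invented, and only the cues stated above are used (word-initial gc/bp/dt and í/ú point to Irish, grave accents point to Scottish). The letters á, é and ó are ignored because they can occur in both, and accent-free text simply comes out as "unknown".

```python
import re
import unicodedata

IRISH_ONLY = set("íú")      # said never to occur in Scottish Gaelic
GRAVE = set("àèìòù")        # said never to occur in Irish
# gc, bp or dt at the start of a word (common in Irish only).
INITIAL_CLUSTER = re.compile(r"\b(?:gc|bp|dt)")

def guess_gaelic(text):
    """Return 'irish', 'scottish' or 'unknown' from spelling cues alone."""
    lowered = unicodedata.normalize("NFC", text).lower()
    irish = sum(ch in IRISH_ONLY for ch in lowered)
    irish += len(INITIAL_CLUSTER.findall(lowered))
    scottish = sum(ch in GRAVE for ch in lowered)
    if irish > scottish:
        return "irish"
    if scottish > irish:
        return "scottish"
    return "unknown"        # no cues, or the cues cancel out

print(guess_gaelic("Tá mé i gcónaí anseo"))
print(guess_gaelic("Tha mi gu math, tapadh leat. Fàilte!"))
```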

Moderator 👨🏻‍🏫🇧🇷✍🏻👨🏻
<a href="/fr/translator/don-juan" class="userpopupinfo" rel="user1110108">Don Juan <div class="moderator_icon" title="Moderator" ></div></a>
Joined: 05.04.2012

Maybe [@ulissescoroa] can help you.

Super member
<a href="/fr/translator/bowien" class="userpopupinfo" rel="user1411645">Bowien </a>
Joined: 06.02.2019

The Dutch language is hard to tell apart from the German language if you don't speak either of them.

As for the ß that was mentioned before for German: the Dutch language doesn't have it.
We do have a lot of words in common with other languages.
Words such as café, überhaupt, trottoir and games are all correct in our language.
One easy way to find out if it's Dutch or not is to see if it has the word "maar" in it. We use it a lot (maybe a little too much!)
It is also possible to check because we have some weird vowel combinations such as "eu, au, ou, ei, ij, oe & ui"

I hope this will help a little (:

Retired moderator and Scholar of a Dark Age
<a href="/fr/translator/sciera" class="userpopupinfo" rel="user1077079">Sciera </a>
Joined: 16.02.2011
Bowien wrote:

It is also possible to check because we have some weird vowel combinations such as "eu, au, ou, ei, ij, oe & ui"

eu, au and ei are also very common in German, but especially ij clearly shows that it's Dutch instead.

Editor Iranian Software Developer
<a href="/fr/translator/saeedgnu" class="userpopupinfo" rel="user1206882">saeedgnu <div class="editor_icon" title="Φροντιστής" ></div></a>
Joined: 07.05.2014

I understand this is not an Arabic or Persian forum, so I hope it's not off-topic.

The Arabic script is used for Arabic as well as Persian, Urdu and many other languages.

Letters گ چ پ ژ are never used in Arabic.
So if they appear in a text, it's most likely Persian or Urdu.
But it could also be Kurdish, Azerbaijani, Punjabi, Balochi or Kashmiri.

Letters ٹ ڈ ڑ ے are used in Urdu, Punjabi, Balochi, Kashmiri.
But never used in Persian.

The letter ي has different forms with different Unicode code points and slightly different shapes depending on the position.
In Arabic: ي
In Persian: ی but in old versions of Windows (before 7) people typed the Arabic letter ي
In Urdu: ے

Also Arabic letter ك‎ has equivalent Persian letter ک (that was typed as Arabic form in Windows older than 7)

Letter ں seems to be only used in Urdu.

Kurdish is a family of different languages, all of them written in either Latin or Arabic script.
In case of Arabic script, it can contain these letters: ڕ ڤ‎ ێ ۆ ڵ‎ that seem to be specific to Kurdish.

Punjabi can be written in either Gurmukhi or Arabic script. (Gurmukhi is Left-to-Right, unlike Arabic)
In the case of Arabic script, it's called the Shahmukhi alphabet, and it seems to be very similar to the Urdu alphabet; I'm not sure how to distinguish them!

The Pashto alphabet contains some unique letters like: ټ څ ڊ ډ ګ ڼ

Uyghur alphabet contains ڭ‎ ې ۈ ۇ that seem to be specific to Uyghur, and ۆ which is common with Kurdish.
It also uses the Arabic form of ك‎ (different than ڭ).
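
The hints above can be collected into a small lookup in Python. This is a rough sketch only: the labels, the name `script_hints` and the grouping are assumptions made for the example, the letter sets are a subset of what the post lists (written as Unicode escapes to avoid invisible direction marks), and letters the post describes as shared between languages (such as ۆ) are left out.

```python
# Letter groups taken from the post; each is a hint, not a proof.
HINTS = {
    # PEH, TCHEH, JEH, GAF: not used in Arabic itself.
    "non_arabic": "\u067E\u0686\u0698\u06AF",
    # TTEH, DDAL, RREH, YEH BARREE, NOON GHUNNA: Urdu and relatives.
    "urdu_group": "\u0679\u0688\u0691\u06D2\u06BA",
    # REH, YEH and LAM with small v, plus VEH: Kurdish.
    "kurdish": "\u0695\u06CE\u06B5\u06A4",
    # Letters with a ring, plus HAH with three dots above: Pashto.
    "pashto": "\u067C\u0685\u0689\u06AB\u06BC",
    # NG, U, YU: Uyghur.
    "uyghur": "\u06AD\u06C7\u06C8",
}

def script_hints(text):
    """Return the labels whose marker letters occur in text."""
    chars = set(text)
    return [label for label, letters in HINTS.items() if chars & set(letters)]

print(script_hints("\u067E\u062F\u0631"))        # Persian word with PEH
print(script_hints("\u0644\u0691\u06A9\u0627"))  # Urdu word with RREH
```

An empty result only means none of these marker letters was found; the text may still be Arabic, Persian or anything else written in this script.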
