bytebuster: (Farang-baa)
[personal profile] bytebuster
Come on, admit it: aren't I a clever one? :-)

Q: What is the Mongolian vowel separator for?

I've heard of the Mongolian vowel separator from programmers, who regard it as an interesting quirk in Unicode. When I google for it, most of the hits are from those revelling in its geekiness.

But as someone dabbling a bit in Mongolian, I'd like to know: What is it used for?


A:

The formal description has already been given in @ColinFine's excellent answer. Let me give a different description in layman's terms.

Mongolian characters usually have four distinct forms: isolate, initial, medial, and final.
Vowels A and E have exactly the same glyphs in their final form.
Here are the four forms for A and E, respectively.
Note that both have two variants of the final glyph:

[Image: Mongolian vowels A and E]

Although the choice between A and E can be inferred from the syntax (A for masculine grammatical gender, E for feminine), there can be a semantic difference depending on the final form (stroke up or stroke down).

For example:

qara [qara] (to look), stroke up;
qar+a [qar+a] (black), stroke down;

The Vowel Separator is used in the second word.
Phonetically, there's a little pause before the final vowel.
Note that A does not take the isolated form. Instead, it only changes to the second final form. Also, R takes its final form in the second word.
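To make the difference between the two words concrete, here is a minimal Python sketch that spells both as Unicode code point sequences. The Unicode character names passed to `unicodedata.lookup` and the variable names are my own additions for illustration, not part of the original answer; the spellings simply follow the transliterations above.

```python
import unicodedata

# Look the characters up by their Unicode names instead of hard-coding values.
QA = unicodedata.lookup("MONGOLIAN LETTER QA")
A = unicodedata.lookup("MONGOLIAN LETTER A")
RA = unicodedata.lookup("MONGOLIAN LETTER RA")
MVS = unicodedata.lookup("MONGOLIAN VOWEL SEPARATOR")  # U+180E

qara = QA + A + RA + A          # "to look": no separator
qar_a = QA + A + RA + MVS + A   # "black": separator before the final A

# Print each word as a list of code points with their names.
for word in (qara, qar_a):
    print([f"U+{ord(ch):04X} {unicodedata.name(ch)}" for ch in word])
```

The only difference in the encoded text is the single U+180E before the final vowel; the change of glyph shape is left to the font and the rendering engine.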


As for why the character is hell for programmers (in case you wonder):

Most modern compilers allow Unicode identifiers (e.g., variable names). You may write your program with variables named in your own (non-English) language, and the program works just fine.

However, using U+180E may get you into trouble, because it may or may not be considered a valid identifier character. Here is what happens (assume that X is the Mongolian Vowel Separator):

integer variable aXa = 42;
print aa;

Note: since the X symbol is invisible, the first line on your screen looks like:

integer variable aa = 42;

Trouble one: Unicode versions prior to 4.0 treat X as a formatting (and thus valid) character, but trying to use the variable aa leads to an error because there is no such variable! There is only aXa, but you can't see it.
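A small Python sketch of trouble one, using plain strings rather than real identifiers. The dictionary `symbols` is a hypothetical stand-in for a compiler's symbol table, introduced only for this illustration.

```python
MVS = "\u180e"  # MONGOLIAN VOWEL SEPARATOR, the "X" in the text

declared = "a" + MVS + "a"  # the name as actually typed: aXa
used = "aa"                 # the name as it appears on screen

# Toy symbol table (hypothetical): maps declared names to values.
symbols = {declared: 42}

print(declared == used)          # False: the two names are different strings
print(len(declared), len(used))  # 3 2
print(used in symbols)           # False: lookup by the visible name fails
```

The two names render identically in most editors, yet they are different strings, so the lookup by the visible name cannot succeed.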

Trouble two: Unicode 4.0 treats X as a zero-width space. This means that you are trying to declare a variable with a space in its name (a a), so your code fails to compile. But again, because the character is invisible, you simply don't know what's wrong. The code looks perfectly valid.
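Since the character cannot be seen, one practical defence is to scan the source for it. Below is a minimal sketch, assuming Python and its standard `unicodedata` module; the helper name `find_invisible` is made up for this example. Which category your runtime reports for U+180E depends on the version of the Unicode data it ships with, so the sketch flags both format characters (Cf) and non-ASCII space separators (Zs).

```python
import unicodedata

MVS = "\u180e"  # MONGOLIAN VOWEL SEPARATOR

def find_invisible(source):
    """Return (line, column, name) for each Cf or non-ASCII Zs character.

    Lines are 1-based, columns are 0-based.
    """
    hits = []
    for line_no, line in enumerate(source.splitlines(), start=1):
        for col, ch in enumerate(line):
            if ch != " " and unicodedata.category(ch) in ("Cf", "Zs"):
                hits.append((line_no, col, unicodedata.name(ch, "<unnamed>")))
    return hits

source = "integer variable a" + MVS + "a = 42;\nprint aa;"

# Category as seen by this interpreter's Unicode database.
print(unicodedata.category(MVS))
# Whether Python itself would accept the name as an identifier.
print(("a" + MVS + "a").isidentifier())
# Where the invisible character hides.
print(find_invisible(source))
```

Running a check like this over a file that "looks perfectly valid" points straight at the line and column of the hidden character.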


References:

...

Date: Friday, 9 June 2017 15:03 (UTC)
sirozhagladkov: (Default)
From: [personal profile] sirozhagladkov
Those who use non-ASCII characters as identifiers should be rewarded with a trip around the world. On foot. And it should start with a walk across the bottom of the Indian Ocean.

...

Date: Friday, 9 June 2017 15:21 (UTC)
sirozhagladkov: (Default)
From: [personal profile] sirozhagladkov
Ne-ver.

Then again, the English in there was such that the Americans stayed offended for a very long time; after all, it is not nice to mangle the language of Shakespeare and Elvis like that.

Anyway, that was a long time ago.

...

Date: Friday, 9 June 2017 18:43 (UTC)
jonathan_simba: (Default)
From: [personal profile] jonathan_simba
Genghis Khan approves. Even though he was illiterate...

...

Date: Friday, 9 June 2017 19:00 (UTC)
jonathan_simba: (Default)
From: [personal profile] jonathan_simba
Interesting, thanks. Did Svyryd also mentions this in the second volume of his "History...".

...

Date: Friday, 9 June 2017 19:53 (UTC)
jonathan_simba: (Default)
From: [personal profile] jonathan_simba
Highly recommended! It's worth it; it reads easily and without effort. Even better than the first volume!
