alreq
Why are the bidi categories of Arabic-Indic and Eastern Arabic-Indic digits different?
Are all numbers equal in category and directional property?
- Digit 2 (U+0032) is of category "EN, European Number". OK.
- Arabic-Indic digit ٢ (U+0662) is of category "AN, Arabic Number". OK.
- But the other one, ۲ (U+06F2), its Eastern Arabic-Indic counterpart, is of category "EN, European Number", like digit 2. Is there a reason for this difference between the last two?
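The categories listed above can be checked with a minimal sketch like the following, assuming Python's standard `unicodedata` module. The helper name `bidi_class_set` is only illustrative, and the values reported depend on the Unicode database version bundled with the interpreter.

```python
import unicodedata

# The three "2" digits discussed above.
digits = ["\u0032", "\u0662", "\u06F2"]

# Map each digit to its Bidi_Class as recorded in the Unicode Character Database.
classes = {ch: unicodedata.bidirectional(ch) for ch in digits}

for ch, bidi in classes.items():
    print(f"U+{ord(ch):04X} {unicodedata.name(ch)}: {bidi}")


def bidi_class_set(first, last):
    """Return the set of Bidi_Class values in the inclusive code point range."""
    return {unicodedata.bidirectional(chr(cp)) for cp in range(first, last + 1)}


# Check the whole digit blocks, not just the digit two.
arabic_indic = bidi_class_set(0x0660, 0x0669)   # Arabic-Indic digits
extended = bidi_class_set(0x06F0, 0x06F9)       # Extended (Eastern) Arabic-Indic digits
print("U+0660..U+0669:", arabic_indic)
print("U+06F0..U+06F9:", extended)
```

Both EN and AN are weak types; the sketch only reports the class of each character and does not run the bidirectional algorithm itself.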
There is also a difference in bidi behavior: the same text is displayed differently in an RTL context depending on whether the digit is an Arabic Number or a European Number (in the latter case it appears simply like "a 2 b"). Aren't ALL numbers WEAK in directional property?
As far as I remember, the difference in bidi category between Arabic-Indic digits and Eastern Arabic-Indic digits is due to the difference in bidi behavior desired in Arabic vs. Persian. Details should be available from Unicode.
http://unicode.org/reports/tr9/#AN, Section 3.2, Bidirectional Character Types: "[...]
- As of Unicode 4.0, the Bidirectional Character Types of a few Indic characters were altered so that the Bidirectional Algorithm preserves canonical equivalence. That is, two canonically equivalent strings will result in equivalent ordering after applying the algorithm."
I guess the "few Indic characters" are the Eastern Arabic-Indic digits in the range U+06F0..U+06F9, which are classified as "European Number" rather than "Arabic Number". I wonder what the "canonical equivalence" problem in question is. I didn't find more details.
I think it is referring to characters used for Indic languages, not the Arabic-Indic digits which AFAIK had this distinction from the start.
@shervinafshar and I had a discussion about this years ago here: https://groups.google.com/forum/#!topic/persian-computing/602gqTIrlPQ because I found the Extended Arabic-Indic digits better suited to our use in a special case (but maybe it is better in other cases).
I remember that @roozbehp (who I guess won't get pinged by my mention here) wrote, in a very old mailing list discussion from around 2001(?), that he was explaining to a developer why these are different. If my memory of this is correct, he would be a good person to ask about the reason for the difference.