
Why do the bidi categories of Arabic-Indic and Eastern Arabic-Indic digits differ?

Status: Open · ntounsi opened this issue 8 years ago · 4 comments

Do all digits share the same bidi category and directional behavior?

  • Digit 2 (U+0032) has bidi category "EN" (European Number). OK.
  • Arabic-Indic digit ٢ (U+0662) has category "AN" (Arabic Number). OK.
  • But ۲ (U+06F2), its Eastern Arabic-Indic counterpart, has category "EN" (European Number), just like digit 2. Is there a reason for this difference between the last two?
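The categories listed above can be checked against the Unicode Character Database bundled with Python's `unicodedata` module. This is a minimal sketch: the helper name `bidi_classes` is hypothetical, and the results reflect whichever UCD version the local Python ships with.

```python
import unicodedata

def bidi_classes(start: int, end: int) -> set[str]:
    """Return the set of Bidi_Class values for code points start..end (inclusive)."""
    return {unicodedata.bidirectional(chr(cp)) for cp in range(start, end + 1)}

# The three "digit two" characters compared in the issue.
for ch in ("\u0032", "\u0662", "\u06F2"):
    print(f"U+{ord(ch):04X} {unicodedata.name(ch)}: {unicodedata.bidirectional(ch)}")

# Whole digit ranges: ASCII, Arabic-Indic, Extended (Eastern) Arabic-Indic.
print(bidi_classes(0x0030, 0x0039))  # {'EN'}
print(bidi_classes(0x0660, 0x0669))  # {'AN'}
print(bidi_classes(0x06F0, 0x06F9))  # {'EN'}
```

The range checks show that the split is not specific to digit two: every Arabic-Indic digit is AN, and every Extended Arabic-Indic digit is EN.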

There is also a difference in bidi behavior: in an RTL context, the same typed (logical) text a2b is displayed as b2a if the two is an Arabic Number, but as a2b if it is a European Number (simply like "a 2 b"). Aren't ALL digits weak in their directional property?
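The difference can be reproduced with a heavily simplified version of the Unicode Bidirectional Algorithm (UAX #9). This is a sketch under stated assumptions, not a conforming implementation: it handles a single paragraph with no explicit embeddings, isolates, brackets or combining marks, skips rules W1 and W4-W6, and treats every class other than L, R, AL, EN and AN as neutral. The function names `resolve_levels` and `reorder` are hypothetical. The example uses the spaced form "a 2 b"; in this simplified model, "a٢b" with no spaces keeps its logical order, because with no neutral between them the digit and the Latin letters all resolve to the same level.

```python
import unicodedata

NON_NEUTRAL = ("L", "R", "EN", "AN")

def resolve_levels(text: str, base_level: int) -> list[int]:
    """Simplified UAX #9 level resolution for one paragraph (see assumptions above)."""
    types = [unicodedata.bidirectional(ch) for ch in text]
    sos = "R" if base_level % 2 else "L"  # also the embedding direction here

    # W2: EN whose last preceding strong type is AL becomes AN.
    last_strong = sos
    for i, t in enumerate(types):
        if t in ("L", "R", "AL"):
            last_strong = t
        elif t == "EN" and last_strong == "AL":
            types[i] = "AN"

    # W3: AL becomes R.
    types = ["R" if t == "AL" else t for t in types]

    # W7: EN whose last preceding strong type is L becomes L.
    last_strong = sos
    for i, t in enumerate(types):
        if t in ("L", "R"):
            last_strong = t
        elif t == "EN" and last_strong == "L":
            types[i] = "L"

    # N1/N2: a run of neutrals takes the direction on both sides if they
    # agree (EN and AN count as R); otherwise the embedding direction.
    i = 0
    while i < len(types):
        if types[i] in NON_NEUTRAL:
            i += 1
            continue
        j = i
        while j < len(types) and types[j] not in NON_NEUTRAL:
            j += 1
        before = "L" if (types[i - 1] if i > 0 else sos) == "L" else "R"
        after = "L" if (types[j] if j < len(types) else sos) == "L" else "R"
        types[i:j] = [before if before == after else sos] * (j - i)
        i = j

    # I1/I2: implicit embedding levels.
    if base_level % 2 == 0:
        bump = {"L": 0, "R": 1, "EN": 2, "AN": 2}
    else:
        bump = {"L": 1, "R": 0, "EN": 1, "AN": 1}
    return [base_level + bump[t] for t in types]

def reorder(text: str, base_level: int) -> str:
    """Rule L2: return the characters in left-to-right display order."""
    if not text:
        return text
    levels = resolve_levels(text, base_level)
    chars = list(text)
    lowest_odd = min(levels) | 1
    for level in range(max(levels), lowest_odd - 1, -1):
        i = 0
        while i < len(chars):
            if levels[i] < level:
                i += 1
                continue
            j = i
            while j < len(chars) and levels[j] >= level:
                j += 1
            chars[i:j] = chars[i:j][::-1]
            levels[i:j] = levels[i:j][::-1]
            i = j
    return "".join(chars)

for digit in ("\u0662", "\u06F2"):
    # ascii() escapes the digit so the terminal's own bidi handling does not interfere.
    print(unicodedata.name(digit), ascii(reorder(f"a {digit} b", base_level=1)))
```

In this model the AN digit makes the spaces resolve to R, so the RTL paragraph displays "b ٢ a", while the EN digit is turned into L by rule W7 and the line stays "a ۲ b". With Arabic letters instead of Latin ones, rule W2 turns the EN into AN, so both digits end up in the same position.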

ntounsi · Oct 25 '16 16:10

As far as I remember, the difference in bidi category between Arabic-Indic digits and Eastern Arabic-Indic digits is due to the different bidi behavior desired for Arabic vs. Persian text. The details should be available from Unicode.

duerst · Oct 25 '16 23:10

From http://unicode.org/reports/tr9/#AN, Section 3.2, Bidirectional Character Types: "[...]

  • As of Unicode 4.0, the Bidirectional Character Types of a few Indic characters were altered so that the Bidirectional Algorithm preserves canonical equivalence. That is, two canonically equivalent strings will result in equivalent ordering after applying the algorithm."

I guess the "few Indic characters" are the Eastern Arabic-Indic digits in the range U+06F0..U+06F9, which are classified as "European Number" rather than "Arabic Number". I wonder what the "canonical equivalence" problem in question is. I couldn't find more details.
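One quick way to see whether canonical equivalence can involve these digits at all is to look at their decompositions in the UCD. This minimal sketch (the helper `has_canonical_decomposition` is hypothetical) only shows that neither set of digits is canonically equivalent to anything else; it does not identify which characters the Unicode 4.0 note actually refers to.

```python
import unicodedata

def has_canonical_decomposition(ch: str) -> bool:
    """True if ch has a canonical (not compatibility) mapping in UnicodeData.txt."""
    d = unicodedata.decomposition(ch)
    # '' means no listed mapping (Hangul syllables are the exception: they
    # decompose algorithmically); compatibility mappings start with '<tag>'.
    return d != "" and not d.startswith("<")

# Arabic-Indic digits U+0660..U+0669 and Extended Arabic-Indic digits U+06F0..U+06F9.
digits = [chr(cp) for cp in (*range(0x0660, 0x066A), *range(0x06F0, 0x06FA))]

print(any(has_canonical_decomposition(ch) for ch in digits))  # False
# NFD leaves each digit unchanged, so normalization never maps one set onto the other.
print(all(unicodedata.normalize("NFD", ch) == ch for ch in digits))  # True
```

Since the digits have no decompositions, a bidi class change made to preserve canonical equivalence would have had no reason to touch them.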

ntounsi · Oct 26 '16 17:10

I think it is referring to characters used for Indic languages, not to the Arabic-Indic digits, which AFAIK have had this distinction from the start.

khaledhosny · Oct 26 '16 20:10

@shervinafshar and I discussed this years ago here: https://groups.google.com/forum/#!topic/persian-computing/602gqTIrlPQ, because I found the Extended Arabic-Indic digits better suited to our use in one special case (though they may be better in other cases too).

I remember that @roozbehp (who I guess won't get pinged by my mention here) explained to a developer why these are different, in a very old mailing-list discussion from around 2001(?). If my memory is correct, he would be a good person to ask about the reason for the difference.

ebraminio · Mar 07 '17 17:03