alreq
Review of the characters table and the changes needed for CLDR
Review of the characters table and the changes needed for CLDR. (Other details may be needed for this review.)
Letters: U+0671 ARABIC LETTER ALEF WASLA is marked X (not used). It should be marked as auxiliary for Arabic, and may be check-marked for Persian.
Diacritics: U+0670 ARABIC LETTER SUPERSCRIPT ALEF is used in some Koran publications, so it is auxiliary. It should be marked * instead of X.
Some punctuation and symbols are marked X (not used) for Arabic and should be check-marked:
- U+0020 SPACE
- U+002A ASTERISK *
- U+002F SOLIDUS /
- U+003C LESS-THAN SIGN <
- U+003D EQUALS SIGN =
- U+003E GREATER-THAN SIGN >
Control characters: some control characters (at least those related to Bidi), [U+202A..U+202E] and [U+2066..U+2069], as well as ZWJ and ZWNJ, are not language-specific. I think they should be marked as auxiliary, assuming they are intended for special rather than normal use.
See Also: https://r12a.github.io/scripts/arabic/block#char0671
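A quick way to check the point about control characters is to look them up in Python's standard `unicodedata` module (Python is only an illustration choice here). The sketch below prints the name, general category, and bidi class of the embedding/override controls, the isolates, and ZWNJ/ZWJ. All of them are format characters (general category Cf) rather than letters, which fits the view that they are not tied to any one language.

```python
import unicodedata

# Control characters named in the review item about Bidi controls and joiners.
BIDI_EMBEDDINGS_AND_OVERRIDES = range(0x202A, 0x202F)  # U+202A..U+202E
BIDI_ISOLATES = range(0x2066, 0x206A)                  # U+2066..U+2069
JOINERS = (0x200C, 0x200D)                             # ZWNJ, ZWJ


def describe(cp: int) -> tuple[str, str, str, str]:
    """Return (code point, Unicode name, general category, bidi class)."""
    ch = chr(cp)
    return (
        f"U+{cp:04X}",
        unicodedata.name(ch),
        unicodedata.category(ch),
        unicodedata.bidirectional(ch),
    )


rows = [describe(cp) for cp in (*BIDI_EMBEDDINGS_AND_OVERRIDES, *BIDI_ISOLATES, *JOINERS)]
for row in rows:
    print("\t".join(row))
```

The exact bidi classes (LRE, RLO, PDI, BN, ...) are what the table could cite if these characters get their own "auxiliary" section.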
@ntounsi, @khaledhosny, can you provide more details about the use of U+0670 and U+0671 in the Arabic language, ideally covering both standard and local variants?
Re U+0671: https://en.wikipedia.org/wiki/Dagger_alif
SUPERSCRIPT ALEF is one of the characters in the "الله" ligature, which makes it a common one in modern usage.
That is U+0670.
Re U+0671: it can be used in any Arabic word starting with alif wasl. Usually U+0627 is used, but some publications use U+0671 for various reasons. Of course, its most common use is in the Quran.
https://en.wikipedia.org/wiki/Wasla https://en.wiktionary.org/wiki/%D9%B1
Also, I think this quote from https://r12a.github.io/scripts/arabic/block#char0671 should be taken with a grain of salt. For one, Modern Standard Arabic does use case endings (there is nothing “old” about them); it is just that some people avoid them for fear of getting the rules wrong, but that is in no way a standard or generally celebrated practice:
The joining hamza is of little practical importance in modern arabic pronounced without the old case endings.
Here are some results from a Google Books search (lots of other garbage results, though; indexing Arabic PDFs is a lost cause 😞):
https://books.google.com.eg/books?id=97l1BwAAQBAJ&pg=PA60&dq=%D9%B1&hl=en&sa=X&ved=0ahUKEwi73_iq85zTAhWDOBQKHaHECHU4FBDoAQhbMAg#v=onepage&q=%D9%B1&f=false
https://books.google.com.eg/books?id=WpByAgAAQBAJ&pg=PA162&dq=%D9%B1&hl=en&sa=X&ved=0ahUKEwiPkauQ85zTAhVEPhQKHf6iBHEQ6AEIRDAG#v=onepage&q=%D9%B1&f=false
Since gbook links are not always reliable, here's a snapshot from the second link (I couldn't see the first one):
[Snapshot: page 162 of the second Google Books result]

BTW, the Wasla sign looks like the letter Sad ص followed by the final form of Heh ﻪ in some old styles.
The resulting word is "Sah", meaning "Shut up!", used to demand silence: «don't pronounce this Alef».
A Koran reader who reaches this kind of sign (above any letter) must pause.
Interesting!
Btw, let's keep the issue open until we file CLDR tickets and they are resolved.
Comments from conf-call:
- Arabic Question Mark is not present in any of the tables.
- ASCII Question Mark should NOT be marked as "used" for either language.
- Whether we should have an ASCII table or keep it under "Punctuations and Symbols".
@mostafah is going to work on fixing the script to address some of the problems.
@ntounsi, @khaledhosny, would you use either of these two characters in Modern Arabic, other than in their composed forms with ALEF?
- U+0653 ARABIC MADDAH ABOVE
- U+0655 ARABIC HAMZA BELOW
CLDR ticket filed: http://unicode.org/cldr/trac/ticket/10221
@mostafah, assigning this to you to work on the script.
Also, would you please take a look why these two get marked as main/used in Persian, which I believe shouldn't be marked at all?
- U+2060 WORD JOINER
- U+FEFF ZERO WIDTH NO-BREAK SPACE
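A possible direction for the script fix, sketched under assumptions: the actual alreq script is not shown in this thread, so `classify` and `AUXILIARY_FORMAT` below are hypothetical names. The idea is to keep format characters (general category Cf) out of the main set, route the Bidi controls and ZWJ/ZWNJ to auxiliary as suggested in the review, and drop any other format character, which covers U+2060 WORD JOINER and U+FEFF ZERO WIDTH NO-BREAK SPACE.

```python
import unicodedata

# Hypothetical policy following the suggestions in this thread:
# Bidi controls and ZWNJ/ZWJ become auxiliary; other format characters
# (e.g. WORD JOINER, ZERO WIDTH NO-BREAK SPACE) are not exemplars at all.
AUXILIARY_FORMAT = (
    {chr(cp) for cp in range(0x202A, 0x202F)}    # U+202A..U+202E
    | {chr(cp) for cp in range(0x2066, 0x206A)}  # U+2066..U+2069
    | {"\u200C", "\u200D"}                       # ZWNJ, ZWJ
)


def classify(chars: set[str]) -> tuple[set[str], set[str], set[str]]:
    """Hypothetical helper: split candidates into (kept, auxiliary, dropped)."""
    kept, auxiliary, dropped = set(), set(), set()
    for ch in chars:
        if unicodedata.category(ch) != "Cf":
            kept.add(ch)  # not a format character: left to the script's existing logic
        elif ch in AUXILIARY_FORMAT:
            auxiliary.add(ch)
        else:
            dropped.add(ch)  # any other format character
    return kept, auxiliary, dropped


candidates = {"\u0627", "\u067E", "\u200C", "\u2060", "\uFEFF"}
kept, auxiliary, dropped = classify(candidates)
```

Keying on the general category rather than on a hand-written list means any other Cf character that slips in is caught too, while the explicit allow-list keeps ZWNJ, which Persian text relies on, from being discarded.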
I don’t think U+0653 or U+0655 are likely to be used since they are almost always combined with alef and precomposed characters for them exist. However, several characters have canonical decomposition involving both, so NFD text will have them.
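To illustrate the NFD point, here is a small sketch using only Python's standard `unicodedata` module. It scans all code points for canonical decompositions that contain U+0653 or U+0655 (for example U+0622 ARABIC LETTER ALEF WITH MADDA ABOVE and U+0625 ARABIC LETTER ALEF WITH HAMZA BELOW), then shows that NFC recomposes the sequence. So in practice the standalone marks are most likely to appear in NFD-normalized text.

```python
import sys
import unicodedata

MADDAH_ABOVE = "\u0653"  # ARABIC MADDAH ABOVE
HAMZA_BELOW = "\u0655"   # ARABIC HAMZA BELOW


def nfd_users(mark: str) -> list[str]:
    """Return every code point whose canonical decomposition (NFD) contains `mark`."""
    users = []
    for cp in range(sys.maxunicode + 1):
        if 0xD800 <= cp <= 0xDFFF:
            continue  # skip surrogate code points
        ch = chr(cp)
        if ch != mark and mark in unicodedata.normalize("NFD", ch):
            users.append(ch)
    return users


maddah_users = nfd_users(MADDAH_ABOVE)
hamza_below_users = nfd_users(HAMZA_BELOW)
for ch in maddah_users + hamza_below_users:
    print(f"U+{ord(ch):04X} {unicodedata.name(ch)}")

# Round trip: NFD splits ALEF WITH MADDA ABOVE, NFC puts it back together.
word = "\u0622\u0645\u0646"  # آمن
decomposed = unicodedata.normalize("NFD", word)
assert decomposed == "\u0627\u0653\u0645\u0646"
assert unicodedata.normalize("NFC", decomposed) == word
```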
@behnam Sure. Thanks for the CLDR ticket.
Created a new issue regarding Section A.5 Control characters: https://github.com/w3c/alreq/issues/127
See this comment on #128.