
U+FDF2 'ARABIC LIGATURE ALLAH ISOLATED FORM' not always rendered correctly

Open Manishearth opened this issue 7 years ago • 13 comments

U+FDF2 'ARABIC LIGATURE ALLAH ISOLATED FORM' (ﷲ) is supposed to render as alef-lam-lam-meem (with diacritics), but in some fonts, including Courier New, the Alef is missing.

http://www.fileformat.info/info/unicode/char/fdf2/fontsupport.htm

The code point could conceivably mean "the main l-l-m ligature in 'allah'", however the spec decomposes it as a-l-l-h, so all fonts should render the leading alef.

Manishearth avatar Jun 13 '17 23:06 Manishearth

Screenshots from my system, with buggy fonts highlighted in red:

[two screenshots, 2017-06-13]

Creating these kinds of ligatures, especially RIAL and ALLAH, is very common in fonts.

The bug here seems to be that the font assigns U+FDF2 to a ligature glyph for the second joining segment of the word ALLAH (which is LLAH), instead of creating a composed glyph for U+FDF2 using the ligature.

CLDR data, which is our primary source for character support, lacks any information about ligatures (and their possible code points). Since this bug seems common, especially in the more open-source fonts, I think we can cover the topic in ALReq and maybe even provide an annex with some details about the important ligatures and how fonts should implement them (like the detail here that the ligature glyph doesn't get the U+FDF2 code point; rather, U+FDF2 uses the ligature).

What do you think?

behnam avatar Jun 13 '17 23:06 behnam

Since U+FDF2 is a presentation form character, I think we shouldn’t say much more than discouraging the use of presentation forms in text input. As for the fonts, though they indeed break the glyph for U+FDF2, the ligatures for الله and لله still work correctly.

khaledhosny avatar Jun 13 '17 23:06 khaledhosny

Right, @khaledhosny. True that we want to discourage them in text. So, the question is, do we want to cover the issue for the sake of improving font development processes and font products for the script?

Since the topic is not exactly text layout, I think it could be a separate (wiki) document, or maybe an annex on font development.

behnam avatar Jun 13 '17 23:06 behnam

I agree this does not belong in the main document; an annex on Arabic font development best practices might be a good idea.

khaledhosny avatar Jun 13 '17 23:06 khaledhosny

My thinking is:

  • Do not use the presentation form character U+FDF2. Besides being deprecated, it is rendered without the first ALEF by many fonts.
  • Write ALLAH in full letters (ALEF LAM LAM HEH). Many fonts, if they do not replace the sequence with the ligature shape, at least decorate it by adding the "formal" ARABIC SHADDA ّ (U+0651) and ARABIC LETTER SUPERSCRIPT ALEF ٰ (U+0670). Note, however, that you should not put real diacritics such as SHADDA and FATHA after the LAM: they may be drawn on top of the added formal signs.

HTML code to test your fonts: <p>&nbsp;&#xFDF2; &#x627;&#x644;&#x644;&#x64E;&#x651;&#x647; &#x627;&#x644;&#x644;&#x647; </p>
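To see exactly which code points that test paragraph contains, here is a minimal sketch using only the Python 3 standard library. The names `TEST_MARKUP` and `describe_words` are made up for illustration; the markup is the content of the `<p>` element above.

```python
import html
import unicodedata

# Content of the test paragraph above (hypothetical constant name).
TEST_MARKUP = (
    "&nbsp;&#xFDF2; "
    "&#x627;&#x644;&#x644;&#x64E;&#x651;&#x647; "
    "&#x627;&#x644;&#x644;&#x647; "
)


def describe_words(markup):
    """Return, per word, a list of (code point, character name) pairs."""
    text = html.unescape(markup)  # resolve &nbsp; and the numeric references
    # str.split() with no argument also splits on U+00A0 NO-BREAK SPACE.
    return [
        [(f"U+{ord(ch):04X}", unicodedata.name(ch)) for ch in word]
        for word in text.split()
    ]


for word in describe_words(TEST_MARKUP):
    print(word)
```

The three words are the presentation form on its own, the fully spelled word with FATHA and SHADDA after the second LAM, and the bare ALEF LAM LAM HEH sequence, so a font can be checked against all three cases at once.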

@behnam and @khaled, +1 to cover font development best practices.

ntounsi avatar Jun 24 '17 12:06 ntounsi

The Unicode Standard 11.0.0 says the following in section 9.2 Arabic Presentation Forms-A: U+FB50–U+FDFF, Word Ligatures (this was added in Unicode 7.0.0):

U+FDF2 ARABIC LIGATURE ALLAH ISOLATED FORM is a very common ligature, used to display the name of God. When the formation of the allah ligature is desired, the recommended way to represent the word would be <alef, lam, lam, shadda, superscript alef, heh> <0627, 0644, 0644, 0651, 0670, 0647>. In non-Arabic languages, other forms of heh, such as heh goal (U+06C1), may also form the ligature. Extra care should be taken not to form the ligature in the absence of the shadda and the superscript alef, as the sequence <alef, lam, lam, heh> and <alef, lam, lam, shadda, heh> exist in Persian and other languages with different meanings or pronunciations, where the formation of the ligature would be incorrect and inappropriate.
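Read literally, the quoted text amounts to a simple test on the code point sequence. The following is only a sketch of that reading, not how any shaping engine or font actually decides. The function name is hypothetical, and accepting only HEH or HEH GOAL as the final letter is an assumption based on the "such as heh goal" wording.

```python
ALEF, LAM, HEH = "\u0627", "\u0644", "\u0647"
SHADDA, SUPERSCRIPT_ALEF = "\u0651", "\u0670"
HEH_GOAL = "\u06C1"

# <alef, lam, lam, shadda, superscript alef>, i.e. everything before the heh.
RECOMMENDED_PREFIX = ALEF + LAM + LAM + SHADDA + SUPERSCRIPT_ALEF

# Assumption: only these two heh forms are considered in this sketch.
ASSUMED_FINAL_HEH = (HEH, HEH_GOAL)


def unicode_recommends_ligature(word):
    """True only for the sequence the quoted text recommends for the ligature."""
    return (
        len(word) == len(RECOMMENDED_PREFIX) + 1
        and word.startswith(RECOMMENDED_PREFIX)
        and word[-1] in ASSUMED_FINAL_HEH
    )


print(unicode_recommends_ligature(RECOMMENDED_PREFIX + HEH))      # recommended sequence
print(unicode_recommends_ligature(ALEF + LAM + LAM + HEH))        # bare letters
print(unicode_recommends_ligature(ALEF + LAM + LAM + SHADDA + HEH))  # shadda only
```

Under this reading, the bare sequence and the shadda-only sequence are the two cases the text says should not form the ligature.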

moyogo avatar Jun 20 '18 09:06 moyogo

I decided it was time for me to explore this a little more deeply. Here are some other results. I created a test page at: https://w3c.github.io/alreq/gap-analysis/tests/ligation/ligation_000.html

Here are some results i screen-captured on my Mac. Grey backgrounds mark things that, from a very quick scan, i think are probably incorrect.

[screenshot, 2018-06-21]

Essentially, this whole thing is quite broken, it seems. (Which is surprising given the content involved.)

r12a avatar Jun 21 '18 17:06 r12a

Arial overcompensating by adding a double shadda/alif is very surprising (and somewhat hilarious) to me given how commonly that font is used.

Then again, I guess very little about non-Latin text not working on computers should surprise me anymore 😩

Manishearth avatar Jun 21 '18 17:06 Manishearth

My perception is that, contrary to what Unicode suggests, Arabic users expect bare [alef] lam lam heh to ligate, and that is what almost all Arabic fonts do. Arabic words other than the name of God that match the same sequence of letters are very uncommon, to the extent that I had never encountered any of them until I researched this very issue. In Amiri I approached this from the other end: actively matching sequences that are unlikely to be the name of God and unligating them, e.g. خالله does not ligate, but فالله ligates while فالَله does not.
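The "ligate by default, carve out exceptions" idea can be illustrated with a toy check. This is not Amiri's actual rule set. The blocking set below contains only KHAH because that is the one example given above, the sketch handles only the bare sequence with no shadda or superscript alef, and all names are hypothetical.

```python
ALEF, LAM, HEH = "\u0627", "\u0644", "\u0647"
KHAH, FEH, FATHA = "\u062E", "\u0641", "\u064E"

BARE_SEQUENCE = ALEF + LAM + LAM + HEH

# Assumption: letters that, directly before the sequence, make it unlikely
# to be the name of God. Only KHAH is taken from the example above.
ASSUMED_BLOCKING_LETTERS = {KHAH}


def toy_should_ligate(word):
    """Ligate a word ending in the bare sequence unless an exception matches."""
    if not word.endswith(BARE_SEQUENCE):
        # Also covers a vowel mark such as FATHA sitting between the letters,
        # because the four letters are then no longer contiguous.
        return False
    preceding = word[: -len(BARE_SEQUENCE)]
    return not (preceding and preceding[-1] in ASSUMED_BLOCKING_LETTERS)


print(toy_should_ligate(FEH + BARE_SEQUENCE))                   # feh prefix
print(toy_should_ligate(KHAH + BARE_SEQUENCE))                  # khah prefix
print(toy_should_ligate(FEH + ALEF + LAM + FATHA + LAM + HEH))  # fatha inside
```

A real font would express this in OpenType lookups rather than string matching, and would need a much longer exception list.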

khaledhosny avatar Jun 21 '18 19:06 khaledhosny

When I discussed this issue with @roozbehp he had some examples of Persian words that do this, IIRC.

Just to lay it out, there are multiple issues here, of varying severity:

  • U+FDF2 is rendered as l-l-h in some fonts, which is completely wrong
  • a-l-l-h autoligaturifies to add shadda/dagger alif, which is incorrect if it is part of some words (but as Khaled says this may be what people expect)
  • a-l-l-h with diacritics gets more diacritics added to it in Arial, which is again completely wrong
  • Arial, Tahoma, Al Bayan, Damascus add diacritics to l-l-h when there is no alif, which seems similarly incorrect to me (PR adding them to this file)

Manishearth avatar Jun 21 '18 19:06 Manishearth

As @r12a notes in https://r12a.github.io/scripts/arabic/block#charFDF2 the compatibility decomposition for FDF2 is <alif, lam, lam, heh> (“≈ [isolated] 0627 0644 0644 0647”).
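The decomposition can be checked directly against the Unicode Character Database that ships with Python. This is a small sketch using only the standard `unicodedata` module; the helper name `compat_decomposition` is made up for illustration.

```python
import unicodedata

ALLAH_LIGATURE = "\uFDF2"


def compat_decomposition(ch):
    """Split a decomposition mapping into its tag and its code points."""
    fields = unicodedata.decomposition(ch).split()
    tag = fields[0] if fields and fields[0].startswith("<") else None
    code_points = [f for f in fields if not f.startswith("<")]
    return tag, code_points


tag, code_points = compat_decomposition(ALLAH_LIGATURE)
print(tag, code_points)

# Compatibility normalization expands the ligature to the same four letters,
# with no shadda and no superscript alef.
expanded = unicodedata.normalize("NFKD", ALLAH_LIGATURE)
print([f"U+{ord(c):04X}" for c in expanded])
```

Note that this only shows what the character data says; it says nothing about which glyph a given font draws for U+FDF2.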

While the (non-normative) reference glyph is a ligature <alif, lam, lam, shadda, superscript alif, heh>, this hasn’t always been the case. In Appendix H, New Characters, of the Unicode Standard 1.1, the reference glyph used is a ligature <alif, lam, lam, heh> with neither shadda nor superscript alif. This may explain where the compatibility decomposition of FDF2 comes from.

[screenshot, 2018-06-22]

moyogo avatar Jun 22 '18 09:06 moyogo

The production process changed between Unicode 2.x and 3.0. From that point on, different custom software was used with an entirely new collection of TrueType fonts. With many upgrades, both to the software and the font collection, that process is still very much in place today.

Every update of the font collection carries the risk of unintentional changes, and not all of them are caught by reviewers. Therefore, it would take some digging to find out whether the change from a glyph matching the decomposition to a glyph adding shadda and alif was indeed intentional at the time.

asmusf avatar Jun 23 '18 03:06 asmusf

I was curious to see if any fonts have FDF2 as alif, lam, lam, heh without shadda and superscript alif.

I managed to find a handful:

There are most probably more.

Including these, there are also more typefaces that do not ligate <lam, lam, heh> (regardless of what FDF2 glyph they have). Some of these do have an optional discretionary ligature feature that forms the ligature.

There may also be fonts that draw FDF2 with shadda but no superscript alif, like https://www.linotype.com/1079191/hasan-alquds-unicode-regular-product.html?site=webfonts&format=ot-ttf&branding=std, or with shadda and fatha, like https://fonts.google.com/specimen/Harmattan.

moyogo avatar Jun 24 '18 05:06 moyogo