Lookup flags in fonts with no GDEF table
Someone reports an issue in LibreOffice about some Iranian fonts like B Nazanin (fonts attached in the LIbreOffice issue).
Basically the font has mark ligatures for shadda and the lookups has ignore base and ignore ligatures set (which makes no sense), which causes HarfBuzz to correctly skip bases and make ligature between shadda and fathatan in strings like:
$ hb-view "B Nazanin.TTF" اتّفاقاً
Oddly enough, both Uniscribe and Core Text ignore the lookup flags and don’t for the ligatures.
$ hb-view "B Nazanin.TTF" اتّفاقاً --shaper=coretext

As it turns out, the font has no GDEF table, so HarfBuzz synthesizes one and use it here.
It is a rather odd font, and I HarfBuzz synthesis of a GDEF table actually matches what the spec is saying (emphasis mine):
In addition, a client uses class definitions to apply GSUB and GPOS LookupFlag data correctly. For example, a LookupFlag may specify ignoring ligatures and marks during a glyph operation. If the font does not include a GlyphClassDef table, the client must define and maintain this information when using the GSUB and GPOS tables.
But it might be an interoperability issue that we might want to consider (my 2¢ is to 🤷🏽 and declare it working as designed).
Sounds related to the GDEF change we made and reverted. The carrying-forward of the flags is wrong... Anyway. I'll check later. This was just me speculating based on the title.
That is with 2.7.1 where the offending commit is reverted, unless you mean the old behavior was also wrong.
Hmm, if I add a GDEF table with glyph classes, HarfBuzz stops making the ligatures: BNazanin#2.zip
(Core Text and Uniscribe don’t make the ligatures when GDEF table is present as well, so my initial guess was wrong)
unless you mean the old behavior was also wrong.
Yes.
Simon found problems with pre-2.7.0 HB-no-GDEF. We tried to address, broke others.
Even then there's more issues. I'll try to write them down here soon.
Let's revive this issue and figure out what needs to be fixed.
Fixed by https://github.com/harfbuzz/harfbuzz/pull/3365 ?
Does not seem so.
Can you send me the font in question please?
Can you send me the font in question please?
Nevermind. Got it.
I think I know what to do.
Basically, what they are doing, I suppose, is that when matching ligatures (and context, etc) other than the first item, ignore IgnoreBase and IgnoreLigature... lol. We do the same during mark-attachment lookups.
If the font does not include a GlyphClassDef table, the client must define and maintain this information when using the GSUB and GPOS tables.
Do we actually have evidence that Uniscribe does this?
I'm leaning towards closing this.
I have no more information to add and since apparently it is only this font and IMHO that is a badly built one, lets close.
This problem is not limited to B Nazanin. I have the same problem with Arial and Tahoma:
$ hb-view /usr/share/fonts/truetype/msttcorefonts/Arial.ttf اتّفاقاً
## PROBLEMATIC OUTPUT HERE ##
$ hb-view --version
hb-view (HarfBuzz) 2.7.4
Available shapers: graphite2,ot,fallback
I am using Ubuntu 22.04.
The same problem is visible with Arial and the latest version of harfbuzz:
$ ./util/hb-view /usr/share/fonts/truetype/msttcorefonts/Times_New_Roman.ttf اتّفاقاً
## PROBLEMATIC OUTPUT HERE ##
$ ./util/hb-view --version
hb-view (HarfBuzz) 4.4.1
Available shapers: ot,fallback
This is not the same issue. The old corefonts are broken, I get the same failure in with DirectWrite as well. Newer versions of the fonts are fine.
To be clear, the fonts have ligature lookups for the mark ligatures that set both ignore base and ignore ligatures lookup flags, so HarfBuzz is doing what the fonts asked for.
So the difference is that Arial has a GDEF table, but B Nazanin does not. So again I think the issue is that HarfBuzz is synthesizing glyph classes when GDEF table is absent and DirectWrite is apparently not doing that.
Re-opening to see if we actually want to follow MS implementation here or not.
If the font does not include a GlyphClassDef table, the client must define and maintain this information when using the GSUB and GPOS tables.
Do we actually have evidence that Uniscribe does this?
Apparently the answer is no, it does not do this.
With:
diff --git a/src/hb-ot-shape.cc b/src/hb-ot-shape.cc
index 0806abb7d..29c2ec2f9 100644
--- a/src/hb-ot-shape.cc
+++ b/src/hb-ot-shape.cc
@@ -137,8 +137,8 @@ hb_ot_shape_planner_t::compile (hb_ot_shape_plan_t &plan,
* Decide who provides glyph classes. GDEF or Unicode.
*/
- if (!hb_ot_layout_has_glyph_classes (face))
- plan.fallback_glyph_classes = true;
+ //if (!hb_ot_layout_has_glyph_classes (face))
+ // plan.fallback_glyph_classes = true;
/*
* Decide who does substitutions. GSUB, morx, or fallback.
The B* fonts stop making the incorrect mark ligature.
If the font does not include a GlyphClassDef table, the client must define and maintain this information when using the GSUB and GPOS tables.
Do we actually have evidence that Uniscribe does this?
Apparently the answer is no, it does not do this.
I opened https://github.com/MicrosoftDocs/typography-issues/issues/954 for spec clarification.
If the font does not include a GlyphClassDef table, the client must define and maintain this information when using the GSUB and GPOS tables.
Do we actually have evidence that Uniscribe does this?
Apparently the answer is no, it does not do this.
From the Uniscribe API limitations also I have deduced that they can't be doing this.
If the font does not include a GlyphClassDef table, the client must define and maintain this information when using the GSUB and GPOS tables.
Do we actually have evidence that Uniscribe does this?
Apparently the answer is no, it does not do this.
From the Uniscribe API limitations also I have deduced that they can't be doing this.
At least not for positioning.
I'm thinking about blacklisting the GPOS of these fonts.
Where can we find them all?
This problem is not limited to B Nazanin. I have the same problem with Arial and Tahoma:
FWIW we intentionally nuke the GDEF of these fonts, up to Windows 8. Wonder if we should do the same to their GPOS.
This is one way to handle this: Only synthesize glyph classes if font doesn't have GDEF and doesn't have GPOS.
diff --git a/src/hb-ot-shape.cc b/src/hb-ot-shape.cc
index 3da317a5c..5c68bee21 100644
--- a/src/hb-ot-shape.cc
+++ b/src/hb-ot-shape.cc
@@ -137,7 +137,8 @@ hb_ot_shape_planner_t::compile (hb_ot_shape_plan_t &plan,
* Decide who provides glyph classes. GDEF or Unicode.
*/
- if (!hb_ot_layout_has_glyph_classes (face))
+ if (!hb_ot_layout_has_glyph_classes (face) &&
+ !hb_ot_layout_has_positioning (face))
plan.fallback_glyph_classes = true;
/*
One test fails though. Need to investigate.
I haven't checked that it actually fixes the case at hand.
One test fails though. Need to investigate.
One of the failing tests is in Thai and we fail to zero mark advances, I suppose because there is no mark attachment in the GPOS. This change seems fragile at this point in time :(.