afdko
for glyphs named uniXXXX, is makeotf following the documented GOADB behavior?
According to the docs:
a) If the third field of the GOADB record for a glyph contains a Unicode value in the form uniXXXX or uXXXX[XX] (see note), assign that Unicode value to the glyph. Else b);
b) If a glyph name is in the Adobe Glyph List For New Fonts, use the assigned Unicode value. Else c);
c) If the glyph name is in the form uniXXXX or uXXXX[XX] (see note), assign the Unicode value. Else d);
d) Do not assign any Unicode value.
The use of “else” in the rules above suggests that if the conditions of that branch are met, then the behavior is triggered, and the heuristic descends no further.
In particular, if a glyph is named uniXXXX and has an explicit codepoint provided in the third column of its GOADB record, then the first branch of this rule should be triggered: the explicit codepoint should supersede the codepoint-inference rule in the third branch.
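To make the reading above concrete, here is a minimal sketch of the documented precedence as I understand it. It is not makeotf's code: `AGLFN_SUBSET`, `parse_uni_name` and `documented_unicode` are hypothetical names, the AGLFN table is a two-entry stand-in, and the regular expression assumes uppercase hex digits (the "see note" details are not reproduced here).

```python
import re

# uniXXXX (exactly 4 hex digits) or uXXXX[XX] (4 to 6 hex digits).
# Assumption: uppercase hex only.
_UNI_NAME = re.compile(r"^(?:uni([0-9A-F]{4})|u([0-9A-F]{4,6}))$")

# Hypothetical two-entry stand-in for the Adobe Glyph List For New Fonts.
AGLFN_SUBSET = {"A": 0x0041, "space": 0x0020}


def parse_uni_name(name):
    """Return the code point encoded in a uniXXXX / uXXXX[XX] name, or None."""
    match = _UNI_NAME.match(name)
    if match is None:
        return None
    return int(match.group(1) or match.group(2), 16)


def documented_unicode(glyph_name, third_field=None):
    """Apply rules a) to d) in order, stopping at the first one that matches."""
    if third_field:                          # rule a): explicit third column
        override = parse_uni_name(third_field)
        if override is not None:
            return override
    if glyph_name in AGLFN_SUBSET:           # rule b): AGLFN lookup
        return AGLFN_SUBSET[glyph_name]
    return parse_uni_name(glyph_name)        # rule c), or None for rule d)
```

Under this reading, `documented_unicode("uni2032", "uE00D4")` gives `0xE00D4`, because rule a) fires before rule c) is ever consulted.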
But this doesn’t seem to be what happens. Rather, if a glyph is named uniXXXX, it gets a codepoint of XXXX, even if the GOADB says otherwise.
Let’s have a test case! Attached is a PFB with one glyph called uni2032 and a GOADB asking that it receive PUA codepoint uE00D4:
uni2032 uni2032 uE00D4
(Don’t forget the blank line at the end! makeotf will crash without it!)
We expect that the glyph named uni2032 should get the codepoint uE00D4. I generate the font like so:
makeotf -f unicode-test.pfb -gf GOADB
By the way, the docs also say:
If the -r or -ga options are NOT specified, the effect is to use the Unicode assignments from the third column of the GOADB without renaming the glyphs.
Yes! We want that codepoint from the third column.
But here’s what we see in the cmap of the generated OTF:
<?xml version="1.0" encoding="UTF-8"?>
<ttFont sfntVersion="OTTO" ttLibVersion="3.6">
  <cmap>
    <tableVersion version="0"/>
    <cmap_format_4 platformID="0" platEncID="3" language="0">
      <map code="0x2032" name="uni2032"/><!-- PRIME -->
    </cmap_format_4>
    <cmap_format_6 platformID="1" platEncID="0" language="0">
    </cmap_format_6>
    <cmap_format_4 platformID="3" platEncID="1" language="0">
      <map code="0x2032" name="uni2032"/><!-- PRIME -->
    </cmap_format_4>
  </cmap>
</ttFont>
What we see is that our uE00D4 is nowhere to be found, and instead the glyph carries the never-asked-for u2032 codepoint.
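For anyone who wants to check a dump like this without reading it by eye, here is a small sketch that collects the code points per glyph from a TTX cmap dump using only the standard library. `codepoints_by_glyph` and `SAMPLE` are names made up for this example, and the sample is a trimmed copy of the dump in this comment (XML declaration omitted).

```python
import xml.etree.ElementTree as ET

# Trimmed copy of the cmap dump from this comment.
SAMPLE = """<ttFont sfntVersion="OTTO" ttLibVersion="3.6">
  <cmap>
    <tableVersion version="0"/>
    <cmap_format_4 platformID="0" platEncID="3" language="0">
      <map code="0x2032" name="uni2032"/>
    </cmap_format_4>
    <cmap_format_6 platformID="1" platEncID="0" language="0">
    </cmap_format_6>
    <cmap_format_4 platformID="3" platEncID="1" language="0">
      <map code="0x2032" name="uni2032"/>
    </cmap_format_4>
  </cmap>
</ttFont>"""


def codepoints_by_glyph(ttx_text):
    """Map each glyph name to the set of code points found in any cmap subtable."""
    root = ET.fromstring(ttx_text)
    result = {}
    for entry in root.iterfind("./cmap/*/map"):
        codepoint = int(entry.get("code"), 16)   # int() accepts the 0x prefix
        result.setdefault(entry.get("name"), set()).add(codepoint)
    return result


mapping = codepoints_by_glyph(SAMPLE)
print(0xE00D4 in mapping.get("uni2032", set()))
```

The final line prints whether the requested override made it into any subtable for the glyph.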
> (Don’t forget the blank line at the end! makeotf will crash without it!)
This has been fixed a while ago I think.
Edit: confirmed fixed, the trailing newline is no longer needed.
What would be the use case of overriding a deliberately named uXXXX glyph with a different code point?
I have a tool that folds suffixed glyphs into the default positions of a font source. So I might have uni2032 and uni2032.alt. I want to make the alt the default glyph by taking the codepoints that would be assigned ordinarily to uni2032 and giving them to uni2032.alt. This works for any glyph not named uniXXXX, because of this behavior in makeotf.
In any case, I think the codepoint-resolution rule is correct: the third column should always supersede anything else. Otherwise one of the explicit promises of the GOADB file is violated, namely that it should “Override the default Unicode encoding by MakeOTF”.
I did some testing and can confirm the bug. Summary: Unicode implied by the final glyph name (first column) currently cannot be overridden by the 3rd column.
Examples:
.notdef .notdef
uni0020 uni0020
uni0041 uni0041 uni0058
uni0042 uni0042 uni0059
uni0043 uni0043 uni005A
→ writes code points for A B C — X Y Z expected
.notdef .notdef
uni0020 uni0020
uni0041 A uni0058
uni0042 B uni0059
uni0043 C uni005A
→ writes code points for A B C — X Y Z expected
.notdef .notdef
uni0020 space
A A uni0058
B B uni0059
C C uni005A
→ writes code points for X Y Z
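The three examples can be summarized in a small model. This is only a sketch of the behavior reported here, not makeotf's implementation: `observed` encodes "a uniXXXX final name wins over the third column", `expected` encodes "the third column wins", AGL lookups are left out, and all function names are hypothetical.

```python
import re

# uniXXXX or uXXXX[XX]; assumption: uppercase hex only.
_UNI = re.compile(r"^(?:uni([0-9A-F]{4})|u([0-9A-F]{4,6}))$")


def uni_value(name):
    """Code point encoded in a uniXXXX / uXXXX[XX] name, or None."""
    match = _UNI.match(name)
    return int(match.group(1) or match.group(2), 16) if match else None


def parse_record(line):
    """Split a GOADB record into (final name, friendly name, override or None)."""
    fields = line.split()
    override = fields[2] if len(fields) > 2 else None
    return fields[0], fields[1], override


def observed(line):
    """Reported behavior: a uniXXXX final name ignores the third column."""
    final, _friendly, override = parse_record(line)
    from_name = uni_value(final)
    if from_name is not None:
        return from_name
    return uni_value(override) if override else None


def expected(line):
    """Behavior the docs suggest: the third column takes precedence."""
    final, _friendly, override = parse_record(line)
    from_override = uni_value(override) if override else None
    return from_override if from_override is not None else uni_value(final)
```

For the record `uni0041 A uni0058`, `observed` returns the code point for A while `expected` returns the one for X; for `A A uni0058` both return X.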
This problem is completely GlyphOrderAndAliasDB-based – the final glyph name does not show up anywhere else. While this behavior is confusing (and I agree it feels buggy), I suggest the following workarounds:
- instead of swapping out glyphs via Unicode override, swap the names in the left column
- do the Unicode swapping in the generated font binary via fontTools
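A rough sketch of the second workaround, assuming fontTools is installed. `remap_codepoints` and `apply_overrides_to_font` are hypothetical helper names; the fontTools calls used (`TTFont`, `font["cmap"].tables`, `isUnicode()`, the `cmap` dict, `save`) are the regular ones. Code points above U+FFFF are dropped from format 4 subtables because that format cannot hold them; they would need a format 12 subtable, which this sketch does not create.

```python
def remap_codepoints(cmap, overrides):
    """Return a new {codepoint: glyph name} dict with overrides applied.

    overrides maps a glyph name to the full list of code points it should own.
    Existing mappings of those glyphs, and of the claimed code points, are dropped.
    """
    claimed = {cp for cps in overrides.values() for cp in cps}
    result = {
        cp: name
        for cp, name in cmap.items()
        if name not in overrides and cp not in claimed
    }
    for name, cps in overrides.items():
        for cp in cps:
            result[cp] = name
    return result


def apply_overrides_to_font(in_path, out_path, overrides):
    """Rewrite every Unicode cmap subtable of a compiled font."""
    from fontTools.ttLib import TTFont  # third-party; imported lazily

    font = TTFont(in_path)
    for table in font["cmap"].tables:
        if not table.isUnicode():
            continue
        new_cmap = remap_codepoints(table.cmap, overrides)
        if table.format == 4:
            # Format 4 only covers the Basic Multilingual Plane.
            new_cmap = {cp: n for cp, n in new_cmap.items() if cp <= 0xFFFF}
        table.cmap = new_cmap
    font.save(out_path)
```

Usage would look something like `apply_overrides_to_font("in.otf", "out.otf", {"uni2032": [0xE00D]})`, with the file names being placeholders.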
> instead of swapping out glyphs via Unicode override, swap the names in the left column
Meaning I should rename the uniXXXX glyphs to something else? Or am I misunderstanding?
> do the Unicode swapping in the generated font binary via fontTools
For various reasons I can’t do that in this case, but yes I agree that would work.
If I understand correctly, you want A to be the default glyph in one project, while A.alt would be the default (read: triggered by code point U+0041) in another.
You can create GlyphOrderAndAliasDB files on a per-project basis:
GlyphOrderAndAliasDB_1
A A
B B
C C
A.alt A.alt
GlyphOrderAndAliasDB_2
A.alt A
B B
C C
A A.alt
You can use makeotf’s -gf mode to specify one or another GlyphOrderAndAliasDB file.
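If the per-project files are generated rather than hand-edited, the second file can be derived from the first by swapping the final names of the two glyphs. A minimal sketch, with `swap_final_names` and `format_goadb` as hypothetical helpers and tab-separated output assumed:

```python
def swap_final_names(records, name_a, name_b):
    """Swap two final names (first column); friendly names stay where they are.

    records is a list of (final name, friendly name) pairs in glyph order.
    """
    swap = {name_a: name_b, name_b: name_a}
    return [(swap.get(final, final), friendly) for final, friendly in records]


def format_goadb(records):
    """Render records as tab-separated GOADB lines with a trailing newline."""
    return "".join(f"{final}\t{friendly}\n" for final, friendly in records)


goadb_1 = [("A", "A"), ("B", "B"), ("C", "C"), ("A.alt", "A.alt")]
goadb_2 = swap_final_names(goadb_1, "A", "A.alt")
print(format_goadb(goadb_2), end="")
```

As far as I understand the documentation quoted earlier in this thread, the final names are only applied when -r or -ga is given, so the swap has no effect on glyph names with -gf alone.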
OK, I see what you mean. Yes, I am also doing substitutions like A for A.alt. But for those glyphs, I can override the codepoint in the third column, so there’s no need for a workaround. The codepoint override only fails for glyphs named uniXXXX.
For instance, suppose I have a test font with uni2032 and uni2032.alt and a GOADB like this:
uni2032 uni2032.alt
uni2032.alt uni2032
This is the resulting cmap:
<?xml version="1.0" encoding="UTF-8"?>
<ttFont sfntVersion="OTTO" ttLibVersion="3.6">
<cmap>
<tableVersion version="0"/>
<cmap_format_4 platformID="0" platEncID="3" language="0">
<map code="0x2032" name="uni2032"/><!-- PRIME -->
</cmap_format_4>
<cmap_format_6 platformID="1" platEncID="0" language="0">
</cmap_format_6>
<cmap_format_4 platformID="3" platEncID="1" language="0">
<map code="0x2032" name="uni2032"/><!-- PRIME -->
</cmap_format_4>
</cmap>
</ttFont>
I’m not sure what I expected to see here, but this cmap is the same as the one above.
I have a feeling that the cmap dumps are not even needed here. Since we are only moving within GlyphOrderAndAliasDB territory, this becomes a purely theoretical problem. We know that the code point given to a glyph is implied by the final name (left column), why not shuffle that name around? It seems odd to insist on a uniXXXX name to then override it.
Like this:
X A # converted to X
Y B # converted to Y
Z C # converted to Z
or this
uni0058 A # converted to X
uni0059 B # converted to Y
uni005A C # converted to Z
or this
uni0058 uni0041 # converted to X
uni0059 uni0042 # converted to Y
uni005A uni0043 # converted to Z
> We know that the code point given to a glyph is implied by the final name (left column), why not shuffle that name around?
All my glyphs have a PUA codepoint (possibly in addition to one or more non-PUA codepoints). So yes, your suggestion would work, though it would preclude me from using those PUA codepoints (because the first-column name would completely determine the codepoint).
> instead of swapping out glyphs via Unicode override, swap the names in the left column
AFAICT the problem with this workaround is that OT feature code is tied to the existing glyph names, and this kind of glyph renaming would have unintended side effects.
OT feature code can be written using either “friendly” names (middle column) or final names (left column). The ability to give human-readable names to glyphs in OT feature context is a big reason for the GlyphOrderAndAliasDB to exist.
Test project attached. feature test.zip
Right. To my mind, if I’m making new names in the GOADB, and then I have to rename all the glyphs in the feature file, I might as well just rename the source glyphs in the first place to avoid this buggy uniXXXX name pattern.
This (or related) behavior has come up again for me in an all-caps font, where Unicode overrides are used for non-AGD glyph names:
The following GlyphOrderAndAliasDB snippet results in only the uni0136 and uni013B code points (the capital variants); uni0137 and uni013C are missing from the OTF. Typing the lowercase characters will result in a .notdef.
uni0136 Kcommaaccent uni0136,uni0137
uni013B Lcommaaccent uni013B,uni013C
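A quick way to spot this kind of silent drop is to compare the third column of each record against the code points that actually ended up in the font. The sketch below assumes comma-separated uniXXXX / uXXXX[XX] values with uppercase hex; `override_codepoints` and `missing_codepoints` are hypothetical helpers, and the set of cmap code points has to come from elsewhere (for example a TTX dump).

```python
import re

_UNI_VALUE = re.compile(r"^(?:uni([0-9A-F]{4})|u([0-9A-F]{4,6}))$")


def override_codepoints(third_field):
    """Turn 'uni0136,uni0137' into [0x0136, 0x0137]."""
    codepoints = []
    for item in third_field.split(","):
        match = _UNI_VALUE.match(item.strip())
        if match is None:
            raise ValueError(f"not a uniXXXX or uXXXX[XX] value: {item!r}")
        codepoints.append(int(match.group(1) or match.group(2), 16))
    return codepoints


def missing_codepoints(goadb_line, cmap_codepoints):
    """List the overrides of one GOADB record that are absent from the cmap."""
    fields = goadb_line.split()
    wanted = override_codepoints(fields[2]) if len(fields) > 2 else []
    return [cp for cp in wanted if cp not in cmap_codepoints]
```

With the font described above, where only the capital code points are present, `missing_codepoints("uni0136 Kcommaaccent uni0136,uni0137", {0x0136, 0x013B})` reports `[0x0137]`.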
This might border on an existential type question, but I would argue that the Unicode override (3rd column) should take precedence over whatever code point is implied by the first column. At the least, I would expect some kind of feedback if this is indeed not allowed, and not just a bug in makeotf.
I would like to discuss this further – any thoughts from the team?
The behavior observed above basically breaks docs rule a), as outlined at the very top of this issue.
> I would like to discuss this further – any thoughts from the team?
Going back to rule a) referenced -- the current behavior does seem like a bug when weighed against that documentation. I don't have enough history on this to say whether the documentation has ever been correct (i.e. the tool used to work that way but has had some regression) or if the documentation was just wishful thinking. The documentation makes sense to me on general principle so maybe we should just treat it as being correct, and update the tool to do what it says.
I have encountered this issue and need immediate help. The Kangxi radicals are all compatibility characters of their Unified Ideographs counterparts (e.g. U+2F00 ⼀ is equivalent to U+4E00 一). The same goes for the CJK Compatibility Ideographs.
When making CJK fonts, it is usually good practice to map all the Kangxi radical codepoints to their Unified Ideographs counterparts, both for correct locale display and to save some precious GIDs (as Source Han does), but the behaviour in this issue blocks the GOADB from working correctly. (This issue did not affect the Source Han fonts because they are CID-keyed; the cmap file/-ch option is used instead.) There is no workaround in this case: the codepoints must be merged onto one glyph.
Test case:
uni4E00 uni4E00 uni2F00,uni4E00
uni4E28 uni4E28 uni2F01,uni4E28
uni4E36 uni4E36 uni2F02,uni4E36
uni4E3F uni4E3F uni2F03,uni4E3F
uni4E59 uni4E59 uni2F04,uni4E59
uni4E85 uni4E85 uni2F05,uni4E85
uni4E8C uni4E8C uni2F06,uni4E8C
uni4EA0 uni4EA0 uni2F07,uni4EA0
uni4EBA uni4EBA uni2F08,uni4EBA
uni513F uni513F uni2F09,uni513F
uni5165 uni5165 uni2F0A,uni5165
uni516B uni516B uni2F0B,uni516B
uni5182 uni5182 uni2F0C,uni5182
uni5196 uni5196 uni2F0D,uni5196
uni51AB uni51AB uni2F0E,uni51AB
uni51E0 uni51E0 uni2F0F,uni51E0
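Records like these do not have to be typed by hand. Since the Kangxi radicals carry compatibility decompositions in the Unicode data, a sketch along the following lines can generate them with the standard library. `kangxi_goadb_lines` is a hypothetical helper, the output is tab-separated, and the `uniXXXX` naming only holds for BMP code points, which is the case for this range.

```python
import unicodedata


def kangxi_goadb_lines(first=0x2F00, last=0x2F0F):
    """Build GOADB records that merge each radical into its unified ideograph."""
    lines = []
    for radical in range(first, last + 1):
        unified = unicodedata.normalize("NFKC", chr(radical))
        # Skip anything without a single-character compatibility mapping.
        if len(unified) != 1 or ord(unified) == radical:
            continue
        name = f"uni{ord(unified):04X}"
        lines.append(f"{name}\t{name}\tuni{radical:04X},{name}")
    return lines


for line in kangxi_goadb_lines():
    print(line)
```

The default range covers the sixteen radicals of the test case; passing `last=0x2FD5` would extend it to the whole Kangxi Radicals block.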
@NightFurySL2001 Yes, that's really bad. I checked and found where it assigns a single value when the glyph name is already a name like uni4E00. I fixed it in https://github.com/adobe-type-tools/afdko/tree/zqs-goadb-fix and tested it, and it's working for me. We'll check and add tests before getting this into the release, but if you want to try that branch it should be good for this.
Thank you @punchcutter, the branch fix resolved the issue. I hope this gets pushed to a release soon.