ufo-spec
[UFO4] support cmap Unicode Variation Sequences
See various comments in #77.
In particular https://github.com/unified-font-object/ufo-spec/issues/77#issuecomment-452633570:
The UVS data can be represented by a sequence of (unicodeValue, variationSelector, glyphName) tuples, where glyphName is optional. No glyph name means: this is the default variation, and the cmap should be used to find the glyph name for this code point.
I see two ways of storing the UVS data:
- As a nested structure: a dict at the top level, mapping variationSelector keys to dicts that map unicodeValue keys to glyphName strings.
- A two-dimensional table of rows with three fields each.
Option 1 can be stored in plist format, with the caveat that we need to convert unicode value keys to (hex) strings, as plist dict keys must be strings. The nested data structure closely resembles the internal structure of the OpenType format 14 cmap subtable.
Option 2 could be stored as a tab-separated text file, with the caveat that care has to be taken to respect the "no restrictions in glyph names" UFO policy. The lines in the file represent the Variation Sequences quite literally: 0030 FE00 zero.slash.
Option 1 is more machine-friendly, option 2 is more human-friendly.
Option 1 with just one sequence:
<?xml version='1.0' encoding='UTF-8'?>
<!DOCTYPE plist PUBLIC "-//Apple//DTD PLIST 1.0//EN" "http://www.apple.com/DTDs/PropertyList-1.0.dtd">
<plist version="1.0">
<dict>
<key>FE00</key>
<dict>
<key>0030</key>
<string>zero.slash</string>
</dict>
</dict>
</plist>
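A minimal sketch of reading this nested layout back into (unicodeValue, variationSelector, glyphName) tuples with Python's standard plistlib. The function name uvs_from_plist is hypothetical, and the sketch does not cover how a default sequence (one without a glyph name) would be written in the plist.

```python
import plistlib

# The single-sequence example from above, as bytes.
PLIST = b"""<?xml version='1.0' encoding='UTF-8'?>
<!DOCTYPE plist PUBLIC "-//Apple//DTD PLIST 1.0//EN" "http://www.apple.com/DTDs/PropertyList-1.0.dtd">
<plist version="1.0">
<dict>
<key>FE00</key>
<dict>
<key>0030</key>
<string>zero.slash</string>
</dict>
</dict>
</plist>
"""


def uvs_from_plist(data):
    """Flatten {selectorHex: {unicodeHex: glyphName}} into sorted tuples.

    Hypothetical helper: plist dict keys must be strings, so the hex keys
    are converted back to integers here.
    """
    nested = plistlib.loads(data)
    sequences = []
    for selector_hex, mapping in nested.items():
        for unicode_hex, glyph_name in mapping.items():
            sequences.append(
                (int(unicode_hex, 16), int(selector_hex, 16), glyph_name)
            )
    return sorted(sequences)


print(uvs_from_plist(PLIST))
```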
Option 2 with just one sequence:
0030 FE00 zero.slash
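As a rough sketch of how a tool might read such a file, assuming the fields are separated by tabs and the glyph name is the optional last field. The function name parse_uvs_lines is hypothetical. Splitting on the first two tabs only leaves the rest of the line untouched, so a glyph name containing spaces survives.

```python
def parse_uvs_lines(text):
    """Parse 'unicode<TAB>selector[<TAB>glyphName]' rows into tuples.

    A missing glyph name is returned as None, meaning "default variation:
    look the code point up in the cmap".
    """
    sequences = []
    for line in text.split("\n"):
        if not line.strip():
            continue  # skip blank lines
        fields = line.split("\t", 2)  # at most three fields
        if len(fields) < 2:
            raise ValueError(f"expected at least two fields: {line!r}")
        unicode_value = int(fields[0], 16)
        selector = int(fields[1], 16)
        glyph_name = fields[2] if len(fields) > 2 and fields[2] else None
        sequences.append((unicode_value, selector, glyph_name))
    return sequences
```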
Storing UVS could be combined with the "regular" character mapping, by using an optional third column for the variation selector:
0030 zero
0030 zero.slash FE00
Or maybe we should consider using (a dialect of) csv:
0030;zero;
0030;zero.slash;FE00
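A sketch of reading the semicolon-separated variant with Python's csv module, which splits the regular mapping from the variation sequences depending on whether the third column is filled. The function name parse_combined_mapping is hypothetical. One reason to lean on a real CSV reader is that its quoting rules would let a glyph name contain the delimiter.

```python
import csv
import io


def parse_combined_mapping(text):
    """Return (cmap, uvs) from 'unicode;glyphName;selector' rows.

    Rows with an empty or missing third column go into the regular cmap
    dict; rows with a selector become (unicode, selector, glyphName) tuples.
    """
    cmap = {}
    uvs = []
    for row in csv.reader(io.StringIO(text), delimiter=";"):
        if not row:
            continue  # blank line
        unicode_value = int(row[0], 16)
        glyph_name = row[1]
        selector = row[2] if len(row) > 2 else ""
        if selector:
            uvs.append((unicode_value, int(selector, 16), glyph_name))
        else:
            cmap[unicode_value] = glyph_name
    return cmap, uvs
```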
@khaledhosny do you have an opinion on which of the options @justvanrossum proposed would be better to work with, and do you see any gotchas that may have been missed in them?
I don’t have a deep knowledge of the matter, so whatever works with the tools that consume this is fine for me.
From Twitter:
In order to deal with default vs non-default UVSes, which is important for IVSes, I suggest something along the lines of the following (excerpt from the Adobe-Japan1 IVD collection):
8FBB E0100;cid3056
8FBB E0101;cid8267
Which UVS is default depends on which glyph is encoded.
JIS90-savvy Japanese fonts encode CID+3056 from U+8FBB 辻, meaning <8FBB E0100> 辻󠄀 is the default UVS. JIS2004-savvy ones encode CID+8267 from U+8FBB 辻, meaning <8FBB E0101> 辻󠄁 is the default UVS. The other, of course, is non-default, and requires a UVS to display properly.
And, to be clear, both UVSes should be specified so long as the font includes both glyphs, and both UVSes should be present and accounted for in the Format 14 'cmap' subtable.
Which UVS is default needs to be determined at compile time, because interaction with the Format 12 subtable is required to ascertain which glyph that corresponds to a UVS is encoded, and therefore the default one.
If you are looking for an extreme test case, check out the latest version of “IVS Test,” which I deployed a little over a year ago, and whose Format 14 'cmap' subtable includes nearly 40 million UVSes: https://github.com/adobe-fonts/ivs-test
Which, by my reading, means that the spec needs to state that the tool making the font decides which UVS is the default; all the designer can do is specify the UVSes for the cmap.
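To illustrate that reading, here is a sketch of what the compile-time decision could look like: every sequence is stored with an explicit glyph name, and the tool compares it against the regular cmap to decide which one is the default. The function name split_default_uvs and the use of None to mark a default sequence are assumptions of this sketch, not part of any spec.

```python
def split_default_uvs(sequences, cmap):
    """Group sequences by selector, marking defaults with None.

    sequences: iterable of (unicodeValue, variationSelector, glyphName)
    cmap:      dict mapping unicodeValue to the encoded glyph name
    A sequence is treated as default when its glyph is the one the cmap
    already maps the base code point to.
    """
    result = {}
    for unicode_value, selector, glyph_name in sequences:
        is_default = cmap.get(unicode_value) == glyph_name
        entry = (unicode_value, None if is_default else glyph_name)
        result.setdefault(selector, []).append(entry)
    return result


# The two Adobe-Japan1 sequences quoted above.
sequences = [
    (0x8FBB, 0xE0100, "cid3056"),
    (0x8FBB, 0xE0101, "cid8267"),
]

# A font that encodes cid3056 at U+8FBB vs. one that encodes cid8267.
jis90 = split_default_uvs(sequences, {0x8FBB: "cid3056"})
jis2004 = split_default_uvs(sequences, {0x8FBB: "cid8267"})
```

With the same stored data, the first font ends up with <8FBB E0100> as the default sequence and the second with <8FBB E0101>, which is why the stored data itself does not need to say which one is default.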
I'm leaning towards option 2, as it seems the easiest for editing this data (yes, spreadsheets)