Provide support for all cmap table formats
E.g. platformID = 1, encodingID = 0 as used in http://www.ivank.net/BRUSHSTP.ttf.
I'd somewhat advocate not bothering with this: the format is so old that nothing makes these fonts anymore (the format 0 cmap is horrendously inadequate for anything but toy fonts =). Adding support for more complex or newer formats like 13/14 would be worth doing, but format 0 would add support for something we shouldn't even be using anymore.
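For a sense of how small format 0 is, here is a minimal Python sketch of parsing one, following the spec layout: a 6-byte header (format, length, language) and then exactly 256 one-byte glyph IDs. The function name `parse_cmap_format0` is hypothetical and not part of opentype.js. For platformID 1, encodingID 0 the byte codes are Mac Roman rather than Unicode, and converting them is left out.

```python
import struct
from typing import Dict

def parse_cmap_format0(data: bytes) -> Dict[int, int]:
    """Parse a cmap format 0 (byte encoding table) subtable into {code: glyphID}."""
    # Header: format, length, language (three big-endian uint16 values).
    fmt, _length, _language = struct.unpack_from(">3H", data, 0)
    if fmt != 0:
        raise ValueError(f"expected format 0, got {fmt}")
    # Exactly 256 one-byte glyph IDs follow, one per single-byte character code.
    glyph_ids = struct.unpack_from(">256B", data, 6)
    # Glyph 0 is .notdef, so codes that map to it are treated as unmapped.
    # Assumption: codes are returned raw; for platform 1 / encoding 0 they are
    # Mac Roman, and mapping them to Unicode is not done here.
    return {code: gid for code, gid in enumerate(glyph_ids) if gid != 0}
```

With only 256 slots, the whole mapping fits in a single byte array, which is exactly why it is useless for anything beyond a single 8-bit code page.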
Looks like Apple just decided to use platformID = 0 for their default system font, see #139
cmap 12 read support was just added with PR #207 😉
Any other important formats we should support?
@fdb Format 4 is limited to 16-bit character codes (the Basic Multilingual Plane, Unicode Plane 0) and format 12 uses 32-bit codes (all Unicode planes). Both map ranges of character codes to glyph IDs, and they look like the most common cmap subtables.
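To show what format 4's segment mapping involves, here is a Python sketch of a single-character lookup working directly on the subtable bytes, assuming `data` starts at the subtable's format field. It follows the spec's segment logic (endCode, startCode, idDelta, idRangeOffset, all modulo 65536); `lookup_format4` is a hypothetical name, and it scans segments linearly instead of using the searchRange/entrySelector fields.

```python
import struct

def lookup_format4(data: bytes, code: int) -> int:
    """Return the glyph ID for `code` in a cmap format 4 subtable, or 0 (.notdef)."""
    seg_count = struct.unpack_from(">H", data, 6)[0] // 2  # segCountX2 / 2
    end_off = 14                                  # endCode[] follows the 14-byte header
    start_off = end_off + 2 * seg_count + 2       # skip endCode[] and reservedPad
    delta_off = start_off + 2 * seg_count         # idDelta[] (signed int16)
    range_off = delta_off + 2 * seg_count         # idRangeOffset[] (uint16)
    for i in range(seg_count):
        end = struct.unpack_from(">H", data, end_off + 2 * i)[0]
        if code > end:
            continue
        start = struct.unpack_from(">H", data, start_off + 2 * i)[0]
        if code < start:
            return 0  # segments are sorted by endCode, so no later one can match
        delta = struct.unpack_from(">h", data, delta_off + 2 * i)[0]
        ro_pos = range_off + 2 * i
        ro = struct.unpack_from(">H", data, ro_pos)[0]
        if ro == 0:
            return (code + delta) & 0xFFFF  # glyph computed from idDelta alone
        # idRangeOffset is a byte offset from its own position into glyphIdArray.
        gid = struct.unpack_from(">H", data, ro_pos + ro + 2 * (code - start))[0]
        return (gid + delta) & 0xFFFF if gid != 0 else 0
    return 0
```

Format 12, by contrast, is just a sorted list of (start, end, startGlyphID) groups, so the two formats differ much more in encoding than in what they express.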
I decompiled some fonts with fontTools and found that format 6 is also common. So maybe the next step could be reading format 6, but if nobody is having a problem now, we can wait before implementing it 😉
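Format 6 (trimmed table mapping) is one of the simpler formats: a single contiguous run of 16-bit codes starting at `firstCode`. A minimal Python sketch, assuming `data` starts at the subtable's format field; `lookup_format6` is a hypothetical name.

```python
import struct

def lookup_format6(data: bytes, code: int) -> int:
    """Return the glyph ID for `code` in a cmap format 6 subtable, or 0 (.notdef)."""
    # Header: format, length, language, firstCode, entryCount (uint16 each).
    fmt, _length, _language, first_code, entry_count = struct.unpack_from(">5H", data, 0)
    if fmt != 6:
        raise ValueError(f"expected format 6, got {fmt}")
    index = code - first_code
    if 0 <= index < entry_count:
        # glyphIdArray[entryCount] starts right after the 10-byte header.
        return struct.unpack_from(">H", data, 10 + 2 * index)[0]
    return 0  # outside the trimmed range
```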
For proper OpenType support, I'd consider cmap 4, 12, 13 and 14 essential: cmap 4 and 12 for "proper plain old Unicode" support (4 mapping to UCS-2, and 12 mapping to UCS-4), and the (recently introduced) cmap 13 and 14 because OpenType needs them for properly supporting many-to-one mapping and variation selector mapping, respectively.
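To make the variation-selector part concrete, here is a Python sketch of a format 14 lookup over already-parsed records. The in-memory layout (a dict from selector to Default UVS ranges and Non-Default UVS mappings) and the `base_lookup` callback are assumptions for illustration; reading the uint24 fields from bytes is left out. Per the spec, a Default UVS hit means "use the glyph from the regular Unicode cmap", while a Non-Default UVS hit supplies the glyph directly.

```python
from typing import Callable, Dict, List, Optional, Tuple

# Hypothetical parsed form of a format 14 subtable, keyed by variation selector:
# (Default UVS ranges as (startUnicodeValue, additionalCount),
#  Non-Default UVS mappings as {unicodeValue: glyphID}).
UVSRecords = Dict[int, Tuple[List[Tuple[int, int]], Dict[int, int]]]

def lookup_variation(
    records: UVSRecords,
    base_lookup: Callable[[int], int],
    code: int,
    selector: int,
) -> Optional[int]:
    """Glyph ID for the sequence (code, selector), or None if the font lacks it."""
    record = records.get(selector)
    if record is None:
        return None  # the font defines no variants for this selector
    default_ranges, non_default = record
    if code in non_default:
        return non_default[code]  # variant glyph given explicitly
    for start, additional_count in default_ranges:
        if start <= code <= start + additional_count:
            return base_lookup(code)  # use the ordinary Unicode cmap glyph
    return None  # sequence not supported; the caller can fall back to the base glyph
```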
That said, many of the other formats are almost trivial to implement compared to subtables 4 and 12, so... I'd honestly just say "implement them all". If effort is already going into proper cmap handling, handling all of them is a good target.
@Pomax Nice to know! I think that cmap 12 writing is the most important right now, but one day maybe we will support every format ;)
But before that we will need to change how the cmap tables are handled: right now, if a cmap 12 subtable is found, the cmap 4 subtable is not read. That is not a problem, since 12 is a superset of 4, but we can't keep doing that once we add more formats.
By the way, are 13 & 14 well supported now?
They're getting there.
I'm not sure why you'd skip 4 if 12 is found, though, but then I've not read the code in quite a while. Keeping the UCS-2 and UCS-4 sets separate is generally a good idea, sometimes even with a cmap 0 for the 256-character ANSI block, so the cmap parsing procedure is: check which cmap subtables are available, then run through each of them to find your character's glyph index. The "does this character have an index according to this subtable?" check is generally fast, so you might "waste some time" looking through tables, but it will be negligible compared to the time needed to render the glyph outline.
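The "check each available subtable in turn" procedure can be sketched in a few lines of Python, assuming each subtable has already been parsed into a code-to-glyph dict and the list is ordered by preference (the ordering is an assumption; the thread does not settle it). `find_glyph` is a hypothetical name.

```python
from typing import Dict, List

def find_glyph(subtables: List[Dict[int, int]], code: int) -> int:
    """Return the first non-zero glyph ID for `code` across `subtables`, else 0."""
    for table in subtables:
        gid = table.get(code, 0)
        if gid != 0:
            return gid  # first subtable that maps this character wins
    return 0  # .notdef: no subtable maps this character
```

Each probe is a cheap lookup, so trying several subtables costs little next to rendering the outline, which is the point made above.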
Also note that cmap 13 uses the exact same data structures and information coding as 12, except that the "start glyph" for a character range as used in 12 is simply considered "the only glyph" in 13, so if you have an implementation for 12 already, adding support for 13 (barring needing a rewrite on how characters are mapped through multiple cmap subtables of course) is virtually no extra work.
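The format 12/13 relationship is easy to see in code: both store groups of (startCharCode, endCharCode, glyph ID) sorted by start code, and only the final step differs. A Python sketch over already-parsed groups; `lookup_groups` and the tuple layout are illustrative assumptions, not opentype.js code.

```python
from bisect import bisect_right
from typing import List, Tuple

# One map group: (startCharCode, endCharCode, startGlyphID for 12 / glyphID for 13)
Group = Tuple[int, int, int]

def lookup_groups(groups: List[Group], code: int, fmt: int) -> int:
    """Map `code` through groups sorted by startCharCode; 0 (.notdef) if unmapped."""
    if fmt not in (12, 13):
        raise ValueError(f"expected format 12 or 13, got {fmt}")
    starts = [start for start, _end, _glyph in groups]  # rebuilt per call for brevity
    i = bisect_right(starts, code) - 1  # last group starting at or before `code`
    if i < 0:
        return 0
    start, end, glyph = groups[i]
    if code > end:
        return 0
    if fmt == 12:
        return glyph + (code - start)  # consecutive codes get consecutive glyphs
    return glyph  # format 13: every code in the range shares one glyph
```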
@Pomax The cmap 12 support was recently added by @Vildan, and I think it was just easier to skip 4 if 12 was found; doing otherwise would need a rewrite. For now, it's easier and faster, but not future-proof!
Thanks for the details though! Personally, I'm already busy with a lot of other things so feel free to contribute if you need to 😉
Skipping 4 when 12 is found is a great way to not find characters that are definitely in the font, so filing an issue to make sure all subtables are checked would be a good idea =)
As for contributing: I run an insane number of projects already, so writing comments or just talking about how the OpenType spec wants things done is a quick and easy job I am happy to do. Reviewing code for whether an approach is sound is a bit more work, but typically still doable with 15 minutes here or there; writing code, though, is way more work than I have free time for at the moment =)
Hey @Pomax, thanks for clearing that up. It sounds like it'd be a good idea to keep all of them and do a lookup through each. Do you know if the spec says anything about the order in which they should be looked up?
Because only formats 4 and 12 are supported now, and 12 is a superset of 4, there is no need to read format 4 if a font has format 12. And because the cmap subtables are placed in ascending order, we can find format 12 before format 4. @Pomax, do you have an example where we skip characters by reading only format 12? I ran this test on 4000+ fonts and didn't find a single font where format 4 gives extra characters compared to format 12.
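The "read format 12 if present, otherwise format 4" strategy boils down to a preference check over the encoding records. A Python sketch, assuming the records have already been read as (platformID, encodingID, format) tuples and that all of them are Unicode subtables; `pick_subtable` is a hypothetical name.

```python
from typing import List, Optional, Tuple

# One entry per cmap encoding record: (platformID, encodingID, subtable format).
Record = Tuple[int, int, int]

def pick_subtable(records: List[Record]) -> Optional[int]:
    """Return the index of the record to read: format 12 if present, else format 4."""
    # Assumption: every record here is a Unicode subtable, so format 12 is a
    # superset of format 4 (as the spec requires) and format 4 can be skipped.
    for wanted in (12, 4):
        for index, (_platform_id, _encoding_id, fmt) in enumerate(records):
            if fmt == wanted:
                return index
    return None  # neither format present; a fuller reader would try the others
```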
Rereading the spec, you're right; it quite literally says "Please note, that the content of format 12 subtable, needs to be a super set of the content in the format 4 subtable. The format 4 subtable needs to be in the cmap table to enable backward compatibility needs." I'm curious whether future OpenType spec revisions will remove the need for a cmap 4, but it does indeed fully justify not bothering to read the format 4 subtable if format 12 is present.
Here are some test cases for cmap subtables; see the README for how to run the test suite.
We created an online font-subsetting demo that compares some of the differences between opentype.js and fontTools subsets; it may be helpful.
http://fonter.dancf.com/examples/subset/
Technically by supporting format 12, you get format 13 for free right?
I have a TON of PDFs that use 14. Just throwing my vote in for this; I have no idea what it's all about :-)
We now support format 14 (via #581) as well as format/encoding 0 for platform 1 (via #634), which is what this issue was originally about. The provided example BRUSHSTP.ttf loads fine with the current master.
If anyone could provide a font using format 13, that would be great.
Format 13 will be supported via #647, which will close this issue. As discussed before, it's not worth the time to support obscure formats that will probably never be encountered in the wild. Anyone providing a real font with an unsupported format is still welcome to open a new issue for that, of course!