pdfsizeopt: merging of fonts with conflicting /Subrs


zackw opened this issue 6 years ago (1 comment)

This document contains 22 copies of slightly different embedded subsets of the same font (XXXXXX+LinBiolinumO), but pdfsizeopt is not able to merge them because they have /Subrs tables. In #97, pts observed:

It looks like there is a low-hanging fruit here: two fonts with nonconflicting /Subrs can be easily merged. (Two /Subrs are conflicting if there is an index for which the two fonts have a different string.)
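pts's definition of a conflict translates directly into a check over the indices that both arrays share. The sketch below is illustrative only: `subrs_conflict` and `merge_nonconflicting` are hypothetical helpers, not pdfsizeopt functions, and they assume each /Subrs array is available as a Python list of charstring strings.

```python
def subrs_conflict(a, b):
    """True if some index present in both /Subrs arrays holds different strings."""
    return any(x != y for x, y in zip(a, b))


def merge_nonconflicting(a, b):
    """Hypothetical helper: merge two non-conflicting /Subrs arrays.

    With no conflict, the shared indices hold equal strings, so the longer
    array is a superset of the shorter one and can serve both fonts unchanged.
    """
    if subrs_conflict(a, b):
        raise ValueError("conflicting /Subrs")
    return list(a) if len(a) >= len(b) else list(b)
```

No callsubr operand needs to change in this case, which is why it is the low-hanging fruit.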

Unfortunately, for this document it doesn't look like it's going to be quite that simple. I extracted the first two copies of the font, converted them from CFF to PFA, and then fully decoded them (both the 'eexec' encryption and the charstring encoding).

Attachments: linbio1.pfa.txt, linbio2.pfa.txt

The subsetting process seems to have discarded all /Subrs entries that are not referenced from a glyph and then compacted the array, so, for instance, in linbio1.pfa.txt we have this glyph definition:

/e {
        47 callsubr
        closepath
        48 callsubr
        closepath
        endchar
        } |-

but in linbio2.pfa.txt the same glyph is:

/e {
        12 callsubr
        closepath
        13 callsubr
        closepath
        endchar
        } |-

And, naturally enough, this means the /Subrs array entries line up only for the special slots 0 through 4. So to merge them, you would have to combine the /Subrs arrays from both fonts, remove duplicates, and then adjust the callsubr invocations in every glyph from both fonts (and in any subr that itself calls another subr). A simple matter of programming, but not completely trivial.
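A minimal sketch of that "simple matter of programming", assuming the subrs and glyphs are available as decoded charstring text like the attached linbio*.pfa.txt files. `merge_subrs`, `renumber_calls` and `RESERVED` are hypothetical names, not pdfsizeopt code, and a real implementation inside pdfsizeopt would operate on binary Type 2 charstrings instead. Subrs are deduplicated only after their own nested calls have been renumbered, because the same text (say `12 callsubr return`) means different things in two fonts whose arrays were compacted differently.

```python
import re

# Assumption taken from the issue: slots 0..4 are special and already agree.
RESERVED = 5


def renumber_calls(body, new_index):
    """Rewrite every literal "N callsubr" in decoded charstring text.

    Whitespace is preserved. A callsubr whose operand is not a literal integer
    (e.g. the hint-replacement idiom "... callothersubr pop callsubr") is
    rejected, because its target cannot be found without an interpreter.
    """
    parts = re.split(r"(\s+)", body)  # the capture group keeps separators
    words = [i for i, p in enumerate(parts) if p and not p.isspace()]
    for k, pos in enumerate(words):
        if parts[pos] != "callsubr":
            continue
        prev = parts[words[k - 1]] if k > 0 else ""
        if not prev.isdigit():
            raise ValueError("callsubr without a literal operand: %r" % prev)
        parts[words[k - 1]] = str(new_index(int(prev)))
    return "".join(parts)


def merge_subrs(fonts_subrs):
    """Merge per-font /Subrs lists into one deduplicated list.

    Returns (merged, maps) where maps[f][old] is the new index of subr `old`
    of font f. Reserved slots are copied as-is and assumed not to call other
    subrs. Other subrs are compared after renumbering their nested calls.
    """
    first = list(fonts_subrs[0][:RESERVED])
    if any(list(s[:RESERVED]) != first for s in fonts_subrs):
        raise ValueError("reserved slots differ between fonts")
    merged = list(first)
    maps = [{i: i for i in range(RESERVED)} for _ in fonts_subrs]
    seen = {}  # renumbered body -> index in merged

    def assign(f, old):
        if old not in maps[f]:
            # Callees are assigned first, so the rewritten body is canonical.
            body = renumber_calls(fonts_subrs[f][old], lambda j: assign(f, j))
            if body not in seen:
                seen[body] = len(merged)
                merged.append(body)
            maps[f][old] = seen[body]
        return maps[f][old]

    for f, subrs in enumerate(fonts_subrs):
        for old in range(len(subrs)):
            assign(f, old)
    return merged, maps
```

Each glyph of font f would then be rewritten with `renumber_calls(glyph_text, maps[f].__getitem__)`, so /e's `47 callsubr` in the first font and `12 callsubr` in the second would point at the same merged entry if those two subrs turn out to be identical.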

Sep 25 '18 11:09 zackw

Thank you for reporting this, and thank you for providing all the details!

I don't have time to implement this anytime soon, but I'll keep the issue open in case we get volunteers.

Please note that processing the charstrings to renumber callsubr invocations would be very slow in Python, because all charstrings and subrs strings would have to be parsed and then serialized back.

If we ever implement this, we could also easily implement deduplication of subrs, removal of unused subrs, and possibly (optionally) inlining of subrs that are used only once.
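Both the removal and the inlining start from knowing how often each subr is called. A rough sketch, assuming decoded Type 1 charstring text and only literal `N callsubr` calls (the hint-replacement idiom is not recognised); `subr_call_sites`, `CALL` and `RESERVED` are hypothetical names. It counts static call sites reachable from the glyphs, scanning each subr body once, so nested calls are included. Inlining itself (splicing the body in place of the call and dropping its trailing `return`) is not shown.

```python
import re
from collections import Counter

# Assumption taken from the issue: slots 0..4 are special and always kept.
RESERVED = 5
# Only literal "N callsubr" calls are recognised (not "... pop callsubr").
CALL = re.compile(r"(?<!\S)(\d+)\s+callsubr(?!\S)")


def subr_call_sites(glyphs, subrs):
    """Count static callsubr sites reachable from the glyphs.

    glyphs: dict mapping glyph name -> decoded charstring text
    subrs:  list of decoded subr texts
    Returns (unused, single_use): non-reserved subr indices that are never
    called, and those called from exactly one place (inlining candidates).
    """
    sites = Counter()
    scanned = set()
    stack = list(glyphs.values())
    while stack:
        body = stack.pop()
        for match in CALL.finditer(body):
            callee = int(match.group(1))
            sites[callee] += 1
            if callee not in scanned:  # scan each subr body only once
                scanned.add(callee)
                stack.append(subrs[callee])
    unused = [i for i in range(RESERVED, len(subrs)) if sites[i] == 0]
    single_use = sorted(i for i, n in sites.items() if n == 1 and i >= RESERVED)
    return unused, single_use
```

Dropping the unused entries compacts the array again, so every remaining callsubr operand would have to be renumbered afterwards, just as in the merge case.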

Please also note that PostScript supports both Type 1 and Type 2 charstrings, but in pdfsizeopt it's enough to support Type 2 only, because by the font merging stage the charstrings have already been converted to Type 2.
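One detail that makes renumbering in Type 2 charstrings different from the decoded Type 1 text above: the callsubr operand is biased. Per the Type 2 Charstring Format specification (Adobe Technical Note #5177), the actual subr index is the operand plus a bias that depends on the number of subrs: 107 for fewer than 1240, 1131 for fewer than 33900, and 32768 otherwise. So once a merged array crosses a threshold, every operand changes, even for subrs whose index did not. `subr_bias` and `rebias_operand` are hypothetical helper names.

```python
def subr_bias(count):
    """Bias added to a Type 2 callsubr operand, given the number of subrs."""
    if count < 1240:
        return 107
    if count < 33900:
        return 1131
    return 32768


def rebias_operand(operand, old_count, new_count, remap=lambda i: i):
    """Translate a callsubr operand from the old subrs array to the merged one.

    `remap` maps an old (unbiased) subr index to its index in the merged array.
    """
    index = operand + subr_bias(old_count)      # actual index in the old array
    return remap(index) - subr_bias(new_count)  # biased operand for the new array
```

For example, subr 0 of a 10-entry array is called with operand -107; after merging into 2000 entries the same subr needs operand -1131. The same rule applies to callgsubr, using the number of global subrs.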

Sep 25 '18 23:09 pts
