CapHeight required only for fonts with Latin characters is ambiguous

Open bdoubrov opened this issue 2 years ago • 5 comments

PDF 2.0 specifies the CapHeight entry in the FontDescriptor dictionary (Table 120) as "Required for fonts that have Latin characters, except for Type 3 fonts".

There are two issues here. First, a font subset may include only lowercase Latin characters, in which case this parameter would not make sense for it. Second, the term "Latin character" is not defined in the spec, and I'm not sure it would be correct to define it only as an ASCII character in the range [a-zA-Z].

As a potential resolution, the text might be modified to say: "Required for fonts that have Latin characters in the range A-Z, except for Type 3 fonts".
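To illustrate how much the outcome depends on the reading of "Latin characters", here is a minimal Python sketch. The three interpretations, their labels, and the function name `cap_height_required` are assumptions made for illustration; none of them comes from the spec, and the Latin-1 check is only a rough approximation of "letters in ISO Latin 1".

```python
# Hypothetical readings of "Latin characters"; the spec defines none of them.
def is_upper_az(cp: int) -> bool:
    """Proposed wording: uppercase A-Z only."""
    return 0x41 <= cp <= 0x5A


def is_ascii_letter(cp: int) -> bool:
    """ASCII letters in the range [a-zA-Z]."""
    return is_upper_az(cp) or 0x61 <= cp <= 0x7A


def is_latin1_letter(cp: int) -> bool:
    """Rough approximation: any alphabetic code point up to U+00FF."""
    return cp <= 0xFF and chr(cp).isalpha()


INTERPRETATIONS = {
    "A-Z": is_upper_az,
    "a-zA-Z": is_ascii_letter,
    "latin1": is_latin1_letter,
}


def cap_height_required(codepoints, font_subtype: str, interpretation: str) -> bool:
    """Return True if CapHeight would be required under the chosen reading.

    codepoints: Unicode code points present in the (possibly subset) font.
    font_subtype: e.g. "Type1", "TrueType", "Type3"; Type 3 is always exempt.
    """
    if font_subtype == "Type3":
        return False
    is_latin = INTERPRETATIONS[interpretation]
    return any(is_latin(cp) for cp in codepoints)


# A subset with only lowercase letters gives different answers per reading.
lowercase_subset = [ord(c) for c in "abc"]
for name in INTERPRETATIONS:
    print(name, cap_height_required(lowercase_subset, "TrueType", name))
```

Under this sketch, a lowercase-only subset is exempt with the proposed "A-Z" wording but not with the other two readings, which is the ambiguity described above.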

bdoubrov, Apr 13 '23 12:04

Out of curiosity: is this parameter still used/useful in practice? If yes, what is it used for?

Depending on the answer to that question, we could also consider deprecating it or making it optional if we're messing with the requirement scope anyway. If, on the other hand, we're not able to come up with a clear answer to the above, I'm not sure it's a good idea to try and rewrite this requirement.

EDIT: There may be other font metrics for which this could be a meaningful exercise.

MatthiasValvekens, Apr 13 '23 18:04

Not surprisingly, most font descriptor metrics have a direct relationship to values inside font programs: https://learn.microsoft.com/en-us/typography/opentype/spec/os2#scapheight

For non-embedded fonts that don't use uppercase Latin chars, it may still make sense to specify it, as font matching or synthesis algorithms may want to use it. See https://developer.apple.com/fonts/TrueType-Reference-Manual/RM06/Chap6bsln.html, "Example: Format 1 Baseline Table".

Also "Latin" extends beyond just A-Z - consider Æ.

I agree that the requirement as written is vague about what "Latin" means. Could we refer to Annex D.2 "Latin character set and encodings" to make it more precise? (This has more chars than is strictly needed, but I think it would make things validatable.)

petervwyatt, Apr 14 '23 00:04

Latin == "ISO Latin 1" or Annex D.2 "Latin character set and encodings"???

@lrosenthol - note below table D.2 mentions Adobe Latin / Mac OS Latin. Please research...

petervwyatt, Oct 16 '23 19:10

I have always considered it to mean ISO Latin 1, but will investigate.

And as @petervwyatt mentioned, CapHeight is used in various implementations for things like font matching. So even if you don't have a capital letter in your subset, you still need the value (as read from the original font file).
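As a rough sketch of "as read from the original font file": for an OpenType/TrueType font, the source value would typically be `sCapHeight` from the OS/2 table (linked earlier in this thread), expressed in font design units. The helper below is hypothetical and assumes the usual convention that font descriptor metrics for non-Type 3 fonts are given in 1000-units-per-em glyph space. Reading the two inputs from the font file is left to whatever font library is in use.

```python
def descriptor_cap_height(s_cap_height: int, units_per_em: int) -> float:
    """Scale a cap height in font design units to 1000-units-per-em glyph space.

    s_cap_height: assumed to come from OS/2.sCapHeight of the original font.
    units_per_em: assumed to come from head.unitsPerEm of the original font.
    """
    if units_per_em <= 0:
        raise ValueError("units_per_em must be positive")
    return s_cap_height * 1000 / units_per_em


# The value depends only on the original font, not on which glyphs
# survive subsetting, so a lowercase-only subset can still carry it.
print(descriptor_cap_height(1024, 2048))
```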

lrosenthol, Feb 06 '24 12:02

Just to note that CapHeight may well also be used in fonts for Cyrillic characters, and maybe some other scripts.

car222222, Feb 06 '24 13:02