pdfminer.six
FontDescriptor without FontBBox generates lots of warnings
Bug report
I am parsing a PDF that triggers a lot of warnings like this one:
Could get FontBBox from font descriptor because None cannot be parsed as 4 floats
The warning was added in #1103.
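For context, the message suggests that the FontBBox value is read from the font descriptor and, when it cannot be turned into four numbers, a warning is logged and a fallback box is used. Below is a simplified, standalone sketch of that pattern under that assumption. `to_rect`, `parse_font_bbox` and the all-zeros fallback are hypothetical and are not pdfminer.six's actual code.

```python
import logging
from typing import Any, Mapping, Optional, Tuple

log = logging.getLogger(__name__)

Rect = Tuple[float, float, float, float]


def to_rect(value: Any) -> Optional[Rect]:
    """Hypothetical helper: return value as 4 floats, or None if that is not possible."""
    try:
        x0, y0, x1, y1 = (float(v) for v in value)
    except (TypeError, ValueError):
        # TypeError: value is None or not iterable.
        # ValueError: wrong number of items or a non-numeric item.
        return None
    return (x0, y0, x1, y1)


def parse_font_bbox(descriptor: Mapping[str, Any]) -> Rect:
    """Hypothetical sketch: read FontBBox from a descriptor, warn and fall back if unusable."""
    font_bbox = descriptor.get("FontBBox")  # None when the key is missing
    bbox = to_rect(font_bbox)
    if bbox is None:
        log.warning(
            "Could not get FontBBox from font descriptor because %r cannot be parsed as 4 floats",
            font_bbox,
        )
        return (0.0, 0.0, 0.0, 0.0)
    return bbox


# Trimmed copy of the descriptor from this report: it has no FontBBox key.
descriptor = {"CapHeight": 1443, "StemV": 109, "Flags": 4, "XHeight": 1078, "ItalicAngle": 0}
print(parse_font_bbox(descriptor))  # logs the warning, then prints (0.0, 0.0, 0.0, 0.0)
```

Since every Type 3 glyph in a page goes through font setup, a missing key like this would plausibly explain why the warning shows up so often.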
I've done some troubleshooting.
The font spec is:
{'Subtype': /'Type3', 'FirstChar': 0, 'Type': /'Font', 'FontDescriptor': <PDFObjRef:15>, 'CharProcs': {'g38B': <PDFObjRef:62>, 'g36B': <PDFObjRef:63>, 'g36A': <PDFObjRef:64>, 'g36C': <PDFObjRef:65>, 'g3A7': <PDFObjRef:66>, 'g3A6': <PDFObjRef:67>, 'g390': <PDFObjRef:68>, 'g0': <PDFObjRef:69>, 'g368': <PDFObjRef:70>, 'g367': <PDFObjRef:71>, 'g3AB': <PDFObjRef:72>, 'g369': <PDFObjRef:73>, 'g364': <PDFObjRef:74>, 'g363': <PDFObjRef:75>, 'g366': <PDFObjRef:76>, 'g365': <PDFObjRef:77>}, 'FontBBox': [60, 367, 1201, -1477], 'FontMatrix': [0.00048828125, 0, 0, -0.00048828125, 0, 0], 'Encoding': {'Type': /'Encoding', 'Differences': [0, /'g0', /'g0', /'g0', /'g0', /'g0', /'g0', /'g0', /'g0', /'g0', /'g0', /'g0', /'g0', /'g0', /'g0', /'g0', /'g0', /'g0', /'g0', /'g0', /'g0', /'g0', /'g0', /'g0', /'g0', /'g0', /'g0', /'g0', /'g0', /'g0', /'g0', /'g0', /'g0', /'g0', /'g0', /'g0', /'g0', /'g0', /'g0', /'g0', /'g0', /'g0', /'g0', /'g0', /'g0', /'g0', /'g0', /'g0', /'g0', /'g0', /'g0', /'g0', /'g0', /'g0', /'g0', /'g0', /'g0', /'g0', /'g0', /'g0', /'g0', /'g0', /'g0', /'g0', /'g0', /'g0', /'g0', /'g0', /'g0', /'g0', /'g0', /'g0', /'g0', /'g0', /'g0', /'g0', /'g0', /'g0', /'g0', /'g0', /'g0', /'g0', /'g0', /'g0', /'g0', /'g0', /'g0', /'g0', /'g0', /'g0', /'g0', /'g0', /'g0', /'g0', /'g0', /'g0', /'g0', /'g0', /'g0', /'g0', /'g0', /'g0', /'g0', /'g363', /'g364', /'g365', /'g366', /'g367', /'g368', /'g369', /'g36A', /'g36B', /'g36C', /'g0', /'g0', /'g0', /'g0', /'g0', /'g0', /'g0', /'g0', /'g0', /'g0', /'g0', /'g0', /'g0', /'g0', /'g0', /'g0', /'g0', /'g0', /'g0', /'g0', /'g0', /'g0', /'g0', /'g0', /'g0', /'g0', /'g0', /'g0', /'g0', /'g0', /'g38B', /'g0', /'g0', /'g0', /'g0', /'g390', /'g0', /'g0', /'g0', /'g0', /'g0', /'g0', /'g0', /'g0', /'g0', /'g0', /'g0', /'g0', /'g0', /'g0', /'g0', /'g0', /'g0', /'g0', /'g0', /'g0', /'g0', /'g3A6', /'g3A7', /'g0', /'g0', /'g0', /'g3AB']}, 'ToUnicode': <PDFObjRef:78>, 'LastChar': 174, 'Widths': [1293, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1290, 955, 1237, 1284, 1318, 1267, 1305, 1167, 1308, 1305, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 712, 0, 0, 0, 0, 966, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 628, 608, 0, 0, 0, 630], 'CIDToGIDMap': /'Identity'}
and the descriptor is:
{'CapHeight': 1443, 'StemV': 109, 'Type': /'FontDescriptor', 'Flags': 4, 'XHeight': 1078, 'FontName': /'AAAAAA+SFUIText-Regular', 'ItalicAngle': 0}
Note that the font spec contains a FontBBox, but the descriptor does not.
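A quick way to confirm this is to check which of the two dictionaries carries the key. This is a minimal sketch on plain Python dicts trimmed from the dumps above; representing PDF name objects such as /'Type3' as plain strings is an assumption made for illustration.

```python
# Trimmed copies of the font spec and its descriptor from the dumps above.
# PDF name objects (e.g. /'Type3') are represented as plain strings here.
spec = {
    "Subtype": "Type3",
    "FirstChar": 0,
    "LastChar": 174,
    "FontBBox": [60, 367, 1201, -1477],
    "FontMatrix": [0.00048828125, 0, 0, -0.00048828125, 0, 0],
}
descriptor = {
    "Type": "FontDescriptor",
    "FontName": "AAAAAA+SFUIText-Regular",
    "CapHeight": 1443,
    "StemV": 109,
    "Flags": 4,
    "XHeight": 1078,
    "ItalicAngle": 0,
}

for name, d in (("spec", spec), ("descriptor", descriptor)):
    # Report whether each dictionary carries a FontBBox entry.
    print(f"{name}: FontBBox = {d.get('FontBBox')!r}")
# spec: FontBBox = [60, 367, 1201, -1477]
# descriptor: FontBBox = None
```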
I've fixed it by modifying
https://github.com/pdfminer/pdfminer.six/blob/944fe73461a823f58f713c6c75099f40c0144472/pdfminer/pdffont.py#L1060-L1063
replacing
descriptor = dict_value(spec["FontDescriptor"])
with
descriptor = { "FontDescriptor": dict_value(spec["FontDescriptor"]), "FontBBox": spec["FontBBox"]}
However, I'm not sure whether that's the proper fix.
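One possible concern with the replacement above is that it nests the original descriptor under a "FontDescriptor" key. As a result, the descriptor's own entries (CapHeight, Flags, ItalicAngle, etc.) are no longer at the top level of the dict that gets passed on. An alternative is to copy the descriptor and fall back to the font dictionary's FontBBox only when the descriptor lacks one. The sketch below assumes that filling in FontBBox is all the descriptor needs. `build_type3_descriptor` is a hypothetical helper operating on plain dicts, not pdfminer.six code.

```python
from typing import Any, Dict, Mapping


def build_type3_descriptor(
    spec: Mapping[str, Any], font_descriptor: Mapping[str, Any]
) -> Dict[str, Any]:
    """Hypothetical helper: copy the descriptor, filling FontBBox from the font dict if missing."""
    # Copy so the original (possibly shared) descriptor object is not mutated.
    descriptor = dict(font_descriptor)
    if descriptor.get("FontBBox") is None and "FontBBox" in spec:
        # The Type 3 font dictionary in this report carries its own FontBBox; reuse it.
        descriptor["FontBBox"] = spec["FontBBox"]
    return descriptor


spec = {"Subtype": "Type3", "FontBBox": [60, 367, 1201, -1477]}
font_descriptor = {"CapHeight": 1443, "Flags": 4, "ItalicAngle": 0}
merged = build_type3_descriptor(spec, font_descriptor)
print(merged)
# {'CapHeight': 1443, 'Flags': 4, 'ItalicAngle': 0, 'FontBBox': [60, 367, 1201, -1477]}
```

In pdfminer.six this would presumably mean applying the same merge to the `dict_value(spec["FontDescriptor"])` result in the lines linked above, but that placement is an assumption.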
Also, the warning message should probably read "Could not get FontBBox [...]" instead of "Could get FontBBox [...]"; see
https://github.com/pdfminer/pdfminer.six/blob/944fe73461a823f58f713c6c75099f40c0144472/pdfminer/pdffont.py#L968