pdf-inspector Numerals read as `\u0000` when using font feature settings

Open SimonEggert opened this issue 1 year ago • 0 comments

First of all, thanks for the work and effort you've put into this great library!

Bug description

We are having an issue with numerals not being read correctly by `PDF::Inspector::Text.analyze`. They are misinterpreted as `\u0000` when we apply the style `font-feature-settings: 'tnum'`. We are generating the PDF with Gotenberg from HTML templates.

Minimal reproducible example

`<div>21.09.2023</div>` gets read as `21.09.2023`

while

`<div style="font-feature-settings: 'tnum'">21.09.2023</div>` gets read as `\u0000\u0000.\u0000\u0000.\u0000\u0000\u0000\u0000`.
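
To make the symptom easy to check in a spec, here is a minimal Ruby sketch that flags extracted strings containing NUL characters. The helper name `strings_with_nul` is hypothetical and not part of pdf-inspector. The two arrays are hand-written stand-ins for the results described above, not actual output captured from the attached files.

```ruby
# Hypothetical helper (not part of pdf-inspector): return the extracted
# strings that contain NUL characters, i.e. glyphs that were not mapped
# back to readable text.
def strings_with_nul(strings)
  strings.select { |s| s.include?("\u0000") }
end

# Stand-ins for the two results described above. With the attached PDFs,
# the arrays would presumably come from something like:
#   PDF::Inspector::Text.analyze_file("font_features_on.pdf").strings
features_off = ["21.09.2023"]
features_on  = ["\u0000\u0000.\u0000\u0000.\u0000\u0000\u0000\u0000"]

puts strings_with_nul(features_off).inspect
puts strings_with_nul(features_on).inspect
```

An empty result for a PDF means no string came back with unmapped glyphs, so a check like this could serve as a regression guard once the underlying problem is understood.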

PDFs

Here are two PDFs, one with the feature turned off and one with the feature turned on: font_features_off.pdf font_features_on.pdf

Further information

The UNIX tool `pdftotext` is able to read both versions correctly, so I think the PDF itself is fine. The font in use is Barlow, in case that makes any difference.

Any help would be appreciated!

P.S.: I'll also open an issue regarding this problem over at https://github.com/yob/pdf-reader so feel free to close this one if you think it should be handled there.

Sep 21 '23 07:09 SimonEggert
