Christian Clason
How would I do that? Do you have a running instance somewhere? (Otherwise it'd just be simpler for you to download the file in the top comment and check if...
Ah, I found the pre-built binary. Yes, that seems to extract the numbers correctly from the linked file (where `pdfalto` HEAD fails). (Curiously, `pdfalto` works fine for a PDF --...
That doesn't do anything for me? (Weird that calling the bundled `pdfalto` on the same file then produces an XML with correct reference numbers, where locally built `pdfalto` fails.)
> I don't understand. You might need to select TEI -> Process Fulltext Document and upload the PDF file:

Thank you. I don't know the Grobid stack; I'm just working...
This seems only to apply to older PDFs from arXiv such as the one in the first comments; current (late 2022+) PDFs seem to work fine.
Seems only to be `8`, from what I can tell (which breaks detection of years and pages etc.) (Ligatures are annoying but don't break the number format, so less problematic.)
> Yes, but not everywhere. The DOI, for example is correct.

Yes, because that's a different font. It's only oldstyle figures in Linux Libertine (pre-2022).

> Most of them should...
Does this help? https://tug.ctan.org/fonts/libertine/latex/LinLibertine_R.tex (The font names for math should be `nxlmi` and `ntxsy*`.)
The `8` is the big issue, since it breaks metadata detection. The rest is just nice to have (with the ligatures being the most bang for the buck).
Oooh, `taboldstyle`. That sounds like a font error...