pdfalto

PDF to XML ALTO file converter

Results: 85 pdfalto issues

Diagonal text is normally useless for grobid training.

See [The structure of one of the page area (PageSpace) elements](https://www.loc.gov/standards/alto/techcenter/layout.html#:~:text=The%20structure%20of%20one%20of%20the%20page%20area%20(PageSpace)%20elements). Required by #140?

An error case for accent composition in pdfalto; see https://github.com/kermitt2/grobid/issues/906 for the PDF.

bug
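Some PDFs draw an accented letter as a base glyph plus a separate spacing diacritic, and these two pieces have to be merged into a single character. That merge is what accent composition means here. The following is a minimal Python sketch of the merge step using `unicodedata.normalize("NFC", ...)`. It is not pdfalto's actual implementation. The mapping table and `compose_accent` are hypothetical names, and the sketch leaves out the harder part: deciding which glyphs belong together, which in practice depends on their positions on the page.

```python
import unicodedata

# Hypothetical table: spacing diacritics (as a PDF might draw them as
# standalone glyphs) mapped to the matching Unicode combining marks.
SPACING_TO_COMBINING = {
    "\u00b4": "\u0301",  # ACUTE ACCENT -> COMBINING ACUTE ACCENT
    "\u0060": "\u0300",  # GRAVE ACCENT -> COMBINING GRAVE ACCENT
    "\u00a8": "\u0308",  # DIAERESIS -> COMBINING DIAERESIS
    "\u02c6": "\u0302",  # MODIFIER LETTER CIRCUMFLEX -> COMBINING CIRCUMFLEX
    "\u02dc": "\u0303",  # SMALL TILDE -> COMBINING TILDE
    "\u00b8": "\u0327",  # CEDILLA -> COMBINING CEDILLA
}


def compose_accent(base: str, accent: str) -> str:
    """Merge a base letter and a separately drawn accent into one character.

    Unknown accents are returned unchanged after the base letter. If Unicode
    has no precomposed form, the result is the base letter followed by the
    combining mark (still a correct, decomposed sequence).
    """
    combining = SPACING_TO_COMBINING.get(accent)
    if combining is None:
        return base + accent
    # NFC turns "e" + U+0301 into the single code point U+00E9, etc.
    return unicodedata.normalize("NFC", base + combining)


print(compose_accent("e", "\u00b4"), compose_accent("u", "\u00a8"))
```

The accent is passed as a separate argument, so the same function works whether the diacritic glyph appears before or after the base letter in the content stream.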

I have cloned the repository, successfully compiled pdfalto as instructed in the README, and processed a PDF file, which produced several output files, including an XML...

implemented

Hi @kermitt2, in the `Description` element the `MeasurementUnit` is given as "pixel", but the height, width and coordinates have values such as `HPOS="75.1181" VPOS="69.4485" HEIGHT="8.5140" WIDTH="274.993"`, which don't really...
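Suppose these values are PDF user-space units, that is, points of 1/72 inch. That is an assumption here, since the issue text is cut off. Under that assumption, converting them to real pixels requires a rendering resolution. ALTO's `mm10` unit (tenths of a millimetre) needs no resolution at all. The helper names in this small Python sketch are hypothetical; only the sample values come from the issue.

```python
# Assumption: coordinates are PDF points (1 pt = 1/72 inch), not device pixels.
POINTS_PER_INCH = 72.0
MM10_PER_INCH = 254.0  # ALTO "mm10" = 1/10 mm, and 1 inch = 25.4 mm


def points_to_pixels(value_pt: float, dpi: float) -> float:
    """Pixels at a given rendering resolution (hypothetical helper)."""
    return value_pt * dpi / POINTS_PER_INCH


def points_to_mm10(value_pt: float) -> float:
    """Tenths of a millimetre, resolution-independent (hypothetical helper)."""
    return value_pt * MM10_PER_INCH / POINTS_PER_INCH


# Sample attribute values quoted in the issue.
box = {"HPOS": 75.1181, "VPOS": 69.4485, "HEIGHT": 8.5140, "WIDTH": 274.993}

for name, pt in box.items():
    print(f"{name}: {pt} pt = {points_to_pixels(pt, 300):.1f} px at 300 dpi"
          f" = {points_to_mm10(pt):.1f} mm10")
```

The pixel figures change with the chosen DPI, which is why fractional point values labelled "pixel" are ambiguous. The `mm10` figures do not depend on any resolution.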

$ make ... [ 15%] Linking CXX executable pdfalto /usr/bin/ld: libs/image/png/linux/libpng.a(png.c.o): relocation R_X86_64_32 against `.rodata' can not be used when making a PIE object; recompile with -fPIC .... $ cc...

bug

We're currently testing pdfalto. Specifically, we're converting a lot of PDFs to HTML via the XML output of pdfalto (as we were not quite satisfied with the result of any...
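For a pipeline like this, the ALTO-to-HTML step could be sketched in Python as follows. The sketch relies only on the standard ALTO hierarchy: `TextBlock` contains `TextLine`, which contains `String` elements with a `CONTENT` attribute. It ignores namespaces, styling, hyphenation (`HYP`) and reading order. The function names are hypothetical, and this is not the reporters' converter.

```python
import html
import xml.etree.ElementTree as ET


def local(tag: str) -> str:
    """Strip an XML namespace, e.g. '{ns}String' -> 'String'."""
    return tag.rsplit("}", 1)[-1]


def alto_to_html(alto_xml: str) -> str:
    """Emit one <p> per ALTO TextBlock, joining its lines and words with spaces."""
    root = ET.fromstring(alto_xml)
    paragraphs = []
    for block in root.iter():
        if local(block.tag) != "TextBlock":
            continue
        lines = []
        for line in block.iter():
            if local(line.tag) != "TextLine":
                continue
            # Keep only String children; SP/HYP elements are skipped in this sketch.
            words = [s.get("CONTENT", "") for s in line if local(s.tag) == "String"]
            lines.append(" ".join(words))
        paragraphs.append("<p>" + html.escape(" ".join(lines)) + "</p>")
    return "\n".join(paragraphs)
```

Matching on local tag names keeps the sketch independent of which ALTO namespace version the file declares.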

Changed the xpdf repository URL from SSH to HTTPS to [avoid unnecessary public key validation, which causes issues, e.g. in Docker containers](https://stackoverflow.com/a/16465182/5743296).

We're encountering an issue where the annotations file has no DEST(ination) elements nested within the "goto" ACTION elements for many of the PDFs we're converting. As far as we know...
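One way to measure how many links are affected is to count goto actions that lack a destination. The Python sketch below assumes a hypothetical annotation layout: the kind of action is stored in a `type` attribute on `ACTION`, and the destination is a `DEST` child. Check the annotation file pdfalto actually writes and adjust the names accordingly.

```python
import xml.etree.ElementTree as ET


def goto_actions_without_dest(annot_xml: str) -> list:
    """Return goto ACTION elements that have no DEST child.

    Hypothetical layout: ACTION carries its kind in a "type" attribute
    and its target in a nested DEST element.
    """
    root = ET.fromstring(annot_xml)
    missing = []
    for action in root.iter("ACTION"):
        # Compare with None: an Element with no children is falsy in ElementTree.
        if action.get("type") == "goto" and action.find("DEST") is None:
            missing.append(action)
    return missing
```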

When I run pdfalto on Windows with a file name containing a space **and** an umlaut (ä, ü, ö), I get `I/O Error: Couldn't open file`. This happens only in combination of...

bug
windows-specific
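Until the underlying file-opening problem is fixed, one possible workaround is to copy such a file to a plain ASCII path before passing it to pdfalto. This workaround is an assumption, not something proposed in the issue. In the Python sketch below, `needs_safe_copy`, `ascii_safe_copy` and the target name `input.pdf` are all hypothetical.

```python
import shutil
import tempfile
from pathlib import Path


def needs_safe_copy(path: Path) -> bool:
    """True if the full path contains a space or any non-ASCII character."""
    s = str(path)
    return " " in s or not s.isascii()


def ascii_safe_copy(src: Path, workdir: Path) -> Path:
    """Return src unchanged if its path is harmless; otherwise copy it into
    workdir under a plain ASCII name (hypothetical workaround, not a fix)."""
    if not needs_safe_copy(src):
        return src
    dst = workdir / "input.pdf"
    shutil.copyfile(src, dst)
    return dst


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as d:
        src = Path(d) / "Prüfung Bericht.pdf"
        src.write_bytes(b"%PDF-1.4\n")
        print(ascii_safe_copy(src, Path(d)))
```

The working directory must itself have a plain ASCII path without spaces. A temporary directory under a user profile such as `C:\Users\Jürgen\...` would reintroduce the problem.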