pdfalto
PDF to XML ALTO file converter
![Image Pasted at 2019-3-26 11-34](https://user-images.githubusercontent.com/9571357/54984395-131c3e80-4faf-11e9-889a-8cd6a6f34cb7.png) And the text `appears to be maintained until approximately 35 40 years -of age, followed by modest decreases until 50 years of age`, you can...
Is there a CentOS repo somewhere where I can download a prebuilt binary of pdfalto?
In _XmlAltoOutputDev.cc_ a lot of small strings are allocated with fixed sizes of 10, 20, or 50 characters. It seems that in some circumstances a buffer overflow occurs (typically when outputting...
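One common way to avoid this class of overflow is to let the formatting call report the length it needs before allocating. The following is a minimal sketch under that assumption; `formatCoord` is a hypothetical helper, not a function from pdfalto, and the `%.3f` format is chosen only for illustration.

```cpp
#include <cstdio>
#include <string>

// Hypothetical helper (not part of pdfalto): format a number without
// assuming a maximum length for the resulting string.
std::string formatCoord(double value) {
    // With a null buffer and size 0, snprintf only measures the output.
    int needed = std::snprintf(nullptr, 0, "%.3f", value);
    if (needed < 0) {
        return std::string();  // formatting error
    }
    // Allocate exactly what is needed, plus one byte for the trailing NUL.
    std::string out(static_cast<std::size_t>(needed) + 1, '\0');
    std::snprintf(&out[0], out.size(), "%.3f", value);
    out.resize(static_cast<std::size_t>(needed));  // drop the trailing NUL
    return out;
}
```

A value such as `1e30` produces more than 20 characters with this format, which is the kind of input that a fixed 10- or 20-character buffer cannot hold.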
Hi, in the readme section, you said that the reordering has been extended to the whole text flow rather than only the first page. However, it does not seem to work in my case....
Some requests are coming in about having the possibility to output characters along with their respective attributes (width, height, fonts..)
I used **Clang 6.0 and AddressSanitizer** to build **[pdfalto](https://github.com/kermitt2/pdfalto)**. This [file](https://github.com/grandnew/software-vulnerabilities/blob/master/pdfalto/detected_memory_leaks) can cause memory leaks when executing this command: ```shell ./pdfalto detected_memory_leaks 1.xml ``` This is the ASAN information: ```...
I used **Clang 6.0 and AddressSanitizer** to build **[pdfalto](https://github.com/kermitt2/pdfalto)**. This [file](https://github.com/grandnew/software-vulnerabilities/blob/master/pdfalto/FPE_ImageStream) can cause a floating-point exception (FPE) in function ImageStream::ImageStream in Stream.cc when executing this command: ```shell ./pdfalto FPE_ImageStream 1.xml ``` This is...
ContentMine gathers a list of known problematic fonts and maps their characters to the correct Unicode code points (e.g., l -> λ)
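Such a mapping could be represented as a lookup table keyed by font name and character code. This is only a sketch: the font name `ExampleSymbolFont`, the table contents, and the function names are hypothetical and are not taken from ContentMine or pdfalto.

```cpp
#include <map>
#include <string>
#include <utility>

// Key: (font name, character code as extracted from the PDF).
using GlyphKey = std::pair<std::string, char32_t>;

// Hypothetical table; a real one would be loaded from a curated font list.
const std::map<GlyphKey, char32_t>& remapTable() {
    static const std::map<GlyphKey, char32_t> table = {
        // In this made-up font, the code for 'l' is drawn as lambda (U+03BB).
        {{"ExampleSymbolFont", U'l'}, char32_t(0x03BB)},
    };
    return table;
}

// Returns the corrected code point, or the input unchanged when the
// (font, code) pair is not listed as problematic.
char32_t remapGlyph(const std::string& fontName, char32_t code) {
    const auto& table = remapTable();
    auto it = table.find(GlyphKey(fontName, code));
    return it == table.end() ? code : it->second;
}
```

Characters from fonts that are not in the table pass through untouched, so the correction only affects fonts explicitly listed as problematic.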
This is a suggestion from the user @dlaurie: linebreaks only where they would be significant (pdftohtml -xml did that), and elision of unnecessary attributes, i.e. rotation=0, angle=0.
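The attribute elision part of this suggestion can be sketched as a writer that skips any attribute still at its default value. The element name `text`, the `size` attribute, and both function names are hypothetical and do not reflect pdfalto's actual output code; only `rotation` and `angle` come from the suggestion itself.

```cpp
#include <sstream>
#include <string>

// Appends name="value" only when the value differs from its default.
void appendAttr(std::ostringstream& out, const std::string& name,
                double value, double defaultValue = 0.0) {
    if (value == defaultValue) {
        return;  // elide attributes that carry no information
    }
    out << ' ' << name << "=\"" << value << '"';
}

// Hypothetical element writer used only to illustrate the idea.
std::string textTag(double rotation, double angle, double fontSize) {
    std::ostringstream out;
    out << "<text";
    appendAttr(out, "rotation", rotation);
    appendAttr(out, "angle", angle);
    appendAttr(out, "size", fontSize);
    out << "/>";
    return out.str();
}
```

With this approach, unrotated text yields a tag without `rotation` or `angle`, while any non-zero value is still written out.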