pdfalto

PDF to XML ALTO file converter

Results 85 pdfalto issues

![Image Pasted at 2019-3-26 11-34](https://user-images.githubusercontent.com/9571357/54984395-131c3e80-4faf-11e9-889a-8cd6a6f34cb7.png) And the text `appears to be maintained until approximately 35 40 years -of age, followed by modest decreases until 50 years of age`, you can...

bug

Is there a CentOS repo somewhere where I can download a prebuilt binary of pdfalto?

help wanted

In _XmlAltoOutputDev.cc_, a lot of small strings are allocated with sizes of 10, 20, or 50 characters. It seems that in some circumstances a buffer overflow occurs (typically when outputting...

bug
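
To illustrate the class of problem described in the issue above, here is a minimal sketch of formatting into a buffer whose size is measured first rather than fixed in advance. The helper name `formatNumber` and the choice of a floating-point value are assumptions made for illustration; this is not code from pdfalto, and the issue does not say which values trigger the overflow.

```cpp
#include <cstdio>
#include <string>

// Hypothetical helper (not part of pdfalto): format a double into a
// std::string without assuming a fixed-size buffer is large enough.
std::string formatNumber(double value, int precision = 2) {
    // A first call with a null buffer only measures the required length.
    int needed = std::snprintf(nullptr, 0, "%.*f", precision, value);
    if (needed < 0) {
        return std::string();  // formatting error: return an empty string
    }
    std::string out(static_cast<std::size_t>(needed), '\0');
    // The second call writes into storage of exactly the measured size
    // (the +1 accounts for the terminating NUL that snprintf writes).
    std::snprintf(&out[0], out.size() + 1, "%.*f", precision, value);
    return out;
}
```

With this approach a very large value such as `1e30` simply yields a longer string instead of writing past the end of a 10- or 20-character buffer.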

Hi, in the readme section, you said that the reordering has been extended to the whole text flow rather than only the first page. However, it does not seem to work in my case....

Some requests have come in about having the possibility to output characters along with their respective attributes (width, height, fonts, ...)

enhancement

I used **Clang 6.0 and AddressSanitizer** to build **[pdfalto](https://github.com/kermitt2/pdfalto)**. This [file](https://github.com/grandnew/software-vulnerabilities/blob/master/pdfalto/detected_memory_leaks) can cause memory leaks when executing this command: ```shell ./pdfalto detected_memory_leaks 1.xml ``` This is the ASAN information: ```...

bug

I used **Clang 6.0 and AddressSanitizer** to build **[pdfalto](https://github.com/kermitt2/pdfalto)**. This [file](https://github.com/grandnew/software-vulnerabilities/blob/master/pdfalto/FPE_ImageStream) can cause an FPE in function ImageStream::ImageStream in Stream.cc when executing this command: ```shell ./pdfalto FPE_ImageStream 1.xml ``` This is...

bug

ContentMine gathers a list of some known problematic fonts and maps their characters to the correct Unicode (e.g., l -> λ)

enhancement

This is a suggestion from the user @dlaurie: no linebreaks except when they would be significant (pdftohtml -xml did that), and elision of unnecessary attributes, e.g. rotation=0, angle=0.

suggestion
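
The attribute elision suggested above could be sketched roughly as follows. The function `writeAttributes` and the idea of passing a table of default values are assumptions made for illustration, not pdfalto's actual serializer; attribute values are also assumed to need no XML escaping.

```cpp
#include <map>
#include <string>

// Hypothetical helper (not pdfalto code): serialize attributes, skipping any
// whose value equals a known default such as rotation="0" or angle="0".
// Values are written as-is; XML escaping is out of scope for this sketch.
std::string writeAttributes(const std::map<std::string, std::string>& attrs,
                            const std::map<std::string, std::string>& defaults) {
    std::string out;
    for (const auto& kv : attrs) {
        auto it = defaults.find(kv.first);
        if (it != defaults.end() && it->second == kv.second) {
            continue;  // value matches the default, so omit the attribute
        }
        out += " " + kv.first + "=\"" + kv.second + "\"";
    }
    return out;
}
```

Because `std::map` iterates in key order, the output order is alphabetical by attribute name. A consumer of the XML would have to treat a missing `rotation` or `angle` as 0 for this to be lossless.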