pdfalto

PDF to XML ALTO file converter

Results 85 pdfalto issues

The text is the following: And when the grobid-quantities (in this case) extracts the `5` of the `n=5`, in reality it calculates the bounding boxes including `n=-` as well. When...

question

This should allow the usage of a plain `xpdf` mirror, without the need of touching the `CMakeLists.txt` file.

https://cmake.org/cmake/help/latest/command/add_subdirectory.html#command:add_subdirectory

```cmake
add_subdirectory(source_dir [binary_dir] [EXCLUDE_FROM_ALL])
```

- Use automatic xpdf mirror repo...

Currently the following styles seem to be supported by pdfalto:

- italic
- bold
- subscript
- superscript

Some other styles we would be interested in:

- underline
- sc...

enhancement

I used **Clang 6.0 and AddressSanitizer** to build **[pdfalto](https://github.com/kermitt2/pdfalto)**. This [file](https://github.com/grandnew/software-vulnerabilities/blob/master/pdfalto/infinite_loop) causes an infinite loop when running this command:

```shell
./pdfalto infinite_loop 1.xml
```

bug

Previously, we used pdfalto to generate an ALTO XML file from the PDF, and then https://github.com/filak/hOCR-to-ALTO to convert the ALTO XML to an hOCR file. With the newest release of pdfalto...

implemented

The option -cutPages does not seem to be working: without that option, all textlines are extracted

```
pdfalto -blocks input.pdf
```

but with the option enabled..

```
pdfalto -blocks -cutPages...
```

enhancement

Hi, I found a crash in pdfalto (the latest commit `8296a3d` on master). PoC: https://github.com/strongcourage/PoCs/blob/master/pdfalto_8296a3d/PoC_segv_TextPage::createPath Command: pdfalto $PoC /dev/null ASAN says: ~~~ ==17560==ERROR: AddressSanitizer: SEGV on unknown address 0x02007f614fef (pc 0x00000073e2e2...

bug
implemented

Hi, I found a UAF bug in pdfalto (the latest commit `8296a3d` on master). PoC: https://github.com/strongcourage/PoCs/blob/master/pdfalto_8296a3d/PoC_uaf_TextPage::createPath Command: pdfalto $PoC /dev/null ASAN says: ~~~ ==12326==ERROR: AddressSanitizer: heap-use-after-free on address 0x602000036418 at pc...

bug
implemented

Compiling with gnu 5.4.0, got this warning:

[ 15%] Building CXX object xpdf-4.00/xpdf/CMakeFiles/xpdf.dir/GlobalParams.cc.o
In file included from /home/roger/higgins-crowley/pdfalto/pdfalto/xpdf-4.00/xpdf/GlobalParams.cc:64:0:
/home/roger/higgins-crowley/pdfalto/pdfalto/xpdf-4.00/xpdf/UnicodeToUnicodeFontRules.h:29:1: warning: ISO C++ forbids converting a string constant to ‘char*’ [-Wwrite-strings]...

wontfix
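
For readers unfamiliar with the `-Wwrite-strings` warning in the issue above: it is emitted when a string literal is bound to a non-const `char*`. The following is only a minimal sketch of the pattern and the usual remedy (const-qualifying the pointer). The `FontRule` struct, the `kRules` table, its entries, and `lookupRule` are all hypothetical names made up for illustration; the actual contents of `UnicodeToUnicodeFontRules.h` are not shown in the issue.

```cpp
#include <cstring>

// Hypothetical table entry; the real layout used by xpdf is not shown in the issue.
struct FontRule {
    const char *pattern;      // const-qualified, so binding a string literal is well-formed
    const char *replacement;
};

// If the members above were plain `char *`, each literal below would trigger
// "ISO C++ forbids converting a string constant to 'char*'" under -Wwrite-strings.
static const FontRule kRules[] = {
    {"Symbol", "sym"},          // illustrative values only
    {"ZapfDingbats", "zapf"},
};

// Returns the replacement for a pattern, or nullptr when no rule matches.
const char *lookupRule(const char *pattern) {
    for (const FontRule &rule : kRules) {
        if (std::strcmp(rule.pattern, pattern) == 0) {
            return rule.replacement;
        }
    }
    return nullptr;
}
```

Since the warning comes from bundled third-party code (xpdf-4.00), leaving it untouched is a plausible reason for the `wontfix` label, though the snippet does not state the maintainers' reasoning.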

When I use pdfalto to parse Chinese PDF articles, some articles cannot be parsed, and an error "Syntax Error (305883): No font in show" is reported. Command : ./pdfalto pdf_file...

implemented