Lev E. Givon

76 comments by Lev E. Givon

Since a major design goal of python-pdfbox is enabling users to quickly access pdfbox features regardless of their jar management preferences, I don't wish to remove the automated download feature....

Does the `sort` option of the `extract_text` method do what you need? If not, you will have to look into wrapping pdfbox's dev API (by design, python-pdfbox only exposes pdfbox's...

@zevio, if you delete the pdfbox-app*jar file cached by python-pdfbox (in `~/.cache/python-pdfbox` on Linux or `~/Library/Caches/python-pdfbox` on MacOS), the latest jar file will be downloaded the next time you import...
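
The cache cleanup described above could be scripted roughly as follows. This is a minimal sketch, not part of python-pdfbox: the helper names `default_cache_dir` and `remove_cached_jars` are hypothetical, and only the two cache locations named in the comment (Linux and MacOS) are covered; any other platform falls back to the Linux path as an assumption.

```python
import sys
from pathlib import Path


def default_cache_dir() -> Path:
    """Return the cache directory mentioned in the comment for this platform.

    Hypothetical helper: only the Linux and MacOS locations come from the
    comment; other platforms fall back to the Linux path as an assumption.
    """
    if sys.platform == "darwin":
        return Path.home() / "Library" / "Caches" / "python-pdfbox"
    return Path.home() / ".cache" / "python-pdfbox"


def remove_cached_jars(cache_dir: Path) -> list:
    """Delete files matching pdfbox-app*jar in cache_dir; return removed paths."""
    removed = []
    if not cache_dir.is_dir():
        # Nothing cached yet, so there is nothing to delete.
        return removed
    for jar in sorted(cache_dir.glob("pdfbox-app*jar")):
        jar.unlink()
        removed.append(jar)
    return removed
```

Calling `remove_cached_jars(default_cache_dir())` removes any cached jar and leaves other files in the directory untouched.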

What version of Python, Java, and MacOS are you running? Can you attach the file you are trying to process? As noted in #14, I haven't been able to reproduce...

I didn't encounter any errors with the file you posted using the package versions in #14. Can you try using OpenJDK 14 rather than Oracle's Java?

Does this occur with all PDFs, or only with some? If the latter, can you attach an affected file to this issue?

I can't reproduce the hanging problem with the input PDF file you mentioned on Ubuntu Linux 18.0.4 with Python 3.7.3 and OpenJDK 11.0.4. I suspect some sort of platform-specific jpype...

I finally obtained access to a MacOS box. I can't reproduce the problem with Python 3.8.5, OpenJDK 14.0.2, and python-pdfbox 0.1.8 on MacOS 10.15.6; processing the indicated [file](http://www.senate.gov.ph/lisdata/2497921482!.pdf) succeeds without...

@peterHeuz Given that more than one person has encountered the issue on MacOS, I added a note to the package README.

Can you provide more information? What version of python-pdfbox, Python, etc. are you using? How are you invoking python-pdfbox?
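
One way to gather the version details requested above is sketched below. It uses only the Python standard library; the function name `collect_versions` is hypothetical, and it assumes the package is installed under the distribution name `python-pdfbox` and that the Java runtime in use is the `java` found on `PATH`.

```python
import platform
import shutil
import subprocess
from importlib import metadata


def collect_versions() -> dict:
    """Collect version details that are useful in a bug report."""
    info = {
        "python": platform.python_version(),
        "os": platform.platform(),
    }
    try:
        # Assumes the installed distribution is named "python-pdfbox".
        info["python-pdfbox"] = metadata.version("python-pdfbox")
    except metadata.PackageNotFoundError:
        info["python-pdfbox"] = "not installed"

    java = shutil.which("java")
    if java is None:
        info["java"] = "not found on PATH"
    else:
        # `java -version` normally writes its banner to stderr.
        result = subprocess.run([java, "-version"], capture_output=True, text=True)
        lines = (result.stderr or result.stdout).strip().splitlines()
        info["java"] = lines[0] if lines else "unknown"
    return info
```

Printing the returned dictionary and pasting it into the issue covers the Python, OS, Java, and package versions in one step.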