python-pdfbox
extract_text goes on forever.
I installed the latest python-pdfbox on my Mac via pip, imported it, and called extract_text():
>>> import pdfbox as p, os
>>> os.path.exists(f)  # f is the file path
True
>>> pp = p.PDFBox()
>>> pp.extract_text(f)
extract_text(f) never returns; it just keeps running indefinitely.
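While the cause is being tracked down, one way to keep a hang like this from blocking an interactive session is to run the same call in a separate Python interpreter with a time limit. This is a sketch assuming a POSIX system such as macOS; the helper `run_script_with_timeout` and the `EXTRACT_SCRIPT` string are hypothetical names, and the script only repeats the `pdfbox.PDFBox().extract_text(f)` call from the session above. The child is started in its own session so that, on timeout, any helper process it launched (for example a Java process, if that is how the package works internally) is killed together with it.

```python
import os
import signal
import subprocess
import sys

# Hypothetical child script: repeats the call from the REPL session above.
EXTRACT_SCRIPT = "import sys, pdfbox\npdfbox.PDFBox().extract_text(sys.argv[1])\n"


def run_script_with_timeout(script, args=(), timeout=60.0):
    """Run `script` in a fresh Python interpreter and report how it ended.

    Returns "finished", "timed out", or "failed (exit code N)".
    """
    # start_new_session=True (POSIX only) puts the child in its own process
    # group, so the whole group can be killed if the time limit is hit.
    proc = subprocess.Popen(
        [sys.executable, "-c", script, *args],
        start_new_session=True,
    )
    try:
        returncode = proc.wait(timeout=timeout)
    except subprocess.TimeoutExpired:
        try:
            os.killpg(proc.pid, signal.SIGKILL)
        except ProcessLookupError:
            pass  # the group already exited on its own
        proc.wait()  # reap the killed child
        return "timed out"
    if returncode != 0:
        return f"failed (exit code {returncode})"
    return "finished"
```

For example, `run_script_with_timeout(EXTRACT_SCRIPT, ["/path/to/file.pdf"], timeout=120)` (the path is a placeholder) gives up after two minutes instead of running forever, and a "failed" result separates a crash or missing dependency from a genuine hang.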
What versions of Python, Java, and macOS are you running? Can you attach the file you are trying to process? As noted in #14, I haven't been able to reproduce the problem.
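The details requested here can be collected with a short standard-library script. This is a sketch; the function name `environment_report` is illustrative, and it reports whichever `java` comes first on `PATH`, which is not guaranteed to be the one the package actually invokes.

```python
import platform
import subprocess


def environment_report():
    """Collect the macOS, Python, and Java versions as strings."""
    # platform.mac_ver() returns an empty release string when not on macOS.
    mac_release = platform.mac_ver()[0] or "not macOS"
    try:
        # `java -version` prints its version banner to stderr, not stdout.
        java = subprocess.run(
            ["java", "-version"], capture_output=True, text=True, timeout=30
        )
        java_version = (java.stderr or java.stdout).strip().splitlines()[0]
    except (OSError, IndexError, subprocess.TimeoutExpired):
        # No `java` on PATH, no output at all, or the command timed out.
        java_version = "unavailable"
    return {
        "macOS": mac_release,
        "Python": platform.python_version(),
        "Java": java_version,
    }


for name, value in environment_report().items():
    print(f"{name}: {value}")
```

The printed lines use the same "name: value" layout as the reply that follows, so they can be pasted straight into the issue.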
macOS: 10.15.6
Python: 3.7.1
Java: 1.8.0_202
File attached: pdf copy.pdf
I didn't encounter any errors processing the file you posted with the package versions listed in #14. Can you try using OpenJDK 14 rather than Oracle's Java?