python-pdfbox

extract_text goes on forever.

Open Rammurthy5 opened this issue 3 years ago • 3 comments

I installed the latest PDFBox on my Mac via pip, imported it, and called the extract_text() method. The call runs indefinitely on a 196 KB file. Please help.

>>> import pdfbox as p, os
>>> os.path.exists(f)  # f is the file path
True
>>> pp = p.PDFBox()
>>> pp.extract_text(f)


extract_text(f) never returns; it runs indefinitely.
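
One way to confirm a hang like this without blocking the interactive session is to run the call in a child Python process with a timeout. The following is a minimal sketch, not part of python-pdfbox: `run_snippet_with_timeout` is a hypothetical helper, and `example.pdf` and the 60-second limit are placeholder assumptions.

```python
import subprocess
import sys


def run_snippet_with_timeout(code, timeout_s):
    """Run `code` in a fresh Python process.

    Returns (finished, stdout, stderr); finished is False if the
    child was still running when the timeout expired.
    """
    try:
        result = subprocess.run(
            [sys.executable, "-c", code],
            capture_output=True,
            text=True,
            timeout=timeout_s,
        )
    except subprocess.TimeoutExpired:
        # subprocess.run kills the child process before raising.
        return False, "", ""
    return True, result.stdout, result.stderr


if __name__ == "__main__":
    pdf_path = "example.pdf"  # placeholder path (assumption)
    # Same calls as in the session above, run in a separate process.
    snippet = (
        "import pdfbox\n"
        f"pdfbox.PDFBox().extract_text({pdf_path!r})\n"
    )
    finished, out, err = run_snippet_with_timeout(snippet, timeout_s=60)
    print("finished" if finished else "timed out after 60 s")
    if err:
        print(err)
```

If the call times out here as well, that at least rules out the interactive session as the cause. Only the child Python process is killed on timeout; any Java process it started may need to be stopped separately.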

Rammurthy5 avatar Aug 04 '20 06:08 Rammurthy5

What versions of Python, Java, and macOS are you running? Can you attach the file you are trying to process? As noted in #14, I haven't been able to reproduce the problem.
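
The version details requested above can be gathered in one go with the standard library. This is a rough sketch under the assumption that `java` is on the PATH; `collect_versions` is a hypothetical helper name, not something python-pdfbox provides.

```python
import platform
import shutil
import subprocess


def collect_versions():
    """Return a dict describing the Python, OS, and Java versions in use."""
    info = {
        "python": platform.python_version(),
        "os": platform.platform(),
        # mac_ver() returns an empty release string on non-macOS systems.
        "macos": platform.mac_ver()[0] or "n/a",
    }
    java = shutil.which("java")
    if java is None:
        info["java"] = "not found on PATH"
        return info
    try:
        # `java -version` normally writes its report to stderr.
        result = subprocess.run(
            [java, "-version"], capture_output=True, text=True, timeout=30
        )
        output = (result.stderr or result.stdout).strip()
        info["java"] = output.splitlines()[0] if output else "unknown"
    except (OSError, subprocess.TimeoutExpired) as exc:
        info["java"] = f"could not run java: {exc}"
    return info


if __name__ == "__main__":
    for name, value in collect_versions().items():
        print(f"{name}: {value}")
```

The first line of the `java -version` report usually starts with `java version` for Oracle's Java and `openjdk version` for OpenJDK builds, which helps tell the two apart.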

lebedov avatar Aug 05 '20 02:08 lebedov

macOS: 10.15.6, Python: 3.7.1, Java: 1.8.0_202. File attached: pdf copy.pdf

Rammurthy5 avatar Aug 05 '20 04:08 Rammurthy5

I didn't encounter any errors with the file you posted using the package versions in #14. Can you try using OpenJDK 14 rather than Oracle's Java?

lebedov avatar Aug 06 '20 18:08 lebedov