python-pdfbox issues

update python-pdfbox to support PDFBox 3.*

5

The command-line interface to PDFBox [was changed in version 3.*](https://pdfbox.apache.org/download.html).

lebedov

enhancement

text extraction hangs on MacOS 10.14

11

I am trying to use `pdfbox` with this vanilla snippet:

```
converter = pdfbox.PDFBox()
converter.extract_text(
    input_path=str(pdf.absolute()),
    output_path=str(txt.absolute()))
```

But it becomes stuck. I debugged the stack tree, and it hangs...

devcsrj

Extracting order pre-definable?

3

Hi guys, just wondering whether the text extraction order can be defined for a PDF file. As pointed out [here](https://pdfbox.apache.org/2.0/faq.html#textorder), is there a similar setting to adjust the extraction order? This...

luke4u

Bounding box text coordinates

2

Any ideas on how to extract the text with its corresponding bounding boxes? I saw some people extending the `PDFTextStripper` class, but JPype can't handle it.

victor-ab

Windows PDFBox.PDFBox() fails at Urllib error

3

When I merely import pdfbox and instantiate `PDFBox()`, it immediately throws the following error message. Please help. > urllib.error.URLError:

Rammurthy5

extract_text goes on forever.

3

I installed the latest PDFBox on my Mac via pip. I imported it and called the `extract_text()` method, and it keeps running indefinitely for a 196 KB file....

Rammurthy5

it just hang on win10

start_page and end_page not working

Use JPype to call into jars directly

Initializing python package pdfbox throws me out of python
