
Doesn't work for rotated page

Open Tobeabellwether opened this issue 2 years ago • 2 comments

Describe the bug

When I use page.extract_text() to extract text from a page rotated 90 degrees, the result is just garbled words.

Tobeabellwether avatar Mar 29 '23 11:03 Tobeabellwether

Thanks for flagging this @Tobeabellwether. That makes sense, given the approach pdfplumber takes to extracting text. I think adding support for rotated pages would be a good addition to the library.

jsvine avatar Mar 29 '23 13:03 jsvine

I have a similar issue, where some parts of the text are rotated 90 degrees (on a portrait page):

image

Copy-pasting the text manually works fine, but the .extract_text() method returns it in reversed order and badly segmented ("OHW" for "WHO", ":otohP" for "Photo:"):

OHW
A door-to-door polio vaccination
©
campaign in Yemen :otohP

I'll find a workaround, but I agree this would be a great new feature for this library!
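
For anyone who needs something in the meantime, here is a minimal sketch of one possible workaround. It is not how pdfplumber handles this internally. It assumes the character dictionaries from page.chars carry "text", "x0", "top" and "upright" keys, and that the rotated text reads bottom-to-top (rotated 90 degrees counter-clockwise), which matches the reversed "OHW" output above. The helper names and the x_tolerance value are made up for illustration, and no spaces are inserted between words unless the PDF stores them as characters.

```python
def split_by_orientation(chars):
    """Separate upright characters from rotated (non-upright) ones."""
    upright = [c for c in chars if c.get("upright", True)]
    rotated = [c for c in chars if not c.get("upright", True)]
    return upright, rotated


def rotated_text(chars, x_tolerance=3):
    """Rebuild text that reads bottom-to-top from rotated characters.

    Characters whose x0 values are within x_tolerance of the previous
    one are treated as one vertical line (an assumed heuristic).
    """
    columns = []
    for char in sorted(chars, key=lambda c: c["x0"]):
        if columns and abs(char["x0"] - columns[-1][-1]["x0"]) <= x_tolerance:
            columns[-1].append(char)
        else:
            columns.append([char])

    lines = []
    for column in columns:
        # The largest "top" value is lowest on the page, so it is read first.
        column.sort(key=lambda c: c["top"], reverse=True)
        lines.append("".join(c["text"] for c in column))
    return "\n".join(lines)


# Synthetic characters standing in for page.chars (values are invented).
sample_chars = [
    {"text": "O", "x0": 10.0, "top": 280.0, "upright": False},
    {"text": "W", "x0": 10.0, "top": 300.0, "upright": False},
    {"text": "H", "x0": 10.0, "top": 290.0, "upright": False},
    {"text": "A", "x0": 50.0, "top": 100.0, "upright": True},
]
upright_chars, rotated_chars = split_by_orientation(sample_chars)
print(rotated_text(rotated_chars))
```

With a real file, the same two helpers could be fed page.chars from an opened PDF, and the upright characters could be extracted separately as usual. Text rotated the other way (reading top-to-bottom) would need the sort order flipped.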

OrianeN avatar Apr 05 '23 09:04 OrianeN