pdfplumber

When I set repair=True, there is an error: 'utf-8' codec can't decode byte 0xae in position 239: invalid start byte. Is this caused by the original PDF?

Open • zyc1128 opened this issue 8 months ago • 1 comment

Describe the bug

A clear and concise description of what the bug is.

Also, when I use page.chars[x]["text"] to read the contents one character at a time, some text from tables is lost. I also found that none of the keys of an image object holds a bytes-like value. How can I save the images in the PDF to local files?
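A possible workaround sketch for both questions, assuming a recent pdfplumber where Page.crop(), Page.within_bbox(), Page.to_image() and Page.find_tables() are available. pdfplumber's image dicts carry a pdfminer stream object under the "stream" key rather than plain bytes, so instead of extracting the embedded file this sketch re-renders the area where each image is drawn and saves it as PNG. For the table question, it collects the characters that fall fully inside each table pdfplumber detects, which can be compared with what page.chars returned. The helper names (clamp_bbox, save_page_images, chars_by_table, dump_pdf) are hypothetical, not part of pdfplumber.

```python
from pathlib import Path


def clamp_bbox(bbox, page_bbox):
    """Intersect an (x0, top, x1, bottom) box with the page box so crop() accepts it."""
    x0, top, x1, bottom = bbox
    px0, ptop, px1, pbottom = page_bbox
    return (max(x0, px0), max(top, ptop), min(x1, px1), min(bottom, pbottom))


def save_page_images(page, out_dir, resolution=150):
    """Render the region of each image on `page` to a PNG file in `out_dir`.

    This re-renders the drawn area; it does not recover the original embedded
    bytes, so output quality depends on `resolution`.
    """
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    saved = []
    for i, img in enumerate(page.images):
        bbox = clamp_bbox((img["x0"], img["top"], img["x1"], img["bottom"]), page.bbox)
        if bbox[0] >= bbox[2] or bbox[1] >= bbox[3]:
            continue  # image lies entirely outside the page; nothing to render
        path = out / f"page{page.page_number}_image{i}.png"
        page.crop(bbox).to_image(resolution=resolution).save(str(path))
        saved.append(path)
    return saved


def chars_by_table(page):
    """For each detected table, join the text of the chars lying fully inside its bbox."""
    result = []
    for table in page.find_tables():
        inside = page.within_bbox(table.bbox)
        result.append("".join(c["text"] for c in inside.chars))
    return result


def dump_pdf(pdf_path, out_dir="images"):
    """Print per-table char text and save rendered images for every page."""
    import pdfplumber  # imported here so the helpers above can be used without it

    with pdfplumber.open(pdf_path) as pdf:
        for page in pdf.pages:
            print(page.page_number, chars_by_table(page))
            save_page_images(page, out_dir)
```

Calling dump_pdf("example.pdf") (a placeholder path) would print the table text per page and write PNGs to ./images. If a table's text is present here but missing from your own page.chars loop, the loss is more likely in the loop than in the PDF.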

Have you tried repairing the PDF?

Please try running your code with pdfplumber.open(..., repair=True) before submitting a bug report.
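For context on the error in the title: the message comes from Python's UTF-8 decoder, and it means the decoder met byte 0xAE at offset 239 of whatever byte string it was decoding. 0xAE (binary 10101110) falls in the 0x80 to 0xBF range of UTF-8 continuation bytes, which may only follow a lead byte, so it can never start a character. The report does not say which bytes were being decoded; the full traceback from pdfplumber.open(..., repair=True) would show that. The sketch below only reproduces the message on a made-up byte string; the helper names are hypothetical.

```python
def utf8_byte_role(b: int) -> str:
    """Classify one byte by the role it can play in a UTF-8 sequence."""
    if b < 0x80:
        return "ASCII"
    if b < 0xC0:
        return "continuation byte"  # 0x80-0xBF: may only follow a lead byte
    if b in (0xC0, 0xC1) or b > 0xF4:
        return "never valid"
    return "lead byte"


def utf8_error_message(data: bytes):
    """Return the UnicodeDecodeError text for `data`, or None if it is valid UTF-8."""
    try:
        data.decode("utf-8")
    except UnicodeDecodeError as exc:
        return str(exc)
    return None


# Hypothetical input: text stored in a legacy single-byte encoding (cp1252).
sample = b"Copyright \xae 2024"
print(utf8_byte_role(0xAE))        # continuation byte
print(utf8_error_message(sample))  # 'utf-8' codec can't decode byte 0xae in position 10: invalid start byte
decoded = sample.decode("cp1252")  # 0xAE is the registered sign in cp1252
```

The same bytes decode without error under cp1252, which is why this kind of error usually points to text in some other encoding being read as UTF-8.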

Code to reproduce the problem

Paste it here, or attach a Python file.

PDF file

Please attach any PDFs necessary to reproduce the problem.

If you need to redact text in a sensitive PDF, you can run it through JoshData/pdf-redactor.

Expected behavior

What did you expect the result should have been?

Actual behavior

What actually happened, instead?

Screenshots

If applicable, add screenshots to help explain your problem.

Environment

  • pdfplumber version: [e.g., 0.5.22]
  • Python version: [e.g., 3.8.1]
  • OS: [e.g., Mac, Linux, etc.]
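One way to fill in the Environment values above is the sketch below, which uses only the standard library. It assumes pdfplumber was installed as a regular package so that importlib.metadata can find its version; the function name environment_report is hypothetical.

```python
import platform
from importlib.metadata import PackageNotFoundError, version


def environment_report():
    """Collect the values the Environment section of the template asks for."""
    try:
        plumber = version("pdfplumber")
    except PackageNotFoundError:
        plumber = "not installed"
    return {
        "pdfplumber version": plumber,
        "Python version": platform.python_version(),
        "OS": platform.platform(),
    }


for key, value in environment_report().items():
    print(f"{key}: {value}")
```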

Additional context

Add any other context/notes about the problem here.

zyc1128, May 30 '24 02:05