pdfplumber
When I set repair=True, I get an error: 'utf-8' codec can't decode byte 0xae in position 239: invalid start byte. Is this caused by the original PDF?
Describe the bug
A clear and concise description of what the bug is.
Also, when I use page.chars[x]["text"] to get the contents one character at a time, some text from tables is lost. I also found that image objects have no bytes-like key. How can I save the images in the PDF to local files?
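For the image question, one possible workaround is to rasterize each image's bounding box instead of looking for a bytes key. The sketch below assumes that each entry in page.images carries x0, top, x1 and bottom coordinates, and that page.crop(...).to_image(...).save(...) is available in the installed version. The helper names clamp_bbox and save_page_images are hypothetical, not part of pdfplumber. Note that this re-renders the region at the chosen resolution; it does not recover the original embedded file.

```python
from pathlib import Path


def clamp_bbox(bbox, page_bbox):
    """Shrink bbox so it lies inside page_bbox (both are x0, top, x1, bottom)."""
    x0, top, x1, bottom = bbox
    px0, ptop, px1, pbottom = page_bbox
    return (max(x0, px0), max(top, ptop), min(x1, px1), min(bottom, pbottom))


def save_page_images(pdf_path, out_dir, resolution=150):
    """Hypothetical helper: render every image region on every page to a PNG."""
    import pdfplumber  # imported here so clamp_bbox is usable without pdfplumber

    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    saved = []
    with pdfplumber.open(pdf_path) as pdf:
        for page in pdf.pages:
            for idx, img in enumerate(page.images):
                raw = (img["x0"], img["top"], img["x1"], img["bottom"])
                # Images can extend past the page edge; crop() may reject such boxes.
                bbox = clamp_bbox(raw, page.bbox)
                if bbox[0] >= bbox[2] or bbox[1] >= bbox[3]:
                    continue  # nothing visible on the page
                target = out / f"page{page.page_number}_img{idx}.png"
                page.crop(bbox).to_image(resolution=resolution).save(str(target))
                saved.append(target)
    return saved
```

Calling save_page_images("input.pdf", "images") would write one PNG per image region. If the original bytes are needed, each image dict also exposes a stream object whose get_data() method returns the decoded stream contents, but whether those bytes form a complete image file depends on how the PDF encoded the image.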
Have you tried repairing the PDF?
Please try running your code with pdfplumber.open(..., repair=True)
before submitting a bug report.
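To narrow down whether the decode error comes from the repair step or from the PDF itself, a minimal sketch is to try repair=True first and fall back to a plain open if it raises the reported error. The function name open_with_repair_fallback and the opener parameter are hypothetical, added only so the logic can be exercised without a PDF. It assumes the failure surfaces as UnicodeDecodeError, as in the message quoted above.

```python
def open_with_repair_fallback(path, opener=None):
    """Try opener(path, repair=True); on UnicodeDecodeError, retry without repair."""
    if opener is None:
        import pdfplumber  # default to the real library when no opener is injected

        opener = pdfplumber.open
    try:
        return opener(path, repair=True)
    except UnicodeDecodeError:
        # Assumption: the error reported in this issue is raised during repair.
        return opener(path)
```

If the plain open succeeds where the repaired open fails, the problem is more likely in the repair step than in the PDF's content.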
Code to reproduce the problem
Paste it here, or attach a Python file.
PDF file
Please attach any PDFs necessary to reproduce the problem.
If you need to redact text in a sensitive PDF, you can run it through JoshData/pdf-redactor.
Expected behavior
What did you expect the result should have been?
Actual behavior
What actually happened, instead?
Screenshots
If applicable, add screenshots to help explain your problem.
Environment
- pdfplumber version: [e.g., 0.5.22]
- Python version: [e.g., 3.8.1]
- OS: [e.g., Mac, Linux, etc.]
Additional context
Add any other context/notes about the problem here.