pypdf
startxref not found
MCVE: Code + PDF
PDF Document: simple1.pdf
from PyPDF2 import PdfReader
reader = PdfReader("simple1.pdf")
Traceback
Traceback (most recent call last):
  File "/home/moose/Github/py-pdf/PyPDF2/PyPDF2/_reader.py", line 1307, in _find_startxref_pos
    startxref = int(line)
ValueError: invalid literal for int() with base 10: b'>>'

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/home/moose/Github/py-pdf/PyPDF2/PyPDF2/_reader.py", line 253, in __init__
    self.read(stream)
  File "/home/moose/Github/py-pdf/PyPDF2/PyPDF2/_reader.py", line 1200, in read
    startxref = self._find_startxref_pos(stream)
  File "/home/moose/Github/py-pdf/PyPDF2/PyPDF2/_reader.py", line 1311, in _find_startxref_pos
    raise PdfReadError("startxref not found")
PyPDF2.errors.PdfReadError: startxref not found
I am currently having the same issue. I would highly appreciate some help on the topic.
I'm getting this same error with the Linux program PDF-Shuffler. Is this caused by some package not being installed?
Does anyone have a solution for this?
Same in PyPDF2==1.26.0.
Do you have code that leads to this issue?
This is not a bug, but a robustness issue. The PDF document is broken as it's missing an xref table.
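For context: a well-formed PDF ends with the `startxref` keyword, the byte offset of the cross-reference table, and then `%%EOF`. The following is a minimal, standard-library-only sketch of a check for that trailer. It is not how PyPDF2 implements `_find_startxref_pos`; the helper name `find_startxref_offset`, the 1024-byte tail window, and the two sample byte strings are assumptions made up for illustration.

```python
import re
from typing import Optional

# Matches the "startxref" keyword, the decimal byte offset, and the %%EOF marker.
_STARTXREF_RE = re.compile(rb"startxref\s+(\d+)\s+%%EOF")


def find_startxref_offset(data: bytes, tail_size: int = 1024) -> Optional[int]:
    """Return the xref offset declared near the end of ``data``, or None.

    Hypothetical helper for illustration only; not part of PyPDF2.
    """
    tail = data[-tail_size:]
    matches = _STARTXREF_RE.findall(tail)
    if not matches:
        return None
    # Incrementally updated PDFs can contain several trailers; use the last one.
    return int(matches[-1])


# Made-up sample data: one trailer with a startxref entry, one without.
good = b"%PDF-1.4\n...objects...\nxref\n0 1\ntrailer\n<< /Size 1 >>\nstartxref\n25\n%%EOF\n"
bad = b"%PDF-1.4\n...objects...\ntrailer\n<< /Size 1 >>\n%%EOF\n"

print(find_startxref_offset(good))
print(find_startxref_offset(bad))
```

A `None` result only says that the trailer keyword is absent from the last bytes of the file. It does not verify that the offset actually points at a valid xref table.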
@MartinThoma I propose to close this issue, as the problem is caused by a badly damaged file.
The Chrome PDF viewer can open it, so it's not impossible. But I get what you're saying: there are more important things to work on.
Please leave it open for the moment. I think I'll add this to the sample files. Once it's in there, we can close it.
I just opened an issue in that repo: This way I don't forget it and we can keep it clean here :-)
For people who run into issues with damaged PDF files: There are tools that can fix them: https://superuser.com/a/282056