pypdf startxref not found

startxref not found

Open mayankmetha opened this issue 5 years ago • 6 comments

MCVE: Code + PDF

PDF Document: simple1.pdf

from PyPDF2 import PdfReader

# Opening the damaged file raises PdfReadError("startxref not found")
reader = PdfReader("simple1.pdf")

Traceback

Traceback (most recent call last):
  File "/home/moose/Github/py-pdf/PyPDF2/PyPDF2/_reader.py", line 1307, in _find_startxref_pos
    startxref = int(line)
ValueError: invalid literal for int() with base 10: b'>>'

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/home/moose/Github/py-pdf/PyPDF2/PyPDF2/_reader.py", line 253, in __init__
    self.read(stream)
  File "/home/moose/Github/py-pdf/PyPDF2/PyPDF2/_reader.py", line 1200, in read
    startxref = self._find_startxref_pos(stream)
  File "/home/moose/Github/py-pdf/PyPDF2/PyPDF2/_reader.py", line 1311, in _find_startxref_pos
    raise PdfReadError("startxref not found")
PyPDF2.errors.PdfReadError: startxref not found

Feb 14 '19 16:02 mayankmetha

I am currently having the same issue. I would highly appreciate some help on the topic.

May 07 '19 14:05 gabrossmann

I'm getting the same error with the Linux program PDF-Shuffler. Is this caused by some package not being installed?

May 26 '19 19:05 kirkins

Does anyone have a solution for this?

Nov 09 '20 22:11 webcontrols

Same in PyPDF2==1.26.0.

Jun 25 '21 11:06 channprj

Do you have code that leads to this issue?

Apr 07 '22 16:04 MartinThoma

This is not a bug, but a robustness issue. The PDF document is broken as it's missing an xref table.
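To illustrate what "missing an xref table" means in practice, here is a minimal sketch that looks for the `startxref` keyword near the end of a file. It is not PyPDF2's actual implementation; the function name `find_startxref_offset`, the `tail_size` parameter, and the two tiny byte strings are made up for this example.

```python
import re
from typing import Optional


def find_startxref_offset(data: bytes, tail_size: int = 1024) -> Optional[int]:
    """Return the offset announced by the last 'startxref' keyword, or None.

    Hypothetical helper for illustration only; a real parser handles many
    more edge cases.
    """
    # A well-formed file ends with: startxref <EOL> <offset> <EOL> %%EOF
    tail = data[-tail_size:]
    matches = re.findall(rb"startxref\s+(\d+)\s+%%EOF", tail)
    return int(matches[-1]) if matches else None


# Toy inputs (not valid PDFs): one with a startxref entry, one without.
good = (
    b"%PDF-1.4\n...objects...\n"
    b"xref\n0 1\n0000000000 65535 f \n"
    b"trailer\n<< /Size 1 >>\nstartxref\n23\n%%EOF\n"
)
broken = b"%PDF-1.4\n...objects...\ntrailer\n<< /Size 1 >>\n%%EOF\n"

print(find_startxref_offset(good))    # offset of the "xref" keyword
print(find_startxref_offset(broken))  # None: nothing to parse as an offset
```

In the `broken` input the line before `%%EOF` is the end of the trailer dictionary rather than a number, which is the same situation as the `int(b'>>')` failure in the traceback above.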

Jun 26 '22 11:06 MartinThoma

@MartinThoma I propose closing this issue, as the problem is caused by a badly damaged file.

Feb 09 '23 05:02 pubpub-zz

The Chrome PDF viewer can open it, so it's not impossible. But I get what you're saying: There are more important things to work on.

Please leave it open for the moment. I think I'll add this to the sample files. Once it's in there, we can close it.

Feb 10 '23 07:02 MartinThoma

I just opened an issue in the sample-files repo. This way I don't forget it and we can keep things clean here :-)

For people who run into issues with damaged PDF files: There are tools that can fix them: https://superuser.com/a/282056
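As a rough sketch of how such a repair step could be scripted, the snippet below rewrites a PDF with Ghostscript, one tool that is often used for this. It assumes the `gs` binary is installed and on the PATH; the helper names `build_repair_command` and `repair_pdf` are hypothetical, and there is no guarantee that Ghostscript can recover any particular damaged file.

```python
import shutil
import subprocess
from typing import List


def build_repair_command(src: str, dst: str, gs_binary: str = "gs") -> List[str]:
    """Build a Ghostscript command that re-writes src into a fresh PDF at dst."""
    # -o sets the output file (and implies batch mode);
    # -sDEVICE=pdfwrite makes Ghostscript emit a newly generated PDF.
    return [gs_binary, "-o", dst, "-sDEVICE=pdfwrite", src]


def repair_pdf(src: str, dst: str) -> bool:
    """Try to rewrite src as dst; return True if Ghostscript exits cleanly."""
    if shutil.which("gs") is None:
        # Assumption: the binary is called "gs"; it may differ on some systems.
        return False
    result = subprocess.run(build_repair_command(src, dst), check=False)
    return result.returncode == 0
```

After a successful run, the rewritten file (for example `repaired.pdf`) can be passed to `PdfReader` instead of the original.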

Feb 10 '23 07:02 MartinThoma
