PyPDF2.errors.PdfReadError: EOF marker not found

Open alisufian opened this issue 9 years ago • 2 comments

The following script originally hung, but with PyPDF2==2.4.2 it raises PdfReadError: EOF marker not found.

MCVE: PDF + Code

The file is 298 MB and has 21 pages.

from PyPDF2 import PdfReader

reader = PdfReader("01-006 2009-04-30 FRP NON CONFIDENTIAL PAP FILING.PDF")

Traceback

Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/home/moose/Github/py-pdf/PyPDF2/PyPDF2/_reader.py", line 267, in __init__
    self.read(stream)
  File "/home/moose/Github/py-pdf/PyPDF2/PyPDF2/_reader.py", line 1218, in read
    raise PdfReadError("EOF marker not found")
PyPDF2.errors.PdfReadError: EOF marker not found

alisufian avatar Sep 01 '14 12:09 alisufian

PyPDF2==1.27.7 gives:

Traceback (most recent call last):
  File "/home/moose/foo.py", line 3, in <module>
    reader = PdfFileReader("01-006 2009-04-30 FRP NON CONFIDENTIAL PAP FILING.PDF")
  File "/home/moose/Github/py-pdf/PyPDF2/PyPDF2/pdf.py", line 1208, in __init__
    self.read(stream)
  File "/home/moose/Github/py-pdf/PyPDF2/PyPDF2/pdf.py", line 1828, in read
    line = self.readNextEndLine(stream, last1K)
  File "/home/moose/Github/py-pdf/PyPDF2/PyPDF2/pdf.py", line 2078, in readNextEndLine
    raise PdfReadError("Could not read malformed PDF file")
PyPDF2.errors.PdfReadError: Could not read malformed PDF file

The same error occurs with strict=False.

This is the part of the code that raises it:

            # Prevent infinite loops in malformed PDFs
            if stream.tell() == 0 or stream.tell() == limit_offset:
                raise PdfReadError("Could not read malformed PDF file")
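
To illustrate what the guard quoted above protects against, here is a minimal standalone sketch of reading a stream backward one line at a time. It is not the PyPDF2 implementation: `read_previous_line` and `MalformedFileError` are hypothetical names made up for this example, and only the guard condition itself is taken from the snippet above.

```python
import io


class MalformedFileError(Exception):
    """Hypothetical stand-in for PdfReadError in this sketch."""


def read_previous_line(stream, limit_offset=0):
    """Return the line that ends at the current position, reading backward.

    Trailing end-of-line bytes are skipped first. On return, the stream is
    positioned on the end-of-line byte that precedes the returned line.
    """
    line = b""
    while True:
        # Same guard as in the quoted snippet: without it, a stream with no
        # line break before the current position could never terminate.
        if stream.tell() == 0 or stream.tell() == limit_offset:
            raise MalformedFileError("Could not read malformed PDF file")
        stream.seek(-1, io.SEEK_CUR)
        byte = stream.read(1)
        stream.seek(-1, io.SEEK_CUR)
        if byte in (b"\n", b"\r"):
            if line:
                break  # reached the line break before our line
            continue  # still skipping trailing line breaks
        line = byte + line
    return line
```

Called repeatedly on a well-formed tail, this yields `%%EOF`, then the `startxref` offset, and so on. On a tail that contains no line breaks at all (for example, a long run of null bytes), the loop walks back to offset 0 and the guard raises.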

MartinThoma avatar Apr 19 '22 20:04 MartinThoma

The PDF is very odd: it contains about 100 MB of null characters. After removing the threshold used when looking for the end marker, it can be opened. Some lines in the cmap also need to be stripped.
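
For anyone who wants to inspect a file like this, a rough sketch for locating the last `%%EOF` marker without any distance threshold could look like the following. It uses only the standard library and does not call PyPDF2; the function name `find_last_eof_marker` and the `chunk_size` parameter are assumptions made for this example.

```python
import os

EOF_MARKER = b"%%EOF"


def find_last_eof_marker(path, chunk_size=1024 * 1024):
    """Return the byte offset of the last %%EOF marker in the file, or None.

    The file is scanned backward in chunks so that a very large file does
    not have to be loaded into memory at once.
    """
    size = os.path.getsize(path)
    with open(path, "rb") as fh:
        end = size
        tail = b""  # leading bytes of the previously scanned (later) chunk
        while end > 0:
            start = max(0, end - chunk_size)
            fh.seek(start)
            # Append the overlap so a marker split across two chunks is found.
            chunk = fh.read(end - start) + tail
            idx = chunk.rfind(EOF_MARKER)
            if idx != -1:
                return start + idx
            tail = chunk[: len(EOF_MARKER) - 1]
            end = start
    return None
```

Comparing the returned offset with the file size shows how many bytes follow the marker, which gives a quick indication of how much padding sits after the real end of the document.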

pubpub-zz avatar Jul 24 '22 19:07 pubpub-zz

@MartinThoma, this issue should be released, shouldn't it?

pubpub-zz avatar Aug 21 '22 19:08 pubpub-zz

Thank you for the reminder :-)

MartinThoma avatar Aug 21 '22 21:08 MartinThoma