PdfFileReader crashes on a normal PDF file

Open askerlee opened this issue 5 years ago • 5 comments

The PDF file that causes the error is attached. It is a one-page file extracted from a larger PDF using Acrobat. 1.pdf

When I open it with PdfFileReader and call numPages, the script crashes with an exception:

pypdf.utils.PdfReadError: Cannot fetch a free object (id, next gen.) = (8, 0)

But if I decompress the file with qpdf first, there is no error. It seems the file contains some rare low-level structure.
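For anyone who wants to try the same workaround, here is a rough sketch of the decompression step. The exact qpdf invocation is not given above, so the `--qdf --object-streams=disable` flags are an assumption (both are documented qpdf options), and the helper names are made up for illustration.

```python
import shutil
import subprocess


def qpdf_decompress_command(src, dst):
    """Build a qpdf command line that rewrites src into dst uncompressed."""
    # Assumption: --qdf writes an uncompressed, normalized file and
    # --object-streams=disable unpacks object streams. The original
    # report does not say which options were actually used.
    return ["qpdf", "--qdf", "--object-streams=disable", src, dst]


def decompress_with_qpdf(src, dst):
    """Run qpdf if it is on PATH; return True when the rewrite succeeded."""
    if shutil.which("qpdf") is None:
        return False
    result = subprocess.run(qpdf_decompress_command(src, dst))
    # qpdf exits with 0 on success and 3 on success with warnings.
    return result.returncode in (0, 3)
```

The rewritten file can then be passed to PdfFileReader in place of the original.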

askerlee avatar Sep 19 '19 02:09 askerlee

I ran into another file with the same error 😢 So it seems not so rare. test.pdf

askerlee avatar Sep 23 '19 06:09 askerlee

Update: these files open normally with PyPDF2, so this seems to be a bug introduced in PyPDF4.

askerlee avatar Sep 23 '19 07:09 askerlee

Is this bug still valid? I ask because I was able to merge those files with another one.

cadu-leite avatar Jul 27 '20 22:07 cadu-leite

Any updates on this? I am also having trouble reading the following file:

101880043.pdf

jucajuca avatar Mar 07 '21 18:03 jucajuca

In [1]: from PyPDF4 import PdfFileReader

In [2]: test_pdf = PdfFileReader(open('test.pdf', 'rb'))

In [3]: One_pdf = PdfFileReader(open('1.pdf', 'rb'))

In [4]: one_pdf = PdfFileReader(open('1.pdf', 'rb'))

In [5]: numbers = PdfFileReader(open('101880043.pdf', 'rb'))
---------------------------------------------------------------------------
PdfReadError                              Traceback (most recent call last)
<ipython-input-5-dfaa3ab77f43> in <module>
----> 1 numbers = PdfFileReader(open('101880043.pdf', 'rb'))

~/.virtualenvs/pypdf4_issue_63/lib/python3.9/site-packages/PyPDF4/pdf.py in __init__(self, stream, strict, warndest, overwriteWarnings)
   1146             stream = BytesIO(b_(fileobj.read()))
   1147             fileobj.close()
-> 1148         self.read(stream)
   1149         self.stream = stream
   1150

~/.virtualenvs/pypdf4_issue_63/lib/python3.9/site-packages/PyPDF4/pdf.py in read(self, stream)
   1964                     continue
   1965                 # no xref table found at specified location
-> 1966                 raise utils.PdfReadError("Could not find xref table at specified location")
   1967         #if not zero-indexed, verify that the table is correct; change it if necessary
   1968         if self.xrefIndex and not self.strict:

PdfReadError: Could not find xref table at specified location

In [6]: test_pdf.getNumPages()
Out[6]: 1

In [7]: one_pdf.getNumPages()
Out[7]: 1

I got an error using @jucajuca's PDF sample, but the others seem to work.

Using:

  • Python 3.9
  • PyPDF4==1.27.0

Could you post a code sample that reproduces the error?
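Something like the following could serve as a starting point for a reproducible sample. It is only a sketch: `page_count_or_error` and its `reader_factory` parameter are hypothetical names, and it assumes the PyPDF4 1.27.0 API used in the session above (`PdfFileReader(stream)` and `getNumPages()`).

```python
def page_count_or_error(path, reader_factory=None):
    """Return (num_pages, None) on success or (None, 'ExcName: msg') on failure."""
    if reader_factory is None:
        # Imported lazily so the helper can be exercised without PyPDF4.
        from PyPDF4 import PdfFileReader
        reader_factory = PdfFileReader
    try:
        # Keep the file open while the reader is in use.
        with open(path, "rb") as fh:
            reader = reader_factory(fh)
            return reader.getNumPages(), None
    except Exception as exc:
        # Report the exception type and message instead of crashing,
        # so several files can be checked in one run.
        return None, "{}: {}".format(type(exc).__name__, exc)
```

Calling it on each attached file (1.pdf, test.pdf, 101880043.pdf) and posting the returned tuples, along with the Python and PyPDF4 versions, would make the reports easier to compare.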

cadu-leite avatar Mar 08 '21 11:03 cadu-leite