PyPDF4
PdfFileReader crashes on a normal PDF file
The PDF file causing the error is attached. This one-page file was extracted from a PDF using Acrobat. 1.pdf
When it's opened with PdfFileReader and numPages is called, the script crashes with an exception:
pypdf.utils.PdfReadError: Cannot fetch a free object (id, next gen.) = (8, 0)
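A minimal sketch of the reproduction described above, for anyone who wants to check their own files. The helper names `page_count_or_error` and `check_with_pypdf4` are hypothetical. The reader class is passed in so the same check can be pointed at either PyPDF4 or PyPDF2, and the sketch assumes that class accepts a binary file object and exposes `numPages`, as `PdfFileReader` does.

```python
def page_count_or_error(path, reader_cls):
    """Open `path` with `reader_cls` and return (page_count, error_text).

    Exactly one of the two returned values is None.
    """
    try:
        with open(path, "rb") as fh:
            reader = reader_cls(fh)
            # Read numPages while the file is still open.
            return reader.numPages, None
    except Exception as exc:
        # Broad on purpose: the goal is only to report what went wrong.
        return None, f"{type(exc).__name__}: {exc}"


def check_with_pypdf4(path):
    """Run the check with PyPDF4's PdfFileReader (assumes PyPDF4 is installed)."""
    # Imported lazily so the helper above works without PyPDF4 present.
    from PyPDF4 import PdfFileReader
    return page_count_or_error(path, PdfFileReader)
```

Calling `check_with_pypdf4('1.pdf')` should return either a page count or the text of the exception, which makes it easy to compare several files, or the same file across libraries, in one loop.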
But if I decompress this file first using qpdf, there's no error. It seems the file contains some sort of rare low-level structure.
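The qpdf workaround could be scripted roughly as follows. The exact qpdf options used are not stated in this report, so the flags here are an assumption: `--stream-data=uncompress` and `--object-streams=disable` are existing qpdf options that rewrite the file with uncompressed streams and without object streams. The function names are hypothetical.

```python
import subprocess


def build_qpdf_command(src, dst):
    """Return the qpdf argument list for rewriting `src` into `dst`."""
    return [
        "qpdf",
        "--stream-data=uncompress",   # write stream data uncompressed
        "--object-streams=disable",   # unpack objects stored in object streams
        src,
        dst,
    ]


def decompress_pdf(src, dst):
    """Run qpdf (must be on PATH) and return the output path."""
    # check=True raises CalledProcessError on any non-zero exit status.
    subprocess.run(build_qpdf_command(src, dst), check=True)
    return dst
```

The rewritten file can then be passed to PdfFileReader in place of the original. This only sidesteps the error; it does not explain which structure in the original file triggers it.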
I ran into another file with the same error 😢 So it seems this is not so rare. test.pdf
Update: these files can be opened normally with PyPDF2. It seems to be a bug introduced in PyPDF4.
Is this bug still valid? I ask because I was able to merge those files with another one.
In [1]: from PyPDF4 import PdfFileReader
In [2]: test_pdf = PdfFileReader(open('test.pdf', 'rb'))
In [3]: One_pdf = PdfFileReader(open('1.pdf', 'rb'))
In [4]: one_pdf = PdfFileReader(open('1.pdf', 'rb'))
In [5]: numbers = PdfFileReader(open('101880043.pdf', 'rb'))
---------------------------------------------------------------------------
PdfReadError Traceback (most recent call last)
<ipython-input-5-dfaa3ab77f43> in <module>
----> 1 numbers = PdfFileReader(open('101880043.pdf', 'rb'))
~/.virtualenvs/pypdf4_issue_63/lib/python3.9/site-packages/PyPDF4/pdf.py in __init__(self, stream, strict, warndest, overwriteWarnings)
1146 stream = BytesIO(b_(fileobj.read()))
1147 fileobj.close()
-> 1148 self.read(stream)
1149 self.stream = stream
1150
~/.virtualenvs/pypdf4_issue_63/lib/python3.9/site-packages/PyPDF4/pdf.py in read(self, stream)
1964 continue
1965 # no xref table found at specified location
-> 1966 raise utils.PdfReadError("Could not find xref table at specified location")
1967 #if not zero-indexed, verify that the table is correct; change it if necessary
1968 if self.xrefIndex and not self.strict:
PdfReadError: Could not find xref table at specified location
In [6]: test_pdf.getNumPages()
Out[6]: 1
In [7]: one_pdf.getNumPages()
Out[7]: 1
I got an error using @jucajuca's PDF sample, but the others seem to work.
Using ...
- Python 3.9
- PyPDF4==1.27.0
You may post some code samples ...