pypdf
throwing AssertionError while merging certain pdfs
I am trying to merge PDFs using PdfFileMerger. When I call the write method of PdfFileMerger, it throws an AssertionError. I traced it to the line assert "/Length" in data in generic.py. I don't understand why some of my PDFs have no '/Length' key in their stream objects.
AssertionError: null
File "reports/views.py", line 3643, in selectedPrints
merger.write(result)
File "PyPDF2/merger.py", line 230, in write
self.output.write(fileobj)
File "PyPDF2/pdf.py", line 482, in write
self._sweepIndirectReferences(externalReferenceMap, self._root)
File "PyPDF2/pdf.py", line 571, in _sweepIndirectReferences
self._sweepIndirectReferences(externMap, realdata)
File "PyPDF2/pdf.py", line 547, in _sweepIndirectReferences
value = self._sweepIndirectReferences(externMap, value)
File "PyPDF2/pdf.py", line 571, in _sweepIndirectReferences
self._sweepIndirectReferences(externMap, realdata)
File "PyPDF2/pdf.py", line 547, in _sweepIndirectReferences
value = self._sweepIndirectReferences(externMap, value)
File "PyPDF2/pdf.py", line 556, in _sweepIndirectReferences
value = self._sweepIndirectReferences(externMap, data[i])
File "PyPDF2/pdf.py", line 571, in _sweepIndirectReferences
self._sweepIndirectReferences(externMap, realdata)
File "PyPDF2/pdf.py", line 547, in _sweepIndirectReferences
value = self._sweepIndirectReferences(externMap, value)
File "PyPDF2/pdf.py", line 547, in _sweepIndirectReferences
value = self._sweepIndirectReferences(externMap, value)
File "PyPDF2/pdf.py", line 547, in _sweepIndirectReferences
value = self._sweepIndirectReferences(externMap, value)
File "PyPDF2/pdf.py", line 577, in _sweepIndirectReferences
newobj = data.pdf.getObject(data)
File "PyPDF2/pdf.py", line 1611, in getObject
retval = readObject(self.stream, self)
File "PyPDF2/generic.py", line 66, in readObject
return DictionaryObject.readFromStream(stream, pdf)
File "PyPDF2/generic.py", line 604, in readFromStream
assert "/Length" in data
The key is Length1 instead of Length.
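To illustrate why the assertion fails in that case, here is a minimal sketch that uses a plain Python dict as a stand-in for the parsed stream dictionary. The keys and values are invented for illustration, and `has_length` is a hypothetical helper, not part of PyPDF2. Membership is an exact key match, so a /Length1 entry does not satisfy a check for /Length.

```python
# Hypothetical stand-in for a parsed stream dictionary; the values are invented.
stream_dict = {"/Filter": "/FlateDecode", "/Length1": 1234}


def has_length(d):
    """Mirror the check from the traceback: assert "/Length" in data."""
    return "/Length" in d


print(has_length(stream_dict))                        # False
print(has_length({**stream_dict, "/Length": 1234}))   # True
```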
Anyone facing the same issue can use the library below to fix it. It calculates the length of each stream object, adds /Length to the stream object's dictionary, and rebuilds the xref table. https://github.com/shobhit99/ReconstructPdf
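As a rough idea of what "calculate the length and add /Length" can look like, here is a minimal sketch for a single object. It is not the code from ReconstructPdf. The function name `add_missing_length` is hypothetical, it assumes the stream data does not itself contain the bytes "endstream", and it does not rebuild the xref table, which a real repair would also need since inserting bytes shifts the offsets.

```python
import re

# A stream dictionary "<< ... >>" followed by the "stream" keyword and its end-of-line.
_STREAM_RE = re.compile(rb"<<(?P<dict>.*?)>>\s*stream(?:\r\n|\n)", re.DOTALL)


def add_missing_length(obj: bytes) -> bytes:
    """Return obj with /Length added to its stream dictionary if it is absent."""
    m = _STREAM_RE.search(obj)
    if m is None:
        return obj  # not a stream object
    # /Length1, /Length2, ... are different keys, so reject a trailing digit or letter.
    if re.search(rb"/Length(?![0-9A-Za-z])", m.group("dict")):
        return obj
    data_start = m.end()
    data_end = obj.find(b"endstream", data_start)
    if data_end == -1:
        raise ValueError("stream without endstream")
    # The end-of-line marker before "endstream" is not part of the stream data.
    if obj[data_end - 2:data_end] == b"\r\n":
        data_end -= 2
    elif obj[data_end - 1:data_end] in (b"\n", b"\r"):
        data_end -= 1
    length = data_end - data_start
    insert_at = m.end("dict")
    return obj[:insert_at] + b" /Length %d " % length + obj[insert_at:]


broken = b"1 0 obj\n<< /Length1 5 >>\nstream\nhello\nendstream\nendobj\n"
fixed = add_missing_length(broken)
```

Calling the function a second time on `fixed` returns it unchanged, because the dictionary then already has a /Length entry.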
@shobhit789 Fixing the xref table is interesting! Would you mind if I integrated your script in https://github.com/py-pdf/cpdf ?
Sure @MartinThoma, but the implementation is specific to a few PDF versions only. You can modify it if you want.
@MartinThoma, what is your recommendation on this issue? Should we close it?
Let's close it.
For people who face issues with broken PDFs: you can fix them in many ways, see https://superuser.com/a/282056