pypdf

AssertionError thrown while merging certain PDFs

Open prafful-solanki opened this issue 5 years ago • 5 comments

I am trying to merge PDFs using PdfFileMerger. When I call the write method of PdfFileMerger, it throws an AssertionError. I traced it to the line assert "/Length" in data in generic.py. I don't understand why some of my PDFs have no '/Length' key in their stream objects.

prafful-solanki avatar Apr 10 '19 13:04 prafful-solanki

AssertionError: null
  File "reports/views.py", line 3643, in selectedPrints
    merger.write(result)
  File "PyPDF2/merger.py", line 230, in write
    self.output.write(fileobj)
  File "PyPDF2/pdf.py", line 482, in write
    self._sweepIndirectReferences(externalReferenceMap, self._root)
  File "PyPDF2/pdf.py", line 571, in _sweepIndirectReferences
    self._sweepIndirectReferences(externMap, realdata)
  File "PyPDF2/pdf.py", line 547, in _sweepIndirectReferences
    value = self._sweepIndirectReferences(externMap, value)
  File "PyPDF2/pdf.py", line 571, in _sweepIndirectReferences
    self._sweepIndirectReferences(externMap, realdata)
  File "PyPDF2/pdf.py", line 547, in _sweepIndirectReferences
    value = self._sweepIndirectReferences(externMap, value)
  File "PyPDF2/pdf.py", line 556, in _sweepIndirectReferences
    value = self._sweepIndirectReferences(externMap, data[i])
  File "PyPDF2/pdf.py", line 571, in _sweepIndirectReferences
    self._sweepIndirectReferences(externMap, realdata)
  File "PyPDF2/pdf.py", line 547, in _sweepIndirectReferences
    value = self._sweepIndirectReferences(externMap, value)
  File "PyPDF2/pdf.py", line 547, in _sweepIndirectReferences
    value = self._sweepIndirectReferences(externMap, value)
  File "PyPDF2/pdf.py", line 547, in _sweepIndirectReferences
    value = self._sweepIndirectReferences(externMap, value)
  File "PyPDF2/pdf.py", line 577, in _sweepIndirectReferences
    newobj = data.pdf.getObject(data)
  File "PyPDF2/pdf.py", line 1611, in getObject
    retval = readObject(self.stream, self)
  File "PyPDF2/generic.py", line 66, in readObject
    return DictionaryObject.readFromStream(stream, pdf)
  File "PyPDF2/generic.py", line 604, in readFromStream
    assert "/Length" in data

Somnath-Guthula avatar Apr 16 '19 07:04 Somnath-Guthula

assert "/Length" in data

In the stream dictionary, the key is /Length1 instead of /Length.
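To check which objects in a file trigger the assertion, a rough scan of the raw bytes can help. The sketch below is only a heuristic and is not part of PyPDF2: the function name streams_missing_length and the regular expressions are hypothetical, and it assumes uncompressed, plainly laid out objects (it can be fooled by binary stream data or by a /Length key in a nested dictionary).

```python
import re

# Heuristic pattern: "<num> <gen> obj << ... >> stream", never crossing "endobj".
_STREAM_OBJ = re.compile(
    rb"(\d+)\s+(\d+)\s+obj\s*(<<(?:(?!endobj).)*?>>)\s*stream\r?\n",
    re.DOTALL,
)
# "/Length" as a whole name, so that "/Length1" does not count.
_LENGTH_KEY = re.compile(rb"/Length(?![0-9A-Za-z])")


def streams_missing_length(pdf_bytes):
    """Return (object number, generation) pairs whose stream dictionary lacks /Length."""
    missing = []
    for match in _STREAM_OBJ.finditer(pdf_bytes):
        obj_num, gen, dictionary = match.groups()
        if not _LENGTH_KEY.search(dictionary):
            missing.append((int(obj_num), int(gen)))
    return missing


# Hand-written sample: object 1 is fine, object 2 only has /Length1.
sample = (
    b"%PDF-1.4\n"
    b"1 0 obj\n<< /Length 5 >>\nstream\nhello\nendstream\nendobj\n"
    b"2 0 obj\n<< /Length1 5 >>\nstream\nworld\nendstream\nendobj\n"
)
print(streams_missing_length(sample))  # [(2, 0)]
```

For a real file, read it with open(path, "rb").read() and pass the bytes in; any pairs returned are candidates for the failing object.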

Somnath-Guthula avatar Apr 16 '19 07:04 Somnath-Guthula

Anyone facing the same issue can use the library below to fix it. It calculates the length of each stream object, adds /Length to the stream object's header, and rebuilds the xref table. https://github.com/shobhit99/ReconstructPdf
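The idea of measuring the stream and adding /Length can be sketched roughly as follows. This is not the ReconstructPdf implementation; the function name add_missing_length and the patterns are hypothetical. It assumes the stream data does not itself contain the word endstream, treats one end-of-line before endstream as a separator rather than data, and does not rebuild the xref table, so byte offsets in the output will be stale.

```python
import re

# Heuristic: dictionary head, ">> stream" keyword, data, then "endstream".
_STREAM_WITH_DATA = re.compile(
    rb"(<<(?:(?!endobj).)*?)(>>\s*stream\r?\n)(.*?)(\r?\n?endstream)",
    re.DOTALL,
)
# "/Length" as a whole name, so that "/Length1" does not count.
_LENGTH_KEY = re.compile(rb"/Length(?![0-9A-Za-z])")


def add_missing_length(pdf_bytes):
    """Insert a /Length entry into stream dictionaries that lack one."""

    def fix(match):
        head, stream_kw, data, tail = match.groups()
        if _LENGTH_KEY.search(head):
            return match.group(0)  # already has /Length, leave untouched
        # Append the measured byte count just before the closing ">>".
        return head + b"/Length %d " % len(data) + stream_kw + data + tail

    return _STREAM_WITH_DATA.sub(fix, pdf_bytes)


# Hand-written sample object whose dictionary only has /Length1.
broken = b"2 0 obj\n<< /Length1 5 >>\nstream\nworld\nendstream\nendobj\n"
fixed = add_missing_length(broken)
print(fixed.decode("latin-1"))
```

Because inserting bytes shifts every later object, the result still needs its xref table rewritten (or a reader that tolerates a broken xref) before it is a valid PDF.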

shobhit789 avatar Jan 06 '21 06:01 shobhit789

@shobhit789 Fixing the xref table is interesting! Would you mind if I integrated your script into https://github.com/py-pdf/cpdf ?

MartinThoma avatar Jun 27 '22 19:06 MartinThoma

Sure, @MartinThoma, but the implementation is specific to only a few PDF versions. You can modify it if you want.

shobhit99 avatar Jun 28 '22 15:06 shobhit99

@MartinThoma, what is your recommendation about this issue? Should we close it?

pubpub-zz avatar Feb 09 '23 05:02 pubpub-zz

Let's close it.

For people who face issues with broken PDFs: you can fix them in many ways; see https://superuser.com/a/282056

MartinThoma avatar Feb 10 '23 07:02 MartinThoma