KeyError '/Pages' is raised when pages are merged

Open euter opened this issue 9 years ago • 6 comments

When I merge two PDF files with PDF version 1.3, the following KeyError is raised.

Traceback (most recent call last):
  File "pdf_merge.py", line 56, in <module>
    merger.merge()
  File "pdf_merge.py", line 29, in merge
    self._merger.append(open(temp, "rb"))
  File "...\Python35\lib\site-packages\PyPDF2\merger.py", line 203, in append
    self.merge(len(self.pages), fileobj, bookmark, pages, import_bookmarks)
  File "...\Python35\lib\site-packages\PyPDF2\merger.py", line 139, in merge
    pages = (0, pdfr.getNumPages())
  File "...\Python35\lib\site-packages\PyPDF2\pdf.py", line 1155, in getNumPages
    self._flatten()
  File "...\Python35\lib\site-packages\PyPDF2\pdf.py", line 1506, in _flatten
    pages = catalog["/Pages"].getObject()
  File "...\Python35\lib\site-packages\PyPDF2\generic.py", line 516, in __getitem__
    return dict.__getitem__(self, key).getObject()
KeyError: '/Pages'

euter avatar Feb 16 '17 13:02 euter

Same error. @euter, any resolution?

b0g3r avatar Apr 18 '18 17:04 b0g3r

Okay, my bad: I had called merger.append() after merger.write().

b0g3r avatar Apr 19 '18 07:04 b0g3r

Same error here! I couldn't find anything about this. Any solution?

joaopedroguimaraes avatar Jan 07 '20 14:01 joaopedroguimaraes

As it's been a long time and I have neither an example PDF nor example code, I'll close this. If anybody still runs into this issue with the latest PyPDF2 version, please let me know!

MartinThoma avatar Jun 28 '22 20:06 MartinThoma

I'm having the same issue. I have a copy of the PDF that's causing the issue.

UTA_OSHA_3115_Fall_Protection_Training_09162021_.pdf

For some context: I'm using requests to retrieve a PDF from a URL, then I copy the response content into a buffer, and from there I attempt to merge the PDF using merger.append. Previously I was saving the file to a temporary location; later I refactored it. This works for 99% of the PDFs I'm merging, but there are 4 PDFs I'm working with that consistently trigger this error.
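Roughly, the buffer step looks like the sketch below. This is a simplified illustration, not my exact code: the helper name `append_from_bytes` is made up for this post, and `merger` is assumed to be any object with an `append(fileobj)` method, such as a PyPDF2 merger.

```python
import io


def append_from_bytes(merger, pdf_bytes):
    """Wrap downloaded bytes in a seekable in-memory buffer and append it.

    `merger` is assumed to expose append(fileobj), as PyPDF2's merger does.
    The buffer is returned so the caller can keep it alive until writing.
    """
    buffer = io.BytesIO(pdf_bytes)
    merger.append(buffer)
    return buffer


# Assumed usage (requires the third-party packages requests and PyPDF2):
#   response = requests.get(url)
#   append_from_bytes(merger, response.content)
```

The error shows up at the `merger.append` call for the affected files, whether the input is a temporary file or a buffer like this.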

red-shift avatar Jul 07 '22 17:07 red-shift

Thanks for adding an example, @red-shift!

MCVE: Code + PDF

UTA_OSHA_3115_Fall_Protection_Training_09162021_.pdf

from PyPDF2 import PdfReader

reader = PdfReader("UTA_OSHA_3115_Fall_Protection_Training_09162021_.pdf")

print(len(reader.pages))

MartinThoma avatar Jul 07 '22 19:07 MartinThoma

Problem analyzed: the first xref table is read correctly, but the one referenced by "/Prev" starts at 1. When the first objects are checked, an offset in the index is detected and then applied to the whole table (by _zero_xref). This damages the entries of the first xref, which causes the problem. The correction should be applied object by object. PR in progress.
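A toy illustration of the difference between the two strategies. This is not the actual pypdf code: the data and the names `file_objects`, `shift_whole_table`, and `fix_per_entry` are hypothetical, and the xref table is modeled simply as a dict from object id to byte offset.

```python
# Toy model: byte offset -> object id actually stored at that offset.
file_objects = {10: 1, 50: 2, 90: 3, 130: 4}

# Entries from the first xref section: ids are already correct.
first_xref = {3: 90, 4: 130}
# Entries from the "/Prev" section: ids are off by one in this toy example.
prev_xref = {0: 10, 1: 50}

# Both sections end up in one combined table.
merged = {**prev_xref, **first_xref}


def shift_whole_table(xref, shift):
    """Apply one detected index offset to every entry (the damaging approach)."""
    return {obj_id + shift: offset for obj_id, offset in xref.items()}


def fix_per_entry(xref, objects_by_offset):
    """Re-key each entry by the id really found at its offset (object by object)."""
    return {objects_by_offset[offset]: offset for offset in xref.values()}


# The offset is detected by checking the lowest-numbered entry against the file.
lowest = min(merged)
shift = file_objects[merged[lowest]] - lowest

damaged = shift_whole_table(merged, shift)
repaired = fix_per_entry(merged, file_objects)

print(damaged)   # ids 3 and 4 from the first xref have been moved to 4 and 5
print(repaired)  # every id matches the object stored at its offset
```

In this toy case the whole-table shift fixes the "/Prev" entries but moves the correct entries away from their ids, while the per-entry correction leaves the correct ones untouched.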

pubpub-zz avatar Sep 03 '22 14:09 pubpub-zz