pypdf
Pages without Resources dictionary
Failure using mergePage() with pages that do not have a Resources dictionary. This appears to be a valid condition, in which case the page should inherit the dictionary from its parent. Traceback below:
Traceback (most recent call last):
File "C:\Python27\lib\lib-tk\Tkinter.py", line 1536, in __call__
return self.func(*args)
File "pdfbind\view.py", line 153, in _on_execute_click
b.bind()
File "pdfbind\bind.py", line 176, in bind
page_bufs.append(page_header.merge(p, header_info))
File "pdfbind\header.py", line 68, in merge
header_page.mergePage(orig_page)
File "PyPDF2\pdf.py", line 2211, in mergePage
self._mergePage(page2)
File "PyPDF2\pdf.py", line 2221, in _mergePage
page2Resources = page2["/Resources"].getObject()
File "PyPDF2\generic.py", line 512, in __getitem__
return dict.__getitem__(self, key).getObject()
KeyError: '/Resources'
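For reference, the inheritance mentioned above works by walking up the page tree: if a page omits an inheritable attribute such as /Resources, a reader checks the nearest ancestor /Pages node via its /Parent link. A minimal sketch of that lookup, using plain Python dicts as hypothetical stand-ins for PyPDF2's DictionaryObject (the name `find_inherited` is illustrative, not part of PyPDF2):

```python
from typing import Any, Optional


def find_inherited(node: dict, key: str) -> Optional[Any]:
    """Return `key` from `node` or its nearest ancestor, or None if no node has it.

    Each node is a plain dict; "/Parent" points at the parent /Pages dict
    and is absent at the root of the tree.
    """
    current: Optional[dict] = node
    seen = set()  # guards against a malformed, cyclic /Parent chain
    while current is not None and id(current) not in seen:
        seen.add(id(current))
        if key in current:
            return current[key]
        current = current.get("/Parent")
    return None


root = {"/Type": "/Pages", "/Resources": {"/Font": {"/F1": "Helvetica"}}}
page = {"/Type": "/Page", "/Parent": root}  # no /Resources of its own
print(find_inherited(page, "/Resources"))  # {'/Font': {'/F1': 'Helvetica'}}
```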
Could you possibly share the PDF(s) you're working with so I can take a closer look? PyPDF2 does (or is supposed to) support inheritance of missing page attributes from a parent.
Here's one of the files causing the problem. Taking a page from another document and then calling mergePage() with this PDF results in the above error. 108.pdf
While PyPDF2 does allow inheriting certain page attributes, it appears that none of the page's parents contain the Resources dictionary either. It is a required entry; however, I'll try to implement a workaround in strict=False mode.
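The workaround described here could look roughly like this: walk the parent chain, and if no node supplies the required /Resources entry, raise in strict mode or fall back to an empty resource dictionary with a warning otherwise. This is a hypothetical sketch on plain dicts, not PyPDF2's actual code; `get_resources` and `MissingResourcesError` are illustrative names.

```python
import warnings
from typing import Optional


class MissingResourcesError(KeyError):
    """Hypothetical error: no /Resources on the page or any ancestor."""


def get_resources(page: dict, strict: bool = False) -> dict:
    """Return the page's own or inherited /Resources, tolerating absence if not strict."""
    node: Optional[dict] = page
    seen = set()  # stop on a cyclic /Parent chain
    while node is not None and id(node) not in seen:
        seen.add(id(node))
        if "/Resources" in node:
            return node["/Resources"]
        node = node.get("/Parent")
    if strict:
        # /Resources is a required (inheritable) entry, so strict mode rejects the file
        raise MissingResourcesError("/Resources")
    warnings.warn("Page has no /Resources, not even inherited; using an empty dictionary")
    return {}
```

Keeping the strict path as an exception preserves today's behaviour for callers who want malformed files rejected, while non-strict callers get a usable (empty) dictionary.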
Was this issue resolved?
108.pdf tested successfully:
import PyPDF2
p = PyPDF2.PdfReader("c:/108.pdf")
m = PyPDF2.PdfMerger()
m.append(p)
with open("c:/tt.pdf", "wb") as f:
    m.write(f)
The issue can be closed.
Thank you for checking @pubpub-zz :heart:
Maybe I'm missing something, but it looks like https://github.com/py-pdf/PyPDF2/pull/1276 only fixes the _extract_text function.
I'm still having issues with the _merge_page function, specifically this call: original_resources = cast(DictionaryObject, self[PG.RESOURCES].get_object()), when a page is missing the /Resources dictionary.
File "/site-packages/PyPDF2/_page.py", line 508, in merge_page
self._merge_page(page2, expand=expand)
File "/site-packages/PyPDF2/_page.py", line 532, in _merge_page
original_resources = cast(DictionaryObject, self[PG.RESOURCES].get_object())
File "/site-packages/PyPDF2/generic/_data_structures.py", line 149, in __getitem__
return dict.__getitem__(self, key).get_object()
KeyError: '/Resources'
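One way to make a merge tolerant of this is to treat an absent /Resources dictionary as empty before combining the per-category sub-dictionaries (/Font, /XObject, ...). A minimal sketch on plain dicts; `merge_resources` is a hypothetical helper, not PyPDF2's implementation, and on a name clash it simply keeps the first page's entry, whereas a real merge must rename the conflicting resource.

```python
from typing import Optional


def merge_resources(res1: Optional[dict], res2: Optional[dict]) -> dict:
    """Combine two /Resources dicts, treating a missing one (None) as empty."""
    res1 = res1 or {}
    res2 = res2 or {}
    merged: dict = {}
    for category in set(res1) | set(res2):
        sub1, sub2 = res1.get(category), res2.get(category)
        if isinstance(sub1, dict) and isinstance(sub2, dict):
            merged[category] = {**sub2, **sub1}  # res1 wins on a name clash
        else:
            # Only one side has this category, or it is not a dict
            # (e.g. the /ProcSet array): keep res1's value when present.
            merged[category] = sub1 if sub1 is not None else sub2
    return merged


page1 = {"/Resources": {"/Font": {"/F1": "Helvetica"}}}
page2 = {}  # a page with no /Resources at all
print(merge_resources(page1.get("/Resources"), page2.get("/Resources")))
# {'/Font': {'/F1': 'Helvetica'}}
```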
@FredrikWallstrom Which version of PyPDF2 are you using?
2.10.8
@FredrikWallstrom To make sure we focus on the real problem, can you provide a test file and code?
Thanks
PDF: 108.pdf
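A contrived code example, but the principle is the same:
from PyPDF2 import PdfReader
reader = PdfReader("108.pdf")  # a path or binary stream for 108.pdf
page_one = reader.pages[0]
page_two = reader.pages[0]
page_one.merge_page(page_two)  # KeyError: '/Resources' as reported on 2.10.8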
A good example improves analysis. Thanks!
Should be good now