pypdf

Pages without Resources dictionary

Open jvalenzuela opened this issue 8 years ago • 4 comments

mergePage() fails on pages that do not have a resource dictionary. This appears to be a valid condition, and the page should then inherit the dictionary content from its parent. Traceback below:

Traceback (most recent call last):
  File "C:\Python27\lib\lib-tk\Tkinter.py", line 1536, in __call__
    return self.func(*args)
  File "pdfbind\view.py", line 153, in _on_execute_click
    b.bind()
  File "pdfbind\bind.py", line 176, in bind
    page_bufs.append(page_header.merge(p, header_info))
  File "pdfbind\header.py", line 68, in merge
    header_page.mergePage(orig_page)
  File "PyPDF2\pdf.py", line 2211, in mergePage
    self._mergePage(page2)
  File "PyPDF2\pdf.py", line 2221, in _mergePage
    page2Resources = page2["/Resources"].getObject()
  File "PyPDF2\generic.py", line 512, in __getitem__
    return dict.__getitem__(self, key).getObject()
KeyError: '/Resources'
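
The inheritance rule mentioned above can be sketched as follows. This is only an illustration using plain Python dicts, not PyPDF2 objects; `lookup_inherited` and the sample page tree are hypothetical names and not part of the library. It assumes the set of inheritable page attributes listed in the PDF specification (/Resources, /MediaBox, /CropBox, /Rotate).

```python
from typing import Any, Optional

# Page attributes that the PDF specification marks as inheritable.
INHERITABLE = ("/Resources", "/MediaBox", "/CropBox", "/Rotate")


def lookup_inherited(page: dict, key: str) -> Optional[Any]:
    """Return page[key], else the nearest ancestor's value, else None.

    Hypothetical helper: pages are modelled as plain dicts that link
    to their parent node through a "/Parent" entry.
    """
    node: Optional[dict] = page
    seen = set()  # guards against a malformed, cyclic /Parent chain
    while node is not None and id(node) not in seen:
        seen.add(id(node))
        if key in node:
            return node[key]
        # Only inheritable attributes are looked up in the parent.
        node = node.get("/Parent") if key in INHERITABLE else None
    return None


# Hypothetical page tree: the leaf page has no /Resources of its own.
root = {"/Type": "/Pages", "/Resources": {"/Font": {"/F1": "Helvetica"}}}
page = {"/Type": "/Page", "/Parent": root}

resources = lookup_inherited(page, "/Resources")
```

With this lookup, a direct `page["/Resources"]` access like the one in the traceback would be replaced by a call that can return the parent's dictionary, or `None` when no ancestor defines one.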

jvalenzuela avatar Jun 23 '16 01:06 jvalenzuela

Could you possibly share the PDF(s) you're working with so I can take a closer look? PyPDF2 does (or is supposed to) support inheritance of missing page attributes from a parent.

mstamy2 avatar Jun 23 '16 17:06 mstamy2

Here's one of the files causing the problem. Starting with some other page from another document, then calling mergePage() with this PDF results in the above error. 108.pdf

jvalenzuela avatar Jun 24 '16 03:06 jvalenzuela

While PyPDF2 does allow inheriting certain page attributes, it appears that none of the page's parents contain the Resources dictionary either. It is a required entry; however, I'll try to implement a workaround in strict=False mode.
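
One possible shape for such a workaround, sketched with plain dicts instead of PyPDF2 objects. The function name `get_resources` and its `strict` parameter are assumptions made for illustration, and this is not the code that was eventually merged into the library.

```python
import warnings


def get_resources(page: dict, strict: bool = True) -> dict:
    """Find /Resources on the page or its ancestors.

    Hypothetical helper: in strict mode a missing entry is an error,
    otherwise an empty dictionary is substituted with a warning.
    """
    node = page
    depth = 0
    # The depth limit is an arbitrary guard against cyclic /Parent links.
    while node is not None and depth < 100:
        if "/Resources" in node:
            return node["/Resources"]
        node = node.get("/Parent")
        depth += 1
    if strict:
        raise KeyError("/Resources")
    warnings.warn("Page has no /Resources entry; using an empty dictionary")
    return {}


# Hypothetical page whose parents do not define /Resources either.
orphan = {"/Type": "/Page", "/Parent": {"/Type": "/Pages"}}
fallback = get_resources(orphan, strict=False)
```

Returning an empty dictionary keeps later merge logic simple, since it can treat "no resources" and "empty resources" the same way.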

mstamy2 avatar Jun 24 '16 16:06 mstamy2

Was this issue resolved?

sjacob90 avatar May 25 '22 04:05 sjacob90

Here's one of the files causing the problem. Starting with some other page from another document, then calling mergePage() with this PDF results in the above error. 108.pdf

Tested successfully:

import PyPDF2

# Read the problem file and append all of its pages to a merger
p = PyPDF2.PdfReader("c:/108.pdf")
m = PyPDF2.PdfMerger()
m.append(p)
with open("c:/tt.pdf", "wb") as f:
    m.write(f)

The issue can be closed.

pubpub-zz avatar Sep 03 '22 14:09 pubpub-zz

Thank you for checking @pubpub-zz :heart:

MartinThoma avatar Sep 06 '22 19:09 MartinThoma

Maybe I'm missing something, but it looks like https://github.com/py-pdf/PyPDF2/pull/1276 only fixes the _extract_text function. I'm still having issues with the _merge_page function, specifically with this call: original_resources = cast(DictionaryObject, self[PG.RESOURCES].get_object()), when I have a page that is missing the /Resources dict.

  File "/site-packages/PyPDF2/_page.py", line 508, in merge_page
    self._merge_page(page2, expand=expand)
  File "/site-packages/PyPDF2/_page.py", line 532, in _merge_page
    original_resources = cast(DictionaryObject, self[PG.RESOURCES].get_object())
  File "/site-packages/PyPDF2/generic/_data_structures.py", line 149, in __getitem__
    return dict.__getitem__(self, key).get_object()
KeyError: '/Resources'

FredrikWallstrom avatar Sep 14 '22 14:09 FredrikWallstrom

@FredrikWallstrom Which version of PyPDF2 are you using?

MartinThoma avatar Sep 14 '22 14:09 MartinThoma

Which version of PyPDF2 are you using?

2.10.8

FredrikWallstrom avatar Sep 14 '22 15:09 FredrikWallstrom

@FredrikWallstrom To make sure we focus on the real problem, can you provide a test file and code?

Thanks

pubpub-zz avatar Sep 14 '22 21:09 pubpub-zz

PDF: 108.pdf

Stupid code example but the principle is the same:

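    from PyPDF2 import PdfReader

    reader = PdfReader("108.pdf")  # a binary stream of 108.pdf works too
    page_one = reader.pages[0]
    page_two = reader.pages[0]
    page_one.merge_page(page_two)  # raised KeyError: '/Resources' on 2.10.8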
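
For illustration, a merge step that tolerates a missing /Resources entry could look like the sketch below. It uses plain dicts rather than PyPDF2 objects, `merge_resources` is a hypothetical name, and it deliberately skips the renaming of conflicting resource names that a real merge has to perform.

```python
def merge_resources(page1: dict, page2: dict) -> dict:
    """Combine the /Resources of two pages, treating a missing entry as empty.

    Hypothetical helper: only dictionary-valued categories such as /Font
    or /XObject are merged; on a name clash the entry from page2 wins.
    """
    merged: dict = {}
    for page in (page1, page2):
        # .get() with a default avoids the KeyError from the traceback.
        resources = page.get("/Resources", {})
        for category, entries in resources.items():
            if isinstance(entries, dict):
                merged.setdefault(category, {}).update(entries)
    return merged


# Hypothetical pages: the second one has no /Resources at all.
with_fonts = {"/Resources": {"/Font": {"/F1": "Helvetica"}}}
without_resources = {"/Type": "/Page"}

combined = merge_resources(with_fonts, without_resources)
```

The point of the sketch is only the `.get("/Resources", {})` default: a page without resources then contributes nothing to the merge instead of aborting it.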

FredrikWallstrom avatar Sep 15 '22 05:09 FredrikWallstrom

A good example improves analysis. Thanks.

Should be good now

pubpub-zz avatar Sep 15 '22 21:09 pubpub-zz