pypdf PdfFileWriter addBookmark DictionaryObject issue

Open LightningMan711 opened this issue 8 years ago • 4 comments

I wrote a bookmark level swapping program using PdfFileMerger, but I would like to use PdfFileWriter instead, since it lets me target a bookmark at a specific area of the page, which is how the original, unswapped bookmarks are targeted.

To retain the original tree (which I need for my purposes), I tried the revision of cloneDocumentFromReader at this link. That worked fine (no empty pages), but addBookmark throws an error even when given a dummy bookmark rather than the data I really want to pass.

The code:

#Reading the original file
original = PdfFileReader(file("[redacted].pdf", "rb"))

#Creating a dictionary of page number meanings
decode = MapPDFPageNum(original)

#Getting the bookmarks
stacker = unravel(original.getOutlines(),decode)

#Creating the By Domain set
newBk = swap(stacker,2)


#I would like to use PdfFileWriter so as to aim destinations but I cannot
#figure out the syntax.
#--------
new = PdfFileWriter()
#the code I would use to add the bookmarks
#for bk in newBk:
    #print bk[1]
    #new.addBookmark("test",0)
    #if len(bk[7])<1:
        #print bk[1] + " " + str(bk[4]) + bk[2]
        #new.addBookmark(bk[1],bk[4])
    #else:
        #new.addBookmark(bk[1],bk[4],bk[7])
new.cloneDocumentFromReader(original)
#The test bookmark
new.addBookmark("test",0)
new.setPageMode("/UseOutlines")
outputStream = file("Clone.pdf","wb")
new.write(outputStream)
outputStream.close()

The error I get is this:

Traceback (most recent call last):
  File "C:\Users\[redacted]\BMTester2.py", line 117, in <module>
    new.addBookmark("test",0)
  File "C:\Python27\lib\site-packages\PyPDF2\pdf.py", line 848, in addBookmark
    parent.addChild(bookmarkRef, self)
AttributeError: 'DictionaryObject' object has no attribute 'addChild'

It appears that the addBookmark code expects a TreeObject, but parent is a DictionaryObject.

The addBookmark method works fine when there are no existing bookmarks. The issue only arises when there is an existing tree, and retaining the original bookmarks is the whole point.

Any help would be appreciated.

May 23 '16 17:05 LightningMan711

The support for cloning and editing seems to be rudimentary at best. The error you get is because the PDF file contains an Outlines tree, which is represented as a DictionaryObject, not a TreeObject. During reading, one cannot decide whether a Dictionary is a Tree or not because all attributes are optional for the leaves. I've tried to fix that by walking the outlines tree and changing the class of each node to a Tree in PdfFileWriter.getOutlineRoot.

if not isinstance(outline, TreeObject):
    def _walk(node):
        node.__class__ = TreeObject
        for child in node.children():
            _walk(child)
    _walk(outline)
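
To see why reassigning __class__ helps, here is a minimal sketch outside PyPDF2. The two classes below are simplified stand-ins that I am assuming for illustration, not the library's real implementations (in particular, the real nodes are linked through indirect references, and the real addChild takes the writer as a second argument).

```python
class DictionaryObject(dict):
    """Stand-in for PyPDF2's DictionaryObject (hypothetical, simplified)."""


class TreeObject(DictionaryObject):
    """Stand-in for PyPDF2's TreeObject: a dictionary with tree helpers."""

    def children(self):
        # Walk the sibling chain: /First, then each node's /Next.
        child = self.get("/First")
        while child is not None:
            yield child
            child = child.get("/Next")

    def addChild(self, child):
        # Append child at the end of the sibling chain.
        if "/First" not in self:
            self["/First"] = child
        else:
            self["/Last"]["/Next"] = child
        self["/Last"] = child
        self["/Count"] = self.get("/Count", 0) + 1


def promote_to_tree(node):
    """Reassign __class__ on node and all its descendants."""
    node.__class__ = TreeObject
    for child in node.children():
        promote_to_tree(child)


# An outline as a reader would hand it over: plain dictionaries only.
leaf = DictionaryObject({"/Title": "Section 1.1"})
chapter = DictionaryObject({"/Title": "Chapter 1", "/First": leaf, "/Last": leaf})
outline = DictionaryObject({"/First": chapter, "/Last": chapter, "/Count": 1})

# Before promotion there is no addChild, which is the reported AttributeError.
assert not hasattr(outline, "addChild")

promote_to_tree(outline)
outline.addChild(DictionaryObject({"/Title": "test"}))
titles = [c["/Title"] for c in outline.children()]
```

The reassignment only changes which methods are looked up on the object; the stored keys are untouched, so it works only because both classes share the same underlying dictionary layout.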

Worse, all indirect references that link the tree nodes together are still pointing to the reader that was cloned from. The objects themselves are not part of the writer's _objects and one cannot obtain references to them, which prevents the creation of new tree nodes. But before writing, all objects are copied and the indirect references rewritten so that they point to the new objects. As an ugly workaround, we can thus call write for its side effects:

new.write(BytesIO())
new.addBookmark("test", 0) # works now
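
Put together, the order of operations would look roughly like the sketch below. It uses the legacy PyPDF2 1.x names from this thread and assumes getOutlineRoot has already been patched as shown above; clone_with_bookmark and its parameters are names I made up for illustration.

```python
from io import BytesIO


def clone_with_bookmark(src_path, dst_path, title="test", page=0):
    """Clone src_path and add one top-level bookmark (legacy PyPDF2 1.x API)."""
    # Imported lazily so the sketch can be loaded without PyPDF2 installed.
    from PyPDF2 import PdfFileReader, PdfFileWriter

    with open(src_path, "rb") as src:
        writer = PdfFileWriter()
        writer.cloneDocumentFromReader(PdfFileReader(src))

        # Throwaway write: called only for its side effect of copying the
        # cloned objects into the writer and rewriting indirect references.
        writer.write(BytesIO())

        # Assumes the patched getOutlineRoot; without it this still fails.
        writer.addBookmark(title, page)
        writer.setPageMode("/UseOutlines")

        # The real write, done while the source file is still open.
        with open(dst_path, "wb") as dst:
            writer.write(dst)
```

The first write goes to an in-memory buffer that is simply discarded; only the second write produces the output file.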

Aug 16 '17 18:08 rwirth

So write to BytesIO, add the bookmark, then write to outputStream, correct?

Aug 29 '17 20:08 LightningMan711

Okay, this almost worked, so let me tell you what I did to make it work. First, I placed your snippet at the end of getOutlineRoot, just before the return statement:

        # start here
        if not isinstance(outline, TreeObject):
            def _walk(node):
                node.__class__ = TreeObject
                for child in node.children():
                    _walk(child)
            _walk(outline)
        # end here

I then added the new.write(BytesIO()) code to my script (after importing BytesIO from io):

new = PdfFileWriter()
new.cloneDocumentFromReader(original)
new.write(BytesIO())
#new.addBookmark("test",0)

It wrote the test bookmark fine. But when I un-commented the bookmark swapping code from the original post to create the swapped nested bookmarks, the parent.addChild call failed, since parent was being passed as a unicode string and not as the bookmark object itself. So back in pdf.py I altered addBookmark in two ways. For a root level bookmark, I kept the original behavior. Otherwise, I had it drill down the outline until it found the bookmark it was looking for and then add the child bookmark there.

The new recursive function (defined inside the addBookmark definition, just below the parameter explanation):

        # New function to drill recursively for bookmarks
        def drillDown(dictObj, daddy, tuck):
            huntObj = dictObj
            for kidObj in huntObj:
                if daddy in kidObj.itervalues():
                    kidObj.addChild(tuck, self)
                else:
                    drillDown(kidObj, daddy, tuck)

And the rewritten parent section at the bottom of the addBookmark definition:

        # Added by me
        if parent != outlineRef:
            drillDown(outlineRef, parent, bookmarkRef)
        else:
            parent = parent.getObject()
            parent.addChild(bookmarkRef, self)

        return bookmarkRef
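
For anyone who wants to follow the drill-down idea without a patched pdf.py, here is a small stand-alone model. Node and drill_down are hypothetical names, and the sketch deliberately differs from the function above: it matches on the title only, stops at the first match, and reports whether a parent was found.

```python
class Node(object):
    """Hypothetical stand-in for an outline entry: a title plus child nodes."""

    def __init__(self, title):
        self.title = title
        self.kids = []


def drill_down(node, parent_title, new_child):
    """Attach new_child under the first node titled parent_title.

    Returns True once the child is attached so the search stops early,
    and False if no node with that title exists below node.
    """
    for kid in node.kids:
        if kid.title == parent_title:
            kid.kids.append(new_child)
            return True
        if drill_down(kid, parent_title, new_child):
            return True
    return False


# Build a two-level outline: root -> "Domain A" -> "Topic 1".
root = Node("root")
domain = Node("Domain A")
root.kids.append(domain)
domain.kids.append(Node("Topic 1"))

found = drill_down(root, "Topic 1", Node("Detail"))
missing = drill_down(root, "No such title", Node("Orphan"))
```

Returning a flag makes it possible to notice a parent title that does not exist, instead of silently dropping the bookmark.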

If you have any questions, please ask.

Aug 30 '17 05:08 LightningMan711

@LightningMan711 Do you know if there is anything left to do for this one? What exactly is the issue?

Jul 09 '22 14:07 MartinThoma

I assume the issue was solved. Please comment if it still exists with recent pypdf versions :-)

Mar 23 '23 05:03 MartinThoma

Still have this issue with pypdf 3.15.5

https://pastebin.com/AgnYe8zu Line 51

AttributeError: 'DictionaryObject' object has no attribute 'insert_child'

Oct 01 '23 15:10 Firestar-Reimu

Please open a new issue with example code and a reproducing PDF file as well as the full traceback.

Oct 01 '23 15:10 stefan6419846
