pypdf
PdfFileWriter addBookmark DictionaryObject issue
I have a bookmark level swapping program that I wrote using PdfFileMerger, but I would like to use PdfFileWriter instead, since it allows me to target bookmarks to a specific area on the page, which is how the original, unswapped bookmarks are targeted.
I tried using the revision of cloneDocumentFromReader at this link to retain the original tree (which my use case requires). That worked fine (no empty pages), but addBookmark throws an error even when given a dummy bookmark rather than all the data I really want to pass.
The code:
#Reading the original file
original = PdfFileReader(file("[redacted].pdf", "rb"))
#Creating a dictionary of page number meanings
decode = MapPDFPageNum(original)
#Getting the bookmarks
stacker = unravel(original.getOutlines(),decode)
#Creating the By Domain set
newBk = swap(stacker,2)
#I would like to use PdfFileWriter so as to aim destinations but I cannot
#figure out the syntax.
#--------
new = PdfFileWriter()
#the code I would use to add the bookmarks
#for bk in newBk:
#    print bk[1]
#    new.addBookmark("test",0)
#    if len(bk[7])<1:
#        print bk[1] + " " + str(bk[4]) + bk[2]
#        new.addBookmark(bk[1],bk[4])
#    else:
#        new.addBookmark(bk[1],bk[4],bk[7])
new.cloneDocumentFromReader(original)
#The test bookmark
new.addBookmark("test",0)
new.setPageMode("/UseOutlines")
outputStream = file("Clone.pdf","wb")
new.write(outputStream)
outputStream.close()
The error I get is this:
Traceback (most recent call last):
  File "C:\Users\[redacted]\BMTester2.py", line 117, in <module>
    new.addBookmark("test",0)
  File "C:\Python27\lib\site-packages\PyPDF2\pdf.py", line 848, in addBookmark
    parent.addChild(bookmarkRef, self)
AttributeError: 'DictionaryObject' object has no attribute 'addChild'
It appears that the addBookmark code is expecting a TreeObject but parent is a DictionaryObject.
The addBookmark method works fine when there are no existing bookmarks. The problem only arises when there is an existing tree, and retaining the original bookmarks is the whole point.
Any help would be appreciated.
The support for cloning and editing seems to be rudimentary at best. The error you get is because the PDF file contains an Outlines tree, which is represented as a DictionaryObject, not a TreeObject. During reading, one cannot decide whether a Dictionary is a Tree or not because all attributes are optional for the leaves. I've tried to fix that by walking the outlines tree and changing the class of each node to a Tree in PdfFileWriter.getOutlineRoot.
if not isinstance(outline, TreeObject):
    def _walk(node):
        node.__class__ = TreeObject
        for child in node.children():
            _walk(child)
    _walk(outline)
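For anyone who wants to see the class-swapping idea in isolation, here is a minimal, self-contained sketch in Python 3. The two classes below are hypothetical stand-ins written for this example, not PyPDF2's real DictionaryObject and TreeObject, and the "kids" key and the add_child name are assumptions made only for illustration.

```python
class DictionaryObject(dict):
    """Stand-in for a parsed PDF dictionary (hypothetical, not PyPDF2's class)."""

    def children(self):
        # Iterate over child nodes stored under the illustrative "kids" key.
        return iter(self.get("kids", []))


class TreeObject(DictionaryObject):
    """Stand-in subclass that adds a tree-editing method."""

    def add_child(self, child):
        # Append a child node, creating the "kids" list if it is missing.
        self.setdefault("kids", []).append(child)


def promote(node):
    """Recursively reassign the class of every node in the tree to TreeObject."""
    node.__class__ = TreeObject
    for child in node.children():
        promote(child)


# Build a small tree out of plain DictionaryObject nodes, as a reader might.
leaf = DictionaryObject(title="Chapter 1")
root = DictionaryObject(title="Outlines", kids=[leaf])

# After promotion, tree-only methods become available on the existing objects.
promote(root)
root.add_child(TreeObject(title="test"))
```

The point of reassigning `__class__` rather than building new objects is that every existing reference to a node keeps pointing at the same object, which now behaves like a tree node.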
Worse, all indirect references that link the tree nodes together are still pointing to the reader that was cloned from. The objects themselves are not part of the writer's _objects and one cannot obtain references to them, which prevents the creation of new tree nodes. But before writing, all objects are copied and the indirect references rewritten so that they point to the new objects. As an ugly workaround, we can thus call write for its side effects:
new.write(BytesIO())
new.addBookmark("test", 0) # works now
So write to BytesIO, add the bookmark, then write to outputStream, correct?
Okay, this almost worked, so let me tell you what I did to make it work. First, I inserted your code at the end of getOutlineRoot, just before the return statement:
# start here
if not isinstance(outline, TreeObject):
    def _walk(node):
        node.__class__ = TreeObject
        for child in node.children():
            _walk(child)
    _walk(outline)
# end here
I then added the new.write(BytesIO()) code to my script (after importing BytesIO from io):
new = PdfFileWriter()
new.cloneDocumentFromReader(original)
new.write(BytesIO())
#new.addBookmark("test",0)
It wrote the test bookmark fine. But when I uncommented the bookmark swapping code from the original post to create the swapped, nested bookmarks, parent.addChild failed, because parent was being passed as a unicode string rather than the bookmark object itself. So, back in pdf.py, I altered addBookmark in two ways. A root-level bookmark is still written as in the original code; for any other bookmark, the code drills down the outline until it finds the parent it is looking for and then adds the child bookmark there.
The new recursive function (defined inside the addBookmark definition, just below the parameter explanation):
# New function to drill recursively for bookmarks
def drillDown(dictObj, daddy, tuck):
    huntObj = dictObj
    for kidObj in huntObj:
        if daddy in kidObj.itervalues():
            kidObj.addChild(tuck, self)
        else:
            drillDown(kidObj, daddy, tuck)
And the rewritten parent section at the bottom of the addBookmark definition:
# Added by me
if parent != outlineRef:
    drillDown(outlineRef, parent, bookmarkRef)
else:
    parent = parent.getObject()
    parent.addChild(bookmarkRef, self)
return bookmarkRef
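The drill-down idea can also be sketched without PyPDF2 at all. The Python 3 example below uses a hypothetical Node class and a hypothetical attach_under helper, neither of which exists in the library. Unlike drillDown above, it matches the parent by title only, stops at the first match, and reports whether a parent was found; those are choices made for this sketch, not a description of how the patched addBookmark behaves.

```python
class Node:
    """Hypothetical outline node: a title plus a list of child nodes."""

    def __init__(self, title, children=None):
        self.title = title
        self.children = list(children or [])


def attach_under(root, parent_title, new_node):
    """Depth-first search for parent_title and attach new_node to the first match.

    Returns True if a matching parent was found, False otherwise.
    """
    for child in root.children:
        if child.title == parent_title:
            child.children.append(new_node)
            return True
        # No match at this level, so keep searching inside this child.
        if attach_under(child, parent_title, new_node):
            return True
    return False


# A small nested outline: root -> Domain A -> Topic 1, and root -> Domain B.
outline = Node("root", [Node("Domain A", [Node("Topic 1")]), Node("Domain B")])

found = attach_under(outline, "Topic 1", Node("Detail"))
missing = attach_under(outline, "Domain C", Node("Orphan"))
```

Returning a boolean makes it possible to notice when the requested parent does not exist, instead of silently dropping the new bookmark.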
If you have any questions, please ask.
@LightningMan711 Do you know if there is anything left to do for this one? What exactly is the issue?
I assume the issue was solved. Please comment if it still exists with recent pypdf versions :-)
Still have this issue with pypdf 3.15.5
https://pastebin.com/AgnYe8zu Line 51
AttributeError: 'DictionaryObject' object has no attribute 'insert_child'
Please open a new issue with example code and a reproducing PDF file as well as the full traceback.