Menotexport

Export loses outlines

Open ghost opened this issue 7 years ago • 9 comments

Hi thanks for this great tool!

I'm trying to export my Mendeley library and noticing that my outlines/bookmarks are missing from the output. These are useful for me to navigate around my documents. Is it possible to retain these in the export? Thanks for your help.

ghost avatar Jun 08 '18 18:06 ghost

Hi apenewberry,

I didn't know about the outlines/bookmarks feature of Mendeley. I just updated to version 1.19, which should be the latest, but I still don't quite get it. Can you give me a few more hints about the outlines you are referring to? What version are you using, and is it on Windows, Mac or Linux?

Xunius avatar Jun 09 '18 01:06 Xunius

I'm using Mendeley Desktop 1.19 on Ubuntu 17.10.

I think they're called 'outlines' or 'bookmarks' in the PDF format, but Mendeley displays them in the Contents tab of the sidebar on the right. Screenshot below.

I've been looking at some resources, but haven't figured out how to copy the outlines over:

  • https://pythonhosted.org/PyPDF2/PdfFileWriter.html#PyPDF2.PdfFileWriter.addBookmark
  • https://pythonhosted.org/PyPDF2/PdfFileReader.html#PyPDF2.PdfFileReader.getOutlines
  • https://github.com/psammetichus/PDF-Add-Outline/blob/master/addindex.py

image

ghost avatar Jun 09 '18 02:06 ghost
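For anyone inspecting what `getOutlines` returns: as far as I understand the PyPDF2 1.x docs linked above, it is a nested list in which a sub-list holds the children of the entry just before it. Below is a minimal Python 3 sketch of walking that shape. The helper name `flatten_outline` is hypothetical, and plain dicts stand in for PyPDF2's entry objects so the example runs without a PDF; it assumes each entry exposes its title under `"/Title"`.

```python
def flatten_outline(outline, depth=0):
    """Yield (depth, title) pairs from a nested outline list."""
    for entry in outline:
        if isinstance(entry, list):
            # A nested list holds the children of the preceding entry.
            yield from flatten_outline(entry, depth + 1)
        else:
            # Assumption: entries are dict-like and carry a "/Title" key.
            yield depth, entry["/Title"]


# Stand-in data shaped like the nested list described above;
# plain dicts replace the library's own entry objects.
sample = [
    {"/Title": "1 Introduction"},
    {"/Title": "2 Methods"},
    [{"/Title": "2.1 Data"}, {"/Title": "2.2 Model"}],
    {"/Title": "3 Results"},
]

flat = list(flatten_outline(sample))
for depth, title in flat:
    print("  " * depth + title)
```

Running the same walk on the source PDF and on the exported PDF would be one way to see which files lose their bookmarks.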

I see. So these are embedded in the PDFs aren't they? And they appear in more recent publications not in older ones. So how do you want them to be exported? Because the pdf files are merely copied to the target folder, if a pdf has these, the export should too.

Xunius avatar Jun 09 '18 04:06 Xunius

Ok I think I must've made a mistake. I'll look into this further and update.

ghost avatar Jun 09 '18 04:06 ghost

It looks like the script copies the bookmarks for most PDFs but misses them on at least one other. I'm not sure what distinguishes the PDFs where it misses them. I think I'll just deal with the few missed examples manually as they come up. Thanks for the tool and for responding so quickly as well. Cheers!

ghost avatar Jun 09 '18 07:06 ghost

Hmm it's strange that it fails for certain pdfs. Could it have something to do with the pdf viewer software?

Xunius avatar Jun 09 '18 10:06 Xunius

I'm not sure. Apparently there are multiple ways to embed bookmarks in PDFs, which can make reading them complicated, but it's getting over my head at that point.

ghost avatar Jun 09 '18 18:06 ghost

Hi, this is my quick solution which seems to work without problems:

# Copy the root (document catalog) except for /Pages
# PDF Reference, Sixth Edition, version 1.7, p.137
# https://www.adobe.com/devnet/pdf/pdf_reference_archive.html
for k,v in inpdf.trailer["/Root"].items():
    if k.getObject() != "/Pages":
        outpdf._root_object.update({k: v})

diff --git a/lib/exportpdf.py b/lib/exportpdf.py
index 5072681..aaffe23 100644
--- a/lib/exportpdf.py
+++ b/lib/exportpdf.py
@@ -154,6 +154,14 @@ def exportPdf(fin,outdir,annotations,verbose):
 
         outpdf.addPage(inpg)
 
+
+    # Copy the root (document catalog) except for /Pages
+    # PDF Reference, Sixth Edition, version 1.7, p.137
+    # https://www.adobe.com/devnet/pdf/pdf_reference_archive.html
+    for k,v in inpdf.trailer["/Root"].items():
+        if k.getObject() != "/Pages":
+            outpdf._root_object.update({k: v})
+
     #-----------------------Save-----------------------
     filename=annotations.filename
     if not os.path.isdir(outdir):

syu-id avatar Oct 02 '19 13:10 syu-id
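The idea in the snippet above can be sketched in isolation: the writer builds its own page tree, so `/Pages` is left alone, while every other catalog key (including `/Outlines`, which holds the bookmarks) is carried over from the input. The helper name `copy_catalog_entries` is hypothetical, and plain dicts stand in for the two catalog objects so the example runs without PyPDF2. With PyPDF2 1.x the arguments would presumably be `inpdf.trailer["/Root"]` and `outpdf._root_object`, as in the patch; that pairing is taken from the patch and not verified here.

```python
def copy_catalog_entries(src_root, dst_root, skip=("/Pages",)):
    """Copy every key of src_root into dst_root except those in skip.

    Returns the list of copied key names.
    """
    copied = []
    for key, value in src_root.items():
        if str(key) in skip:
            # The destination already has its own page tree; keep it.
            continue
        dst_root[key] = value
        copied.append(str(key))
    return copied


# Stand-in catalogs; the string values are placeholders for PDF objects.
src = {
    "/Type": "/Catalog",
    "/Pages": "input page tree",
    "/Outlines": "outline tree",
    "/PageMode": "/UseOutlines",
}
dst = {"/Type": "/Catalog", "/Pages": "output page tree"}

copied = copy_catalog_entries(src, dst)
print(sorted(copied))
```

Note that this copies references as they are, so whether the copied outline entries still point at the right pages in the output depends on how the writer handles the page objects.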

@rongmu, thanks for providing the fix. I've incorporated your code. Since I'm not going to test it myself, I've put the snippet in a try block.

Xunius avatar Oct 04 '19 07:10 Xunius
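The committed code is not shown in this thread, so the following is only a guess at what "in a try block" might look like: an optional step that logs a warning and lets the export continue if it fails. The helper name `run_optional_step` is hypothetical.

```python
import logging

logger = logging.getLogger(__name__)


def run_optional_step(step, *args, **kwargs):
    """Run step(*args, **kwargs); return True on success, False on failure.

    Any exception is logged as a warning and swallowed, so the caller can
    carry on (for example, still save the PDF without bookmarks).
    """
    try:
        step(*args, **kwargs)
        return True
    except Exception as exc:
        logger.warning("optional step failed: %s", exc)
        return False


def failing_step():
    # Stand-in for a catalog copy that hits an unexpected PDF structure.
    raise KeyError("/Root")


results = []
ok = run_optional_step(results.append, "copied")
failed = run_optional_step(failing_step)
```

Catching a broad `Exception` trades precision for robustness here: a malformed PDF costs the bookmarks but not the whole export.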