Trouble getting bookmarks from a pdf document

Open BSevault opened this issue 2 years ago • 3 comments

Discussed in https://github.com/UglyToad/PdfPig/discussions/735

Originally posted by BSevault November 20, 2023

Hello.

I'm working with an internal PDF which is generated programmatically, and I'm having difficulty getting its bookmarks using PdfPig, although I managed to get bookmarks from other PDF documents. I need the bookmarks and the pages they point to in order to split the original PDF into multiple PDFs according to the bookmarks. When I debug my code and call TryGetBookmarks on my PdfDocument, it returns true but the Bookmarks object contains nothing: its length is 0.
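For context, here is a rough sketch of the splitting step I have in mind once bookmarks can be read. It assumes each section runs from its bookmark's page up to the page before the next bookmark. `BookmarkSplitter`, `ToRanges`, `Split` and the `part-N.pdf` file names are made up for illustration. The PdfPig calls (`GetNodes`, `DocumentBookmarkNode.PageNumber`, `PdfDocumentBuilder.AddPage`) are used as I understand the public API, so treat this as untested against any particular version.

```csharp
using System.Collections.Generic;
using System.IO;
using System.Linq;
using UglyToad.PdfPig;
using UglyToad.PdfPig.Outline;
using UglyToad.PdfPig.Writer;

public static class BookmarkSplitter
{
    // Hypothetical helper: turns section start pages (1-based) into
    // inclusive (First, Last) page ranges covering up to pageCount.
    public static List<(int First, int Last)> ToRanges(IEnumerable<int> startPages, int pageCount)
    {
        List<int> starts = startPages.Distinct().OrderBy(p => p).ToList();
        var ranges = new List<(int First, int Last)>();
        for (int i = 0; i < starts.Count; i++)
        {
            // A section ends on the page before the next section starts.
            int last = i + 1 < starts.Count ? starts[i + 1] - 1 : pageCount;
            ranges.Add((starts[i], last));
        }
        return ranges;
    }

    // Hypothetical entry point: writes one PDF per bookmark-delimited range.
    public static void Split(string inputPath, string outputDirectory)
    {
        using PdfDocument document = PdfDocument.Open(inputPath);
        if (!document.TryGetBookmarks(out Bookmarks bookmarks))
        {
            return;
        }

        // Only bookmarks that point at a page in this document carry a page number.
        IEnumerable<int> starts = bookmarks.GetNodes()
            .OfType<DocumentBookmarkNode>()
            .Select(node => node.PageNumber);

        int part = 1;
        foreach ((int first, int last) in ToRanges(starts, document.NumberOfPages))
        {
            using var builder = new PdfDocumentBuilder();
            for (int page = first; page <= last; page++)
            {
                builder.AddPage(document, page); // copies the page from the source document
            }
            File.WriteAllBytes(Path.Combine(outputDirectory, $"part-{part++}.pdf"), builder.Build());
        }
    }
}
```

Pages before the first bookmark are dropped by this sketch, and nested bookmarks are treated the same as top-level ones, which may or may not be what is wanted.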

Previously, I managed to get the bookmarks of the same PDF with itext7, but I cannot use it anymore due to licence issues.

My guess is that it is related to the way outlines and bookmarks are formed in the internal pdf's structure.

I tried to get the object containing "/Title" in the pdf but I failed.

Does anyone have an idea how to get this done?

Here is an extract of the PDF's bookmark objects:

2 0 obj
<<
/Type/Catalog
/Pages 3 0 R
/Outlines 22048 0 R
>>
endobj

...
...
...

22048 0 obj
<<
/Count 304
/First 22049 0 R
/Last 22201 0 R
>>
endobj

22049 0 obj
<<
/Title(TITLE0)
/Parent 22048 0 R
/Next 22201 0 R
/First 22050 0 R
/Last 22200 0 R
/Count 151
>>
endobj

22050 0 obj
<<
/Title(SUBTITLE1 /Page 1)
/A 22353 0 R
/Parent 22049 0 R
/Next 22051 0 R
>>
endobj

22353 0 obj
<<
/S/GoTo
/D[8 0 R /Fit]
>>
endobj

22051 0 obj
<<
/Title(SUBTITLE2 /Page 138)
/A 22354 0 R
/Parent 22049 0 R
/Next 22052 0 R
/Prev 22050 0 R
>>
endobj

22354 0 obj
<<
/S/GoTo
/D[338 0 R /Fit]
>>

... and so on...

BSevault, Nov 20 '23 13:11

Hi @BSevault, thanks for creating the issue. Would you be able to share a sample document, and maybe a snippet of how you extract the bookmarks?

BobLd, Nov 20 '23 20:11

Thanks for your answer, @BobLd.

Unfortunately, I cannot share the document I'm working on since it's confidential. I can only share an extract of the internal structure of the PDF.

The code I used to get bookmarks:

// PdfDocument has no public constructor; documents are opened with PdfDocument.Open.
using PdfDocument pdfDocument = PdfDocument.Open("path/to/my/document.pdf");
bool hasBookmarks = pdfDocument.TryGetBookmarks(out Bookmarks bookmarks);

hasBookmarks is true, but bookmarks.Roots.Count = 0.

I tried to get the tokens I need using PdfTokenScanner, but without success.

BSevault, Nov 21 '23 08:11

I tried looking into this, but without the document I'm getting nowhere. I have pushed a PowerShell script that sets a single target framework:

path\to\clonedrepo\tools> .\set-dotnet-version.ps1

This will make it easier to build and run the project locally. Without access to the source file, the only approach I can think of is to debug it locally, where you can access the file. See LocalTests for running against a single file from the file system: https://github.com/UglyToad/PdfPig/blob/master/src/UglyToad.PdfPig.Tests/Integration/LocalTests.cs
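As a starting point for local debugging, a minimal sketch along these lines could dump the raw outline tree by following /First and /Next from the catalog's /Outlines entry, without going through TryGetBookmarks. The class `OutlineDump` and its helpers are hypothetical names. The PdfPig members used (`Structure.Catalog.CatalogDictionary`, `Structure.GetObject`, `DictionaryToken.TryGet`) are written from memory of the public API and may need adjusting for your version. In the posted extract, object 22049 (TITLE0) has neither /A nor /Dest, so the sketch prints that flag for each item; whether that is related to the empty result is not confirmed here.

```csharp
using System;
using System.Collections.Generic;
using UglyToad.PdfPig;
using UglyToad.PdfPig.Core;
using UglyToad.PdfPig.Tokens;

public static class OutlineDump
{
    public static void Main(string[] args)
    {
        // args[0]: path to the PDF on the local file system.
        using PdfDocument document = PdfDocument.Open(args[0]);

        DictionaryToken catalog = document.Structure.Catalog.CatalogDictionary;
        if (!catalog.TryGet(NameToken.Create("Outlines"), out IToken outlines))
        {
            Console.WriteLine("The catalog has no /Outlines entry.");
            return;
        }

        DictionaryToken root = Resolve(document, outlines);
        if (root != null && root.TryGet(NameToken.Create("First"), out IToken first))
        {
            Walk(document, first, 0, new HashSet<IndirectReference>());
        }
    }

    // Follows an indirect reference (e.g. "22049 0 R") to its dictionary, if any.
    private static DictionaryToken Resolve(PdfDocument document, IToken token)
    {
        if (token is IndirectReferenceToken reference)
        {
            token = document.Structure.GetObject(reference.Data).Data;
        }
        return token as DictionaryToken;
    }

    // Prints each outline item, recursing into /First and iterating over /Next.
    private static void Walk(PdfDocument document, IToken token, int depth, HashSet<IndirectReference> seen)
    {
        while (token != null)
        {
            // Guard against cyclic /Next or /First chains in malformed files.
            if (token is IndirectReferenceToken reference && !seen.Add(reference.Data))
            {
                return;
            }

            DictionaryToken item = Resolve(document, token);
            if (item == null)
            {
                return;
            }

            item.TryGet(NameToken.Create("Title"), out IToken title);
            bool hasTarget = item.TryGet(NameToken.Create("A"), out IToken _)
                          || item.TryGet(NameToken.Create("Dest"), out IToken _);
            Console.WriteLine($"{new string(' ', depth * 2)}{title} (has /A or /Dest: {hasTarget})");

            if (item.TryGet(NameToken.Create("First"), out IToken child))
            {
                Walk(document, child, depth + 1, seen);
            }

            if (!item.TryGet(NameToken.Create("Next"), out token))
            {
                token = null; // no more siblings at this level
            }
        }
    }
}
```

Comparing this output with what TryGetBookmarks returns for the same file should show which outline items are being dropped, and the output contains only titles and flags, so it may be shareable even when the document is not.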

EliotJones, Jan 10 '24 21:01