Trouble getting bookmarks from a pdf document
Discussed in https://github.com/UglyToad/PdfPig/discussions/735
Originally posted by BSevault, November 20, 2023
Hello.
I'm working with an internal PDF that is generated programmatically, and I'm having trouble getting its bookmarks with PdfPig, even though I can get bookmarks from other PDF documents. I need the bookmarks and the pages they point to so that I can split the original PDF into multiple PDFs according to the bookmarks. When I debug my code, TryGetBookmarks on my PdfDocument returns true, but the returned Bookmarks contains nothing: its length is 0.
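For the splitting use case, once each bookmark has been resolved to a 1-based start page, each output document can cover the pages from that bookmark up to the page before the next bookmark. The sketch below assumes the (title, start page) pairs have already been extracted and treats them as a flat list; BookmarkTarget, PageRange and BookmarkSplitter are hypothetical names, not part of PdfPig, and writing the actual output PDFs is not shown.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical record: one bookmark title plus the 1-based page it points to.
public record BookmarkTarget(string Title, int StartPage);

// Hypothetical record: an inclusive page range that would become one output PDF.
public record PageRange(string Title, int FirstPage, int LastPage);

public static class BookmarkSplitter
{
    // Each range starts at a bookmark's page and ends just before the next
    // bookmark's page; the last range runs to the end of the document.
    public static List<PageRange> ComputeRanges(IEnumerable<BookmarkTarget> targets, int totalPages)
    {
        // OrderBy is a stable sort, so bookmarks on the same page keep their order.
        var sorted = targets.OrderBy(t => t.StartPage).ToList();
        var ranges = new List<PageRange>();
        for (int i = 0; i < sorted.Count; i++)
        {
            int first = sorted[i].StartPage;
            int last = i + 1 < sorted.Count ? sorted[i + 1].StartPage - 1 : totalPages;
            // Two bookmarks on the same page would give an empty range for the
            // earlier one; this sketch skips it.
            if (last >= first)
            {
                ranges.Add(new PageRange(sorted[i].Title, first, last));
            }
        }
        return ranges;
    }

    public static void Main()
    {
        // Page numbers taken from the titles in the extract; 200 pages is an assumption.
        var targets = new[]
        {
            new BookmarkTarget("SUBTITLE1", 1),
            new BookmarkTarget("SUBTITLE2", 138),
        };
        foreach (var r in ComputeRanges(targets, 200))
        {
            Console.WriteLine($"{r.Title}: pages {r.FirstPage}-{r.LastPage}");
        }
    }
}
```

With the two subtitles from the extract and an assumed 200-page document, this prints "SUBTITLE1: pages 1-137" and "SUBTITLE2: pages 138-200".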
Previously, I managed to get the bookmarks of the same PDF with iText 7, but I cannot use it anymore due to licence issues.
My guess is that it is related to how the outlines (bookmarks) are structured inside this PDF.
I also tried to find the objects containing "/Title" in the PDF, but I failed.
Does anyone have an idea how to get this done?
Here is an extract of the PDF's bookmark (outline) objects:
2 0 obj
<<
/Type/Catalog
/Pages 3 0 R
/Outlines 22048 0 R
>>
endobj
...
...
...
22048 0 obj
<<
/Count 304
/First 22049 0 R
/Last 22201 0 R
>>
endobj
22049 0 obj
<<
/Title(TITLE0)
/Parent 22048 0 R
/Next 22201 0 R
/First 22050 0 R
/Last 22200 0 R
/Count 151
>>
endobj
22050 0 obj
<<
/Title(SUBTITLE1 /Page 1)
/A 22353 0 R
/Parent 22049 0 R
/Next 22051 0 R
>>
endobj
22353 0 obj
<<
/S/GoTo
/D[8 0 R /Fit]
>>
endobj
22051 0 obj
<<
/Title(SUBTITLE2 /Page 138)
/A 22354 0 R
/Parent 22049 0 R
/Next 22052 0 R
/Prev 22050 0 R
>>
endobj
22354 0 obj
<<
/S/GoTo
/D[338 0 R /Fit]
>>
endobj
... and so on ...
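The extract has the usual shape of a PDF outline tree: the root 22048 points to its first and last top-level items, items link to siblings through /Next and /Prev and to children through /First and /Last, and leaf items carry a /GoTo action whose /D array starts with the target page object. To check by hand how such a tree is meant to be read, here is a minimal sketch of that traversal over a hypothetical in-memory object table built from the extract. The Ref, OutlineEntry and OutlineWalker types are illustrative only; they are not PdfPig's API and not how PdfPig implements bookmarks. Mapping a page object such as 8 0 R to a page number would also require walking the page tree from /Pages 3 0 R, which this sketch does not do.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical stand-in for an indirect reference such as "22049 0 R".
public record Ref(int ObjectNumber);

// One visited outline item: nesting depth, title, and target page object (if any).
public record OutlineEntry(int Depth, string Title, int? PageObject);

public static class OutlineWalker
{
    // Depth-first walk: start at the root's /First, follow /Next for siblings,
    // and descend through each item's /First for children.
    public static List<OutlineEntry> Walk(Dictionary<int, Dictionary<string, object>> objects, int outlineRoot)
    {
        var result = new List<OutlineEntry>();
        var visited = new HashSet<int>(); // guards against malformed /Next or /First cycles

        void Visit(Ref first, int depth)
        {
            for (Ref current = first;
                 current != null && visited.Add(current.ObjectNumber);
                 current = objects[current.ObjectNumber].GetValueOrDefault("Next") as Ref)
            {
                var item = objects[current.ObjectNumber];
                string title = item.GetValueOrDefault("Title") as string ?? "";
                result.Add(new OutlineEntry(depth, title, ResolvePageObject(objects, item)));
                if (item.GetValueOrDefault("First") is Ref child)
                {
                    Visit(child, depth + 1);
                }
            }
        }

        Visit(objects[outlineRoot].GetValueOrDefault("First") as Ref, 0);
        return result;
    }

    // An item's destination is either /Dest directly or /A with /S /GoTo and /D.
    // The first element of an explicit destination array is the target page object;
    // named destinations are not resolved in this sketch.
    static int? ResolvePageObject(Dictionary<int, Dictionary<string, object>> objects,
                                  Dictionary<string, object> item)
    {
        object dest = item.GetValueOrDefault("Dest");
        if (dest == null && item.GetValueOrDefault("A") is Ref actionRef)
        {
            var action = objects[actionRef.ObjectNumber];
            if (action.GetValueOrDefault("S") as string == "GoTo")
            {
                dest = action.GetValueOrDefault("D");
            }
        }
        return dest is object[] array && array.Length > 0 && array[0] is Ref page
            ? page.ObjectNumber
            : (int?)null;
    }

    public static void Main()
    {
        // Objects from the extract; siblings not shown in it (22052, 22201) are left out.
        var objects = new Dictionary<int, Dictionary<string, object>>
        {
            [22048] = new() { ["First"] = new Ref(22049) },
            [22049] = new() { ["Title"] = "TITLE0", ["First"] = new Ref(22050) },
            [22050] = new() { ["Title"] = "SUBTITLE1 /Page 1", ["A"] = new Ref(22353), ["Next"] = new Ref(22051) },
            [22353] = new() { ["S"] = "GoTo", ["D"] = new object[] { new Ref(8), "Fit" } },
            [22051] = new() { ["Title"] = "SUBTITLE2 /Page 138", ["A"] = new Ref(22354) },
            [22354] = new() { ["S"] = "GoTo", ["D"] = new object[] { new Ref(338), "Fit" } },
        };
        foreach (var e in Walk(objects, 22048))
        {
            Console.WriteLine($"{new string(' ', 2 * e.Depth)}{e.Title} -> page object {e.PageObject?.ToString() ?? "none"}");
        }
    }
}
```

Run on the extract, TITLE0 is reported at depth 0 with no destination of its own, and its two children resolve to page objects 8 and 338.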
Hi @BSevault, thanks for creating the issue. Would you be able to share a sample document, and maybe a snippet of how you extract the bookmarks?
Thanks for your answer, @BobLd.
Unfortunately, I cannot share the document I'm working on since it's confidential. I can only share an extract of the internal structure of the PDF.
The code I used to get bookmarks:
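using UglyToad.PdfPig;
using UglyToad.PdfPig.Outline; // namespace of the Bookmarks type
using PdfDocument pdfDocument = PdfDocument.Open("path/to/my/document.pdf"); // PdfPig opens files via the static Open factory
bool hasBookmarks = pdfDocument.TryGetBookmarks(out Bookmarks bookmarks);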
hasBookmarks is true, but bookmarks.Roots.Count is 0.
I also tried to read the tokens I need with PdfTokenScanner, without success.
I tried looking into this, but without the document I'm not getting anywhere. I have pushed a PowerShell script that lets you set a single target framework:
path\to\clonedrepo\tools> .\set-dotnet-version.ps1
This will make it easier to build and run the project locally. Without access to the source file, the only approach I can think of is for you to debug it locally, where you can access the file. See LocalTests for how to run against a single file from the file system: https://github.com/UglyToad/PdfPig/blob/master/src/UglyToad.PdfPig.Tests/Integration/LocalTests.cs