
feature request: 'pdf_toc()' should return more pdf bookmark information

Open • trevorld opened this issue 1 year ago • 3 comments

  • Currently pdf_toc() only returns the bookmark titles and the nesting hierarchy of the bookmarks.
  • It would be nice if it could also return other bookmark attributes besides the title.
  • In particular, it would be nice to get the page number each bookmark points to (when the bookmark's action is to go to a page). Currently I need to use a wrapper around the command-line tool pdftk to get that information.
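For reference, here is a minimal sketch of the kind of pdftk wrapper mentioned above. It assumes pdftk is on the PATH and that its `dump_data_utf8` report lists each bookmark as `BookmarkTitle:`, `BookmarkLevel:` and `BookmarkPageNumber:` lines. `parse_pdftk_bookmarks()` is a hypothetical helper name, not part of pdftools or any other package.

```r
# Hypothetical helper: turn the bookmark fields of a `pdftk ... dump_data_utf8`
# report into a data frame with one row per bookmark.
parse_pdftk_bookmarks <- function(lines) {
  # Keep only the bookmark fields we care about
  lines <- grep("^Bookmark(Title|Level|PageNumber): ", lines, value = TRUE)
  keys <- sub(":.*$", "", lines)         # text before the first colon
  values <- sub("^[^:]+: ", "", lines)   # everything after "Key: "
  # Each BookmarkTitle line starts a new bookmark record
  id <- cumsum(keys == "BookmarkTitle")
  n <- max(c(0L, id))
  get_field <- function(key) {
    out <- rep(NA_character_, n)
    sel <- keys == key & id > 0
    out[id[sel]] <- values[sel]
    out
  }
  data.frame(title = get_field("BookmarkTitle"),
             level = as.integer(get_field("BookmarkLevel")),
             page = as.integer(get_field("BookmarkPageNumber")),
             stringsAsFactors = FALSE)
}

# Usage (assumes pdftk is installed):
# lines <- system2("pdftk", c("bookmarks.pdf", "dump_data_utf8"), stdout = TRUE)
# parse_pdftk_bookmarks(lines)
```

Splitting on the first colon only keeps bookmark titles that themselves contain a colon intact.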

trevorld • Sep 27 '22 20:09

In case it is helpful, here is a minimal pdf document with the following pdf bookmark features present:

  • Bookmarks that start open and bookmarks that start closed (positive versus negative integer count)
  • Bookmarks with different styles (i.e. plain, bold, italic, bold-italic)
  • Bookmarks with different colors

PDF attachment: bookmarks.pdf

Note that many open-source pdf viewers quietly ignore some of these features. Foxit Reader is an example of a cross-platform (but proprietary) pdf reader that supports all of them.

Here is the R code to create the minimal pdf with pdf bookmarks:

library("grid")
library("grDevices")
library("xmpdf") # remotes::install_github("trevorld/r-xmpdf")

stopifnot(supports_gs()) # needs 'ghostscript'

# Create two-page pdf
pdf("bookmarks.pdf", onefile = TRUE)
grid.text("Page 1")
grid.newpage()
grid.text("Page 2")
invisible(dev.off())

# Add bookmarks
bookmarks <- data.frame(title = c("Front", "Page 1", "Page 2"),
                        page = c(1L, 1L, 2L),
                        count = c(2L, -1L, 0L),
                        fontface = c("italic", "bold", "bold.italic"),
                        color = c("black", "red", "blue"))
set_bookmarks_gs(bookmarks, "bookmarks.pdf")

Currently pdf_toc() seems to ignore most of this information:

pdftools::pdf_toc("bookmarks.pdf")
$title
[1] ""

$children
$children[[1]]
$children[[1]]$title
[1] "Front"

$children[[1]]$children
$children[[1]]$children[[1]]
$children[[1]]$children[[1]]$title
[1] "Page 1"

$children[[1]]$children[[1]]$children
$children[[1]]$children[[1]]$children[[1]]
$children[[1]]$children[[1]]$children[[1]]$title
[1] "Page 2"

$children[[1]]$children[[1]]$children[[1]]$children
list()
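For comparison, the nested list shown above (each node has a `title` and a `children` list) can be flattened into one row per bookmark. This is a minimal sketch; `flatten_toc()` is a hypothetical helper, and it only recovers the titles and nesting depth because that is all pdf_toc() currently returns.

```r
# Hypothetical helper: flatten the nested list returned by pdftools::pdf_toc()
# into a data frame (pre-order), recording each bookmark's nesting depth.
flatten_toc <- function(node, depth = 0L) {
  rows <- list()
  for (child in node$children) {
    # The bookmark itself, one level deeper than its parent
    rows[[length(rows) + 1L]] <- data.frame(title = child$title,
                                            depth = depth + 1L,
                                            stringsAsFactors = FALSE)
    # Followed by all of its descendants
    rows[[length(rows) + 1L]] <- flatten_toc(child, depth + 1L)
  }
  if (length(rows) == 0L) {
    return(data.frame(title = character(0), depth = integer(0),
                      stringsAsFactors = FALSE))
  }
  do.call(rbind, rows)
}

# Usage (the root node has an empty title and is skipped):
# flatten_toc(pdftools::pdf_toc("bookmarks.pdf"))
```

There is no column for page number, open/closed state, color, or style, since none of that is present in the list.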

trevorld • Sep 30 '22 21:09

I don't think poppler supports this right now; at least, I can't find it in the API. I found this old post, but it looks like it was never followed up on.

jeroen • Oct 04 '22 17:10

Thanks for the explanation!

Looking at the poppler API documentation, it seems that besides the bookmark's title, the only other information the API makes available is whether that bookmark should start open or closed in the TOC (i.e. is_open). Bookmark color, style, and page number (or any other action) don't seem to be supported yet.

Feel free to close this issue, but I'll leave it open for now since it seems you could still add the is_open data to the output of pdf_toc(). pdftk currently doesn't return that bookmark information.

trevorld • Oct 04 '22 18:10