pdftools
Feature request: 'pdf_toc()' should return more pdf bookmark information
- Currently I observe that pdf_toc() only returns the bookmark titles and the nesting hierarchy of bookmarks.
- It would be nice if it could also return more bookmark attributes in addition to the title.
- In particular, it would be nice if we could get the page number that each bookmark goes to (when the bookmark action is to go to a page number). Currently I need to use a wrapper around the command-line tool pdftk to get that information.
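For reference, the pdftk workaround can be sketched roughly as follows. This is only a minimal sketch: it assumes that `pdftk <file> dump_data` prints each bookmark as a `BookmarkBegin` line followed by `BookmarkTitle:`, `BookmarkLevel:` and `BookmarkPageNumber:` lines, and the names `parse_pdftk_bookmarks()` and `pdftk_bookmarks()` are hypothetical helpers, not part of pdftools or pdftk.

```r
# Hypothetical helper: parse the text printed by `pdftk <file> dump_data`.
# Assumes every bookmark has exactly one BookmarkTitle, BookmarkLevel and
# BookmarkPageNumber line, in that grouping.
parse_pdftk_bookmarks <- function(lines) {
  # Extract the values of all "<key>: <value>" lines for one key
  field <- function(key) {
    pattern <- paste0("^", key, ":\\s*")
    sub(pattern, "", grep(pattern, lines, value = TRUE))
  }
  data.frame(
    title = field("BookmarkTitle"),
    level = as.integer(field("BookmarkLevel")),
    page = as.integer(field("BookmarkPageNumber")),
    stringsAsFactors = FALSE
  )
}

# Hypothetical wrapper: needs the pdftk binary on the PATH.
pdftk_bookmarks <- function(file) {
  lines <- system2("pdftk", c(shQuote(file), "dump_data"), stdout = TRUE)
  parse_pdftk_bookmarks(lines)
}

# Hand-written input in the assumed dump_data layout (not real pdftk output)
example_lines <- c(
  "NumberOfPages: 2",
  "BookmarkBegin",
  "BookmarkTitle: Front",
  "BookmarkLevel: 1",
  "BookmarkPageNumber: 1",
  "BookmarkBegin",
  "BookmarkTitle: Page 2",
  "BookmarkLevel: 2",
  "BookmarkPageNumber: 2"
)
bookmarks_from_pdftk <- parse_pdftk_bookmarks(example_lines)
```

Keeping the parser separate from the system call means the parsing step can be checked without pdftk installed.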
In case it is helpful, here is a minimal pdf document with the following pdf bookmark features present:
- Bookmarks starting open and bookmarks starting closed (integer count positive versus negative)
- Bookmarks with different styles (i.e. plain, bold, italic, bold-italic)
- Bookmarks with different colors
PDF attachment: bookmarks.pdf
Note that many open-source pdf viewers quietly ignore some of these features. Foxit Reader is an example of a cross-platform (but proprietary) pdf reader that supports all of them.
Here is the R code to create the minimal pdf with pdf bookmarks:
library("grid")
library("grDevices")
library("xmpdf") # remotes::install_github("trevorld/r-xmpdf")
stopifnot(supports_gs()) # needs 'ghostscript'
# Create two-page pdf
pdf("bookmarks.pdf", onefile = TRUE)
grid.text("Page 1")
grid.newpage()
grid.text("Page 2")
invisible(dev.off())
# Add bookmarks
bookmarks <- data.frame(title = c("Front", "Page 1", "Page 2"),
                        page = c(1L, 1L, 2L),
                        count = c(2L, -1L, 0L),
                        fontface = c("italic", "bold", "bold.italic"),
                        color = c("black", "red", "blue"))
set_bookmarks_gs(bookmarks, "bookmarks.pdf")
Currently pdf_toc() seems to ignore most of this information:
pdftools::pdf_toc("bookmarks.pdf")
$title
[1] ""
$children
$children[[1]]
$children[[1]]$title
[1] "Front"
$children[[1]]$children
$children[[1]]$children[[1]]
$children[[1]]$children[[1]]$title
[1] "Page 1"
$children[[1]]$children[[1]]$children
$children[[1]]$children[[1]]$children[[1]]
$children[[1]]$children[[1]]$children[[1]]$title
[1] "Page 2"
$children[[1]]$children[[1]]$children[[1]]$children
list()
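To compare this with the bookmarks data frame used to create the file, the nested list can be flattened into one row per bookmark. This is a minimal sketch that assumes each node has only the `title` and `children` elements shown in the output above; `flatten_toc()` is a hypothetical helper, not part of pdftools.

```r
# Hypothetical helper: flatten a pdf_toc()-style nested list into a data frame
# with one row per bookmark and its nesting level.
flatten_toc <- function(node, level = 0L) {
  # The root node has an empty title, so it contributes no row of its own
  own <- if (nzchar(node$title)) {
    data.frame(title = node$title, level = level, stringsAsFactors = FALSE)
  }
  # Recurse into the children, one level deeper
  kids <- lapply(node$children, flatten_toc, level = level + 1L)
  do.call(rbind, c(list(own), kids))
}

# Hand-built list mirroring the structure printed above
toc <- list(title = "", children = list(
  list(title = "Front", children = list(
    list(title = "Page 1", children = list(
      list(title = "Page 2", children = list())
    ))
  ))
))
flat <- flatten_toc(toc)
```

Only the title and the nesting level can be recovered this way; page, count, fontface and color are not in the list at all.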
I don't think poppler supports this right now; at least I can't find it in the API. I found this old post, but it looks like it was never followed up on.
Thanks for the explanation!
Looking at the poppler API documentation, I guess that besides the bookmark's title, the only other information the API makes available is whether that bookmark should start open or closed in the TOC (i.e. is_open). No bookmark color, style, or page number (or other action) seems to be currently supported.
Feel free to close this issue, but I'll leave it open since it seems you could still add the is_open data to the output of pdf_toc(). pdftk currently doesn't return that bookmark info...