qpdf

flatten

Open • ggrothendieck opened this issue 5 months ago • 4 comments

The qpdf command-line tool can flatten a PDF:

> qpdf.exe --help=--flatten-annotations
--flatten-annotations=parameter

Push page annotations into the content streams. This may be
necessary in some case when printing or splitting files.
Parameters: "all", "print", "screen".

I suggest making this available through the qpdf R package.
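If this were exposed in the R package, the parameter could be checked against the three values listed in the help text before qpdf is called. Here is a minimal sketch. The helper name `flatten_annotations_flag()` is hypothetical and not part of the qpdf R package:

```r
# Hypothetical helper (not part of the qpdf R package): build the
# --flatten-annotations flag, accepting only the values listed in the
# qpdf help text ("all", "print", "screen"). match.arg() picks "all"
# by default and raises an error for anything else.
flatten_annotations_flag <- function(which = c("all", "print", "screen")) {
  which <- match.arg(which)
  paste0("--flatten-annotations=", which)
}

flatten_annotations_flag()          # "--flatten-annotations=all"
flatten_annotations_flag("screen")  # "--flatten-annotations=screen"
```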

ggrothendieck • Jul 01 '25 10:07

Can you send a PR?

jeroen • Jul 01 '25 10:07

This is what I am using now:

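#' Flatten PDF annotations
#'
#' Requires the qpdf command-line program to be installed.
#'
#' @param input path to the input PDF
#' @param output path to the output PDF (defaults to replacing .pdf in input with -flat.pdf)
#' @param qpdf path to the qpdf executable (defaults to "qpdf")
#' @param verbose if TRUE, displays the qpdf command that is run
#' @return the exit status of qpdf, invisibly
qpdf_flatten <- function(input, output = NULL, qpdf = "qpdf", verbose = FALSE) {
  if (is.null(output)) {
    # Refuse to guess an output name that would overwrite the input
    if (!grepl("\\.pdf$", input, ignore.case = TRUE)) {
      stop("`input` must end in .pdf when `output` is not supplied")
    }
    output <- sub("\\.pdf$", "-flat.pdf", input, ignore.case = TRUE)
  }
  output <- normalizePath(output, mustWork = FALSE)
  # system2() quotes the command but not its args, so quote the file paths here
  args <- c("--flatten-annotations=all", shQuote(input), shQuote(output))
  if (verbose) message(paste(qpdf, paste(args, collapse = " ")))
  invisible(system2(qpdf, args))
}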

ggrothendieck • Jul 01 '25 15:07

  • Besides --flatten-annotations, qpdf also has a separate --flatten-rotation feature, so a new function named pdf_flatten() may be ambiguous (unless it supports both operations).

  • I'm not yet sure what the cleanest way to add these features to the API would be:

    1. Add them as new options to pdf_compress(), extending it into a generic "clean/standardize" PDF function that, unlike the combine/split/subset/rotate functions, does not otherwise alter the contents.
    2. Add them as one or two new functions that each cleanly do one thing; but then you may need to chain several functions (reading and writing several PDF files) to do what qpdf can do in a single read/write step.
    3. Also add them as options to other functions in the API (in particular, --flatten-rotation could be added to pdf_rotate_pages() as well as to pdf_compress() or pdf_flatten()).
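Options 1 and 3 both aim to let one qpdf run apply several changes at once. A rough sketch of how such a call's argument vector might be assembled; the function `pdf_flatten_args()` and its parameters are hypothetical, and only the qpdf flags `--flatten-annotations` and `--flatten-rotation` come from this discussion:

```r
# Hypothetical sketch: assemble the arguments for ONE qpdf call that
# flattens annotations and/or page rotation in a single read/write step.
pdf_flatten_args <- function(input, output,
                             annotations = c("all", "print", "screen", "none"),
                             rotation = FALSE) {
  annotations <- match.arg(annotations)
  args <- character(0)
  if (annotations != "none") {
    args <- c(args, paste0("--flatten-annotations=", annotations))
  }
  if (isTRUE(rotation)) {
    args <- c(args, "--flatten-rotation")
  }
  # Options go before the (shell-quoted) input and output file names
  c(args, shQuote(input), shQuote(output))
}

args <- pdf_flatten_args("in.pdf", "out.pdf", rotation = TRUE)
# system2("qpdf", args)  # one qpdf run performs both operations
```

Compared with chaining two single-purpose functions (option 2), this reads and writes the file only once.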

I've previously noticed that if you rotate a page with pdf_rotate_pages() and then query its size with pdftools::pdf_pagesize(), it returns the unrotated dimensions instead of the rotated ones. I think this is a bug in the poppler library (these files usually display correctly rotated in a PDF viewer). I speculate that the --flatten-rotation feature may help with this, but I haven't tested it yet. This is not urgent for me, because processing the PDF file with ghostscript seems to "fix" the issue.

trevorld • Jul 01 '25 20:07

I guess that, technically, qpdf considers compression and linearization to be PDF transformations, which its docs describe as

transformations that change the structure of a PDF file without changing its content

whereas it considers flattening annotations and rotations to be PDF modifications, described as

Modification options make systematic changes to certain parts of the PDF, causing the PDF to render differently from the original.

I also notice that the docs sometimes pair --flatten-annotations with --generate-appearances:

Note that there is usually no reason to do this, but it can be necessary before using the --flatten-annotations option.
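Following that note, a wrapper could optionally add --generate-appearances so form fields get appearance streams before they are flattened. A sketch assuming, as the docs suggest, that both options can be passed in the same qpdf run; the helper `flatten_with_appearances_args()` is hypothetical:

```r
# Hypothetical sketch: optionally request --generate-appearances in the
# same qpdf call that flattens annotations (--flatten-annotations), per
# the qpdf docs' note that it can be needed before flattening.
flatten_with_appearances_args <- function(input, output,
                                          generate_appearances = TRUE) {
  args <- "--flatten-annotations=all"
  if (isTRUE(generate_appearances)) {
    args <- c("--generate-appearances", args)
  }
  c(args, shQuote(input), shQuote(output))
}

flatten_with_appearances_args("form.pdf", "form-flat.pdf")
```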

There are also other potentially "interesting" modification options, such as --remove-info, --remove-page-labels, --remove-metadata, and --remove-structure, that strip data.
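If these were exposed as well, one option would be simple logical switches mapped to the corresponding flags. A minimal sketch; `strip_args()` is a hypothetical name, and the flag names are taken from the comment above, so check that your installed qpdf version supports them:

```r
# Hypothetical sketch: map logical switches to the qpdf "remove" options
# listed above. Flag availability depends on the installed qpdf version.
strip_args <- function(info = FALSE, page_labels = FALSE,
                       metadata = FALSE, structure = FALSE) {
  flags <- c(info = "--remove-info",
             page_labels = "--remove-page-labels",
             metadata = "--remove-metadata",
             structure = "--remove-structure")
  keep <- c(info, page_labels, metadata, structure)
  # Logical indexing keeps only the requested flags, in a fixed order
  unname(flags[keep])
}

strip_args(info = TRUE, metadata = TRUE)
# c("--remove-info", "--remove-metadata")
```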

trevorld • Jul 01 '25 22:07