How not to lose meta tags when deleting bookmarks

Open chrisrex opened this issue 2 years ago • 3 comments

Hi, if I use qpdf --empty --linearize --pages cover.pdf infile.pdf 43-85 -- output1.pdf, all meta tags are gone: the title, the author, and the document language. How can I keep this information? I need to delete the bookmarks without deleting the meta tags. Best regards, Christoph

chrisrex avatar May 05 '23 07:05 chrisrex

There isn't an easy way to do this with qpdf right now. It may be possible to do with qpdf json by manipulating the files at a low level, but it would require fairly deep knowledge of PDF.

This will be possible to do once I get through my "pages epic" which I will start on when I get through my current significant project. The pages epic has been in my head for years and is next up for qpdf.

Note that qpdf preserves metadata and outlines and everything else from the "main" file, so with qpdf in its current form, there is no way from the cli to preserve metadata from all the files. If there's one file you want to preserve metadata from, list that first. For example, qpdf cover.pdf --linearize --pages . infile.pdf 43-85 -- output1.pdf would preserve everything from cover.pdf but would drop metadata from infile.pdf. Or you could do qpdf infile.pdf --linearize --pages cover.pdf . 43-85 -- output1.pdf to preserve stuff from infile.pdf. (The . inside the --pages option is short-hand for the main input file. You can also just repeat the name of the input file.)
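As a rough sketch of the second variant above, the merge can be wrapped in a small shell function so that the file whose metadata should survive is always passed as the main input. The function name merge_keep_main and its argument order are made up for illustration; only the qpdf command line itself comes from the explanation above.

```shell
#!/bin/bash
# Sketch only: merge_keep_main is a hypothetical helper name, not a qpdf feature.
# It keeps document-level data (metadata, outlines) from the main input file
# while still placing the cover pages first in the output.
merge_keep_main() {
    local main=$1 cover=$2 range=$3 out=$4
    # "." inside --pages is shorthand for the main input file ($main).
    qpdf "$main" --linearize --pages "$cover" . "$range" -- "$out"
}

# Example call (file names taken from the question):
#   merge_keep_main infile.pdf cover.pdf 43-85 output1.pdf
```

The output still contains the bookmarks of the main input file, since qpdf carries them over together with the metadata.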

At this point, you will have a linearized file but it will still contain bookmarks from the file that you preserved metadata from. If you want to remove bookmarks, you could try this. This assumes you have jq installed.

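#!/bin/bash
# Usage: <this script> infile.pdf outfile.pdf
set -e
infile=$1
outfile=$2
# Look up the reference to the document catalog (e.g. "1 0 R") from the trailer.
root=$(qpdf --json-output "$infile" - | jq -r '.qpdf[1].trailer.value."/Root"')
# Dump only the catalog object and set its /Outlines entry to null.
qpdf --json-output "$infile" - --json-object="$root" | \
    jq --arg root "$root" '.qpdf[1]["obj:" + $root].value."/Outlines" = null' \
    > "$infile-update.json"
# Apply the change; qpdf prunes the unused objects and re-linearizes the file.
qpdf --linearize --update-from-json="$infile-update.json" "$infile" "$outfile"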

Run the above script with output1.pdf output2.pdf as arguments. You can drop --linearize from the first (merge) command, since the script linearizes its own output. The script basically just snips out the bookmarks and reruns the result through qpdf to prune unused objects and re-linearize the file.
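Putting the two steps together might look like the sketch below. It assumes the script above has been saved as ./strip-outlines.sh; that file name, the STRIP_SCRIPT variable, the function name, and the temporary file naming are all hypothetical choices for illustration.

```shell
#!/bin/bash
# Sketch only: assumes the bookmark-removal script above was saved as
# ./strip-outlines.sh (hypothetical name). Override STRIP_SCRIPT to point elsewhere.
STRIP_SCRIPT=${STRIP_SCRIPT:-./strip-outlines.sh}

# build_without_bookmarks is a hypothetical helper name.
build_without_bookmarks() {
    local cover=$1 infile=$2 range=$3 out=$4
    local tmp="$out.tmp.pdf"
    # Step 1: merge with infile as the main input so its metadata survives.
    # --linearize is left out here because step 2 linearizes the final file.
    qpdf "$infile" --pages "$cover" . "$range" -- "$tmp"
    # Step 2: remove /Outlines from the merged file and write the result.
    "$STRIP_SCRIPT" "$tmp" "$out"
    # Clean up the intermediate PDF and the JSON file the script leaves behind.
    rm -f "$tmp" "$tmp-update.json"
}

# Example call (file names taken from the question):
#   build_without_bookmarks cover.pdf infile.pdf 43-85 output2.pdf
```

The intermediate file is written next to the output so that the JSON update file created by the script ends up in the same directory and can be removed afterwards.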

Hopefully this can help in the interim.

jberkenbilt avatar May 05 '23 11:05 jberkenbilt

Thank you very much, this is a good idea. I can add the metadata to the cover file. Then everything is OK. Best regards, Christoph

chrisrex avatar May 05 '23 14:05 chrisrex

This issue is part of the qpdf pages epic. If you are interested in following, please see #1104.

jberkenbilt avatar Jan 04 '24 18:01 jberkenbilt