Using the script changes the PDF version from 1.5 to 1.3
I used the script to change the page labels. That works perfectly, but it also changes the PDF version, which makes the output incompatible with a lot of archives.
Edit: On digging further, I believe this is a problem with the pdfwriter module of pdfrw.
https://github.com/pmaupin/pdfrw/blob/master/pdfrw/pdfwriter.py
It can only handle writing PDF 1.3.
Perhaps a warning can be placed in the readme.
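For anyone who wants to confirm the downgrade on their own files, here is a minimal sketch that reads the version from the `%PDF-x.y` header line. The function name `header_version` is just for illustration. Note that it only looks at the header: from PDF 1.4 onward a `/Version` entry in the document catalog can override it, and this sketch does not check that.

```python
import re
from typing import Optional

# The header normally sits at the very start of the file, e.g. b"%PDF-1.5".
_HEADER = re.compile(rb"%PDF-(\d+\.\d+)")


def header_version(path: str) -> Optional[str]:
    """Return the version from the PDF header line, or None if not found."""
    with open(path, "rb") as fh:
        head = fh.read(1024)  # the header must appear near the beginning
    match = _HEADER.search(head)
    return match.group(1).decode("ascii") if match else None
```

Calling `header_version("input.pdf")` and `header_version("output.pdf")` before and after running the script should show whether the header changed.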
PDF 1.3 is almost 20 years old. What software are you using that does not support this version of PDF? Is it common?
EDIT: Sorry, I did not see the problem was that the PDF version went from 1.5 to 1.3.
Have you considered using pikepdf instead of pdfrw? I just used it (from interactive Python prompt) to edit the Pagelabels and to set PageLayout = /TwoPageRight and it seemed fairly well made and writes PDF version 1.7.
GNU Ghostscript can convert PDF 1.3 to PDF 1.7.
gs -sDEVICE=pdfwrite -dCompatibilityLevel=1.7 -dPDFSETTINGS=/screen -dNOPAUSE -dQUIET -dBATCH -sOutputFile=output.pdf input.pdf
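If you want to run that conversion from Python, a rough sketch could look like the following. It assumes a `gs` executable is on the PATH; the helper names `gs_command` and `convert` are made up for this example. I left out `-dPDFSETTINGS=/screen` here because that preset targets low-resolution screen output, which may not be what you want for archiving.

```python
import subprocess
from typing import List


def gs_command(src: str, dst: str, level: str = "1.7") -> List[str]:
    """Build the Ghostscript argument list for rewriting src as dst."""
    return [
        "gs",
        "-sDEVICE=pdfwrite",
        f"-dCompatibilityLevel={level}",
        "-dNOPAUSE",
        "-dQUIET",
        "-dBATCH",
        f"-sOutputFile={dst}",
        src,
    ]


def convert(src: str, dst: str, level: str = "1.7") -> None:
    """Run Ghostscript; raises CalledProcessError if gs exits non-zero."""
    subprocess.run(gs_command(src, dst, level), check=True)
```

Usage would be something like `convert("input.pdf", "output.pdf", level="1.5")`. Ghostscript rewrites the whole file, so it is worth checking the result afterwards.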
> Have you considered using pikepdf instead of pdfrw? I just used it (from interactive Python prompt) to edit the Pagelabels and to set PageLayout = /TwoPageRight and it seemed fairly well made and writes PDF version 1.7.
Can you share your code?
Because docs say:
> There is currently no API to help with modifying the pdf.Root.PageLabels data structure, which contains the label definitions.
> Because docs say:
> There is currently no API to help with modifying the pdf.Root.PageLabels data structure, which contains the label definitions.
It's been a couple of years since I did that and I seem to recall just following along with the docs. ~~Maybe that was experimental functionality that got removed?~~ [Edit: Nope, it is still there (see below). The docs are misleading. What they should say is that editing the PageLabels is so easy that there is no need for a special API.]
The recipe is documented at:
https://pikepdf.readthedocs.io/en/latest/api/models.html#pikepdf.NumberTree
Here is a complete working example:
from pikepdf import Dictionary, Name, NumberTree, Pdf

pdf = Pdf.open("input.pdf")

# Create an empty PageLabels number tree if the document has none yet
if Name.PageLabels not in pdf.Root:
    pdf.Root.PageLabels = NumberTree.new(pdf).obj

pagelabels = NumberTree(pdf.Root.PageLabels)

# Keys are zero-based page indices.
# Label pages starting at index 0 with lowercase Roman numerals
pagelabels[0] = Dictionary(S=Name.r)
# Label pages starting at index 6 with decimal numbers
pagelabels[6] = Dictionary(S=Name.D)

pdf.save("output.pdf")

# Page labels will now be:
# i, ii, iii, iv, v, vi, 1, 2, 3, ...
And here are the settings you can use for the Dictionary in pagelabels (copied from the answer to how to use qpdf to directly edit a PDF's page numbers in a text editor, so you'll have to interpolate a little):
OPTIONAL: STARTING FROM A DIFFERENT NUMBER WITH /St
Each section restarts numbering at 1 unless you tell it otherwise using /St.
OPTIONAL: USING A DIFFERENT STYLE WITH /S
The /S key takes a name that selects the numbering style:
- /D digits (1, 2, 3...)
- /R uppercase Roman (I, II, III...)
- /r lowercase Roman (i, ii, iii...)
- /A uppercase alphabetical (A, B, C, ..., X, Y, Z, AA, BB, CC, ...)
- /a lowercase alphabetical (a, b, c, ..., x, y, z, aa, bb, cc, ...)
If one omits the /S key, then that section of pages will have no numbering. For example:
0 << >> % No label for cover
OPTIONAL: ADDING A PREFIX TO EACH PAGE WITH /P
You can show any string of text before the page number by specifying a word in parentheses after /P:
31
<<
/S /D
/P (A-) % label appendix pages A-1, A-2, A-3
>>
Specifying a prefix without a style (/S) will give you pages that have only the word without any number. This can be useful, for example, if you'd like a cover page to simply have the label "Cover".
0 << /P (Cover) >> % No number, just "Cover"
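To make the interplay of /S, /St and /P concrete, here is a small pure-Python sketch that works out which label each page ends up with. It does not touch a PDF at all: the ranges are plain dicts keyed by zero-based page index, and the names `page_labels`, `to_roman` and `to_alpha` are invented for this illustration. The alphabetical style follows the PDF specification's wording (A to Z for the first 26 pages, then AA to ZZ, and so on).

```python
from typing import Dict, List, Optional

_ROMAN = [(1000, "M"), (900, "CM"), (500, "D"), (400, "CD"), (100, "C"),
          (90, "XC"), (50, "L"), (40, "XL"), (10, "X"), (9, "IX"),
          (5, "V"), (4, "IV"), (1, "I")]


def to_roman(n: int) -> str:
    """Uppercase Roman numeral for a positive integer."""
    parts = []
    for value, symbol in _ROMAN:
        count, n = divmod(n, value)
        parts.append(symbol * count)
    return "".join(parts)


def to_alpha(n: int) -> str:
    """A..Z for 1..26, then AA..ZZ for 27..52, then AAA.. and so on."""
    letter = chr(ord("A") + (n - 1) % 26)
    return letter * ((n - 1) // 26 + 1)


def format_number(style: Optional[str], n: int) -> str:
    """Render n in the given /S style; no style means no number at all."""
    if style == "D":
        return str(n)
    if style == "R":
        return to_roman(n)
    if style == "r":
        return to_roman(n).lower()
    if style == "A":
        return to_alpha(n)
    if style == "a":
        return to_alpha(n).lower()
    return ""


def page_labels(ranges: Dict[int, dict], page_count: int) -> List[str]:
    """Compute the label of every page from {start_index: {S, St, P}}."""
    labels = []
    for index in range(page_count):
        starts = [s for s in ranges if s <= index]
        if not starts:
            # Assumption: with no covering range, fall back to the plain
            # one-based page number.
            labels.append(str(index + 1))
            continue
        start = max(starts)
        entry = ranges[start]
        # Numbering restarts at /St (default 1) at the start of each range.
        number = entry.get("St", 1) + (index - start)
        labels.append(entry.get("P", "") + format_number(entry.get("S"), number))
    return labels
```

For example, `page_labels({0: {"S": "r"}, 6: {"S": "D"}}, 9)` gives six Roman-numbered pages followed by 1, 2, 3, and `page_labels({0: {"P": "Cover"}, 1: {"S": "D", "P": "A-"}}, 3)` gives "Cover", "A-1", "A-2".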
You know what, I'll just make a proper answer on StackExchange: https://superuser.com/a/1809284/400780
@hackerb9 Thank you