
Preserve table of contents when editing PDF

Open • NiklasKappel opened this issue 3 years ago • 1 comment

I use the following simple combination of PdfReader and PdfWriter to edit PDF files.

import pdfrw


def main():
    pdf = pdfrw.PdfReader(open("input.pdf", "rb"))
    writer = pdfrw.PdfWriter()
    for pageNumber in range(pdf.numPages):
        page = pdf.pages[pageNumber]
        # Work on page.
        writer.addpage(page)
    writer.write("output.pdf")


if __name__ == "__main__":
    main()

Here, # Work on page. is a placeholder for code that sometimes leaves the page unchanged and sometimes merges it with a different, pre-prepared page.

If input.pdf contains a table of contents (in the form of metadata that PDF viewers can display in the sidebar and use to navigate the document), the table of contents is missing from output.pdf.

Is it possible to preserve the table of contents in this example using pdfrw? E.g. is it possible to extract the table of contents from input.pdf and paste it into output.pdf?

NiklasKappel, Feb 07 '22 16:02

Hi!

Your code creates a new, empty PDF document and adds pages to it. The table of contents (the outline) is stored in the document catalog rather than in the individual pages, so it is not carried over to the new document. If you want to edit an existing document, a much simpler approach is to omit the call to addpage altogether, edit the pages of pdf in place, just as your code does, and at the end write the result out using:

PdfWriter('output.pdf', trailer=pdf).write()

sl2c, Sep 20 '23 09:09