mergePage is not overlapping all pages properly since update to 2.10 from 1.28.4

Open jesusestevez opened this issue 1 year ago • 3 comments

I am replicating the process posted by ofir dubi here: https://stackoverflow.com/questions/25164871/how-to-add-page-numbers-to-pdf-file-using-python-and-matplotlib

So far, I have stuck with PyPDF2 rather than PyPDF4. It worked properly until the most recent update to v2.10 (from 1.28.4). Now, however, only the first page is merged correctly: its number is replicated on all the other pages.

The process still works if I use PyPDF4. However, I would like to know whether it can be solved using PyPDF2 instead.

jesusestevez, Aug 15 '22 11:08

@jesusestevez Please use the bug template in the future.

Can you please share a minimal example that shows the issue (code + pdf)?

MartinThoma, Aug 15 '22 11:08

The following code adds page numbers to a PDF just as expected:

import os

from PyPDF2 import PdfReader, PdfWriter
from reportlab.lib.units import mm
from reportlab.pdfgen import canvas


def create_page_pdf(num, tmp):
    c = canvas.Canvas(tmp)
    for i in range(1, num + 1):
        c.drawString((210 // 2) * mm, (4) * mm, str(i))
        c.showPage()
    c.save()


def add_page_numbers(pdf_path, newpath):
    """
    Add page numbers to a pdf, save the result as a new pdf
    @param pdf_path: path to pdf
    @param newpath: path of the numbered output pdf
    """
    tmp = "__tmp.pdf"

    writer = PdfWriter()
    with open(pdf_path, "rb") as f:
        reader = PdfReader(f)
        n = len(reader.pages)

        # create new PDF with page numbers
        create_page_pdf(n, tmp)

        with open(tmp, "rb") as ftmp:
            number_pdf = PdfReader(ftmp)
            # iterate pages
            for p in range(n):
                page = reader.pages[p]
                number_layer = number_pdf.pages[p]
                # merge number page with actual page
                page.merge_page(number_layer)
                writer.add_page(page)

            # write result while both input files are still open
            if len(writer.pages) > 0:
                with open(newpath, "wb") as fout:
                    writer.write(fout)
        os.remove(tmp)


if __name__ == "__main__":
    add_page_numbers("input.pdf", "output.pdf")

MartinThoma, Aug 15 '22 12:08

Thank you very much for your swift response, and sorry for not having used the template.

I am using the add_page_numbers() function described above to overlap two documents instead of adding numbers. Both contain the same information, but one of them is a blank version with a table of contents whose entries are internal links to different pages of the document. The idea is that, after overlapping them, the PDF with all the text filled in has the internal links of the second PDF. However, since the update this procedure no longer works, and the first link is applied to all pages.

I amended the function to use merge_page, add_page and the updated Writer and Reader objects, but cannot make it work. The main reason for logging this as an issue is that it works correctly in my environment with PyPDF2 v1.28.4.

jesusestevez, Aug 15 '22 12:08