pypdf
HTML links to document page broken after merge
If a PDF file contains internal links (generated from HTML anchor tags whose href is an element id), those links no longer work after merging.
<a href="#target">Go to target</a>
....some content
<div id="target">Target here</div>
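from io import BytesIO
from PyPDF2 import PdfFileMerger, PdfFileReader  # PyPDF2 1.x API
def merge_pdf(pdf: bytes) -> bytes:
    # pdf: an HTML file rendered to PDF, containing internal HTML links
    report = PdfFileReader(BytesIO(pdf))
    merger = PdfFileMerger(strict=False)
    merger.append(report)
    result = BytesIO()
    merger.write(result)
    result.seek(0)
    return result.read()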
I second this issue. It also occurs when merging via the pageObject.mergePage method.
I'll jump on the train - having the same issue here!
Can anybody share a PDF which shows this issue? Is it still an issue with the latest PyPDF2 version?
@MartinThoma Not sure if you're still looking for an example, but you can find one below:
I generated book.pdf using the sample document creation of the jupyter-book project. The internal HTML links in the Contents section of book_out.pdf don't work, although they work fine in book.pdf. The conversion from book.pdf to book_out.pdf uses the code snippet below:
from PyPDF2 import PdfReader, PdfWriter
PDF = "./doc/_build/pdf/book.pdf"
OUT_PDF = "./doc/_build/pdf/book_out.pdf"
reader = PdfReader(PDF)
writer = PdfWriter()
# copy page by page; document-level named destinations are not copied this way
for page in reader.pages:
    writer.add_page(page)
with open(OUT_PDF, "wb") as f:
    writer.write(f)
Thank you @SimplyOm :hugs:
Hi @MartinThoma, is this solved? I'm facing the same issue.
In progress; a fix should be available soon.
I think I have found the issue:
a) The links use named destinations, which are not copied by add_page. I've coded append/merge functions into PdfWriter.
b) Some types did not match. I've added a function for that (used in merge).
The changes are in PR #1371 (still in progress).
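To make point (a) concrete, here is a toy model in plain Python. It is not pypdf code: `Page`, `Document`, `resolve`, `copy_pages_only` and `append_document` are hypothetical stand-ins for a PDF's pages, its link annotations and its document-wide named-destination table (in a real PDF, the `/Dests` name tree). A link that stores only a name can be resolved only while that table travels with the pages, which is why copying pages one by one leaves such links dangling.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class Page:
    # Names of the named destinations this page's links point to.
    links: List[str] = field(default_factory=list)


@dataclass
class Document:
    pages: List[Page] = field(default_factory=list)
    # Toy stand-in for the document-wide name table: name -> page index.
    named_dests: Dict[str, int] = field(default_factory=dict)


def resolve(doc: Document, name: str) -> Optional[int]:
    """Return the page index a named link points to, or None if it is dangling."""
    return doc.named_dests.get(name)


def copy_pages_only(src: Document) -> Document:
    # Mimics adding pages one by one: the name table stays behind.
    return Document(pages=list(src.pages))


def append_document(dst: Document, src: Document) -> None:
    # Mimics an append that also carries the name table, shifting each
    # target by the number of pages already in dst. Name clashes are not
    # handled here: a later name simply overwrites an earlier one.
    offset = len(dst.pages)
    dst.pages.extend(src.pages)
    for name, index in src.named_dests.items():
        dst.named_dests[name] = index + offset


report = Document(pages=[Page(links=["target"]), Page()], named_dests={"target": 1})
print(resolve(copy_pages_only(report), "target"))  # None: the link is now dangling

merged = Document(pages=[Page()])  # e.g. a one-page cover
append_document(merged, report)
print(resolve(merged, "target"))  # 2: one cover page + original index 1
```

The offset step is the part a real merge also has to get right: a destination that pointed at page 1 of the report must point at page 2 once a cover page precedes it.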
On a related issue: when using merge, the internal links of a PDF seem to be broken. I mean links to, for example, a reference at the end of a research paper or a section of the paper. Any ideas on how to keep those links working when merging?
@manathan1984, the recommended approach is now to use PdfWriter and its append() method, which should fix these issues. Can you try it and report the status of this issue?
@pubpub-zz
from pypdf import PdfWriter
writer = PdfWriter()
for pdf in ["cover_page.pdf", "main_report.pdf", "back_cover.pdf"]:
    writer.append(pdf)  # add all pages of each file, in order
with open("result.pdf", "wb") as f:
    writer.write(f)
I get the error below when using PdfWriter and append():
AttributeError: 'NumberObject' object has no attribute 'indirect_reference'
@DX9807 Can you please provide the PDF files?
@pubpub-zz Here are the files: back_cover.pdf, central.pdf, cover_page.pdf
When I try to merge these PDFs with PdfWriter and its append method, I get this error:
AttributeError: 'NumberObject' object has no attribute 'indirect_reference'
When I use the PdfMerger class and its append method instead, the PDFs are merged, but the internal hyperlinks do not work.
Hello, is this fix included in 3.16.0?
If you look at the last commit referenced here (https://github.com/py-pdf/pypdf/commit/b1fa953bf7585e799c1541fd4bd5b0b8daa247bc), you will see that this fix has been included since version 3.11.1.
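To answer the 3.16.0 question for any installed copy, a small sketch can compare the installed release against 3.11.1, the version named in the comment above. It uses only the standard library (`importlib.metadata`); the helper names `parse_release`, `has_link_fix` and `installed_pypdf_version` are hypothetical. It queries the distribution named `pypdf`; a project still using the older `PyPDF2` package name would need to check that name instead.

```python
import re
from importlib.metadata import PackageNotFoundError, version
from typing import Optional, Tuple

# Release that, according to this thread, includes the named-destination fix.
FIX_VERSION = (3, 11, 1)


def parse_release(ver: str) -> Tuple[int, ...]:
    """Extract the leading numeric release, e.g. '3.16.0' -> (3, 16, 0).

    Simplification: suffixes such as 'rc1' are dropped, so a pre-release
    of 3.11.1 is treated as 3.11.1 itself.
    """
    match = re.match(r"\d+(?:\.\d+)*", ver)
    if match is None:
        raise ValueError(f"unrecognised version string: {ver!r}")
    return tuple(int(part) for part in match.group(0).split("."))


def has_link_fix(ver: str) -> bool:
    # Tuple comparison: (3, 16, 0) >= (3, 11, 1) is True.
    return parse_release(ver) >= FIX_VERSION


def installed_pypdf_version() -> Optional[str]:
    try:
        return version("pypdf")
    except PackageNotFoundError:
        return None


current = installed_pypdf_version()
if current is None:
    print("pypdf is not installed")
elif has_link_fix(current):
    print(f"pypdf {current} includes the fix")
else:
    print(f"pypdf {current} predates 3.11.1; upgrade with: pip install -U pypdf")
```

By this check, 3.16.0 parses to (3, 16, 0), which is newer than (3, 11, 1), so it includes the fix.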