pypdf
Bug in add_attachment() when attaching several files
While reading the source code of add_attachment(), I found out that it doesn't comply with the PDF specifications when it is called multiple times to attach multiple files.
In PDF reference 1.7 section 3.8.5 "Name Trees", it says: "The Names entries in the leaf (or root) nodes contain the tree’s keys and their associated values, arranged in key-value pairs and sorted lexically in ascending order by key."
In the current implementation, the order of the keys corresponds to the order of the calls to add_attachment(). You can test with the following code:
from pypdf import PdfWriter
writer = PdfWriter()
writer.add_blank_page(100, 100)
# Attach the files in non-alphabetical order on purpose
writer.add_attachment("zz.txt", b"ZZ file content")
writer.add_attachment("aa.txt", b"AA file content")
# The with block closes the file, so no explicit f.close() is needed
with open("two_attachments.pdf", "wb") as f:
    writer.write(f)
When you look at the generated PDF file, in /Names/EmbeddedFiles/Names, you will have:
- zz.txt
- aa.txt
The PDF specification says that the keys must be sorted lexically in ascending order, so you should have:
- aa.txt
- zz.txt
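To make the ordering requirement concrete, here is a minimal sketch of a check on a flat Names array of the form [key1, value1, key2, value2, ...]. The helper names and the placeholder values are hypothetical and not part of pypdf, and plain Python string comparison is assumed to be good enough for ASCII filenames like the ones above.

```python
def name_keys(names):
    """Return the keys of a flat name-tree array [key1, value1, key2, value2, ...]."""
    return list(names[0::2])


def keys_are_sorted(names):
    """Return True if the keys appear in ascending order."""
    keys = name_keys(names)
    return all(a <= b for a, b in zip(keys, keys[1:]))


# Placeholder strings stand in for the file specification dictionaries.
as_written = ["zz.txt", "<filespec zz>", "aa.txt", "<filespec aa>"]
as_required = ["aa.txt", "<filespec aa>", "zz.txt", "<filespec zz>"]

print(keys_are_sorted(as_written))   # False
print(keys_are_sorted(as_required))  # True
```

With pypdf, the array to check can probably be reached through reader.trailer["/Root"]["/Names"]["/EmbeddedFiles"]["/Names"], assuming the file stores all attachments in the root node of the name tree as the test file above does.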
From my experience, many PDF readers don't care about that (evince, PDF Studio viewer), but Acrobat Reader DC is affected: it displays the attachments, but when the user tries to save an attachment to disk or open it, nothing happens and no error message is shown.
I discovered this in 2018 when someone reported a bug in my factur-x lib that occurred when adding extra attachments and opening the resulting file in Acrobat Reader DC. I remember going crazy working on it, because I was able to reproduce the bug with some filenames... and the bug would disappear just by changing the filename! Eventually, the guy who reported the bug found this small detail in the PDF reference about sorting the keys and, with that information, I was able to fix it. The fix just involved calling sorted() to order the filenames alphabetically: https://github.com/akretion/factur-x/commit/a3ebfa416523983fd3b54b9569181bc729dae4b1
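As a rough illustration of that kind of fix, the sketch below re-sorts a flat Names array by key while keeping each value attached to its key. The function name sort_name_pairs is hypothetical; this is neither the factur-x code nor a pypdf API, and it assumes Python's default string ordering matches the lexical order the specification asks for, which holds for plain ASCII filenames.

```python
def sort_name_pairs(names):
    """Return a new flat [key1, value1, key2, value2, ...] list sorted by key."""
    if len(names) % 2 != 0:
        raise ValueError("expected an even number of entries (key/value pairs)")
    # Pair each key with the value that follows it, then sort on the key only.
    pairs = sorted(zip(names[0::2], names[1::2]), key=lambda pair: pair[0])
    flat = []
    for key, value in pairs:
        flat.extend([key, value])
    return flat


# Placeholder strings stand in for the file specification dictionaries.
unsorted_names = ["zz.txt", "<filespec zz>", "aa.txt", "<filespec aa>"]
print(sort_name_pairs(unsorted_names))
# ['aa.txt', '<filespec aa>', 'zz.txt', '<filespec zz>']
```

Sorting on the key alone avoids comparing the values, which in a real PDF would be file specification objects with no meaningful ordering.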
The possibility to add multiple attachments was added in this PR https://github.com/py-pdf/pypdf/pull/1611 by @pubpub-zz