Bug in add_attachment() when attaching several files

Open alexis-via opened this issue 1 year ago • 0 comments

While reading the source code of add_attachment(), I found out that it doesn't comply with the PDF specifications when it is called multiple times to attach multiple files.

In PDF reference 1.7, section 3.8.5 "Name Trees", it says: << The Names entries in the leaf (or root) nodes contain the tree’s keys and their associated values, arranged in key-value pairs and sorted lexically in ascending order by key.>>

In the current implementation, the order of the keys corresponds to the order of the calls to add_attachment(). You can test with the following code:

from pypdf import PdfWriter

writer = PdfWriter()
writer.add_blank_page(100, 100)
# Attach the files in non-alphabetical order on purpose
writer.add_attachment("zz.txt", b"ZZ file content")
writer.add_attachment("aa.txt", b"AA file content")

# Write the result; the with block closes the file on exit
with open("two_attachments.pdf", "wb") as f:
    writer.write(f)

When you look at the generated PDF file, in /Names/EmbeddedFiles/Names, you will have:

  1. zz.txt
  2. aa.txt

The PDF spec says that they should be sorted in ascending lexical order by key, so you should have:

  1. aa.txt
  2. zz.txt
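
A quick way to check this rule on a flat Names array is to compare each key with the one that follows it. This is only a minimal sketch in plain Python, not pypdf code: the helper name names_keys_sorted is hypothetical, and the array is modelled as a list that alternates key and value, as in the passage of the PDF reference quoted above. It also assumes that Python's default string comparison is a good enough stand-in for the lexical ordering, which should hold for plain ASCII filenames.

```python
def names_keys_sorted(names):
    """Return True if the keys of a flat [key, value, key, value, ...] list
    are in strictly ascending order."""
    keys = names[0::2]  # keys sit at even indexes, values at odd indexes
    return all(a < b for a, b in zip(keys, keys[1:]))


# Placeholder strings stand in for the file specification objects.
as_written = ["zz.txt", "<filespec zz>", "aa.txt", "<filespec aa>"]
as_expected = ["aa.txt", "<filespec aa>", "zz.txt", "<filespec zz>"]

print(names_keys_sorted(as_written))   # False
print(names_keys_sorted(as_expected))  # True
```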

From my experience, many PDF readers don't care about that (evince, PDF Studio viewer), but Acrobat Reader DC will be impacted by this: Acrobat Reader will display the attachments but, when the user tries to save the attachment to disk or open it, it won't work (without any error message).

I discovered this in 2018 when someone reported a bug on my factur-x lib: they had used the option to add additional attachments and then opened the resulting file in Acrobat Reader DC. I remember going crazy while working on this bug, because I was able to reproduce it with some filenames... and it would disappear just by changing the filename! Eventually, the guy who reported the bug found this small detail in the PDF reference about sorting keys in lexical order and, with that information, I was able to fix it. The fix just involved calling sorted() to order the filenames alphabetically: https://github.com/akretion/factur-x/commit/a3ebfa416523983fd3b54b9569181bc729dae4b1
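
The kind of fix described above can be sketched as follows. This is an illustration in plain Python, not the actual factur-x or pypdf code: sort_names_array is a hypothetical helper that works on a flat list alternating key and value, and the placeholder strings stand in for the real file specification objects. It assumes the keys are plain strings for which Python's default ordering is acceptable.

```python
def sort_names_array(names):
    """Return a new flat [key, value, ...] list with the pairs sorted by key."""
    if len(names) % 2 != 0:
        raise ValueError("expected an even number of items (key/value pairs)")
    pairs = zip(names[0::2], names[1::2])
    # Sort on the key only, so the values never need to be comparable.
    ordered = sorted(pairs, key=lambda pair: pair[0])
    # Flatten the sorted pairs back into a single list.
    return [item for pair in ordered for item in pair]


names = ["zz.txt", "<filespec zz>", "aa.txt", "<filespec aa>"]
print(sort_names_array(names))
# ['aa.txt', '<filespec aa>', 'zz.txt', '<filespec zz>']
```

Sorting the pairs rather than the keys alone keeps each filename attached to its own value, which matters as soon as more than one file is embedded.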

The possibility to add multiple attachments was added in this PR https://github.com/py-pdf/pypdf/pull/1611 by @pubpub-zz

alexis-via, Aug 15 '23 21:08