PyCryptoDome padding issue, AES encryption CBC mode

Open bchandos opened this issue 1 year ago • 5 comments

This is not a fully qualified bug report, because I lack a reproducible example for a number of reasons. However, I originally posted in issue 416 about a decryption issue I was experiencing with a file generated by the Acrobat Sign product. It is a v1.7 PDF with AES 128-bit encryption. That issue notes a merged fix (#1015).

I downloaded PyPDF2 2.10.0, which then gave a new error about missing PyCryptoDome, which I then installed (v3.15.0). Running my test case again, I received the following error:

ValueError: Data must be padded to 16 byte boundary in CBC mode

I know little about PDF specs, and even less about encryption. However, using the PyCryptoDome docs, I did find that the following code addition to _encryption.py alleviated this issue for my file:

86a87,89
>             if len(data) % 16:
>                 from Crypto.Util.Padding import pad
>                 data = pad(data, 16)
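For readers unfamiliar with the padding step, here is a minimal pure-Python sketch of what the patched condition does to the length of the data. It assumes PKCS#7 padding, which is the default style of `Crypto.Util.Padding.pad`. The helper name `pad_if_unaligned` is hypothetical and is not part of PyPDF2 or PyCryptoDome. It only illustrates the length arithmetic and says nothing about whether padding is the right fix for the underlying file.

```python
BLOCK_SIZE = 16  # AES block size in bytes, as required by CBC mode


def pad_if_unaligned(data: bytes, block_size: int = BLOCK_SIZE) -> bytes:
    """Hypothetical helper mirroring the patch: pad only when the
    length is not already a multiple of the block size."""
    remainder = len(data) % block_size
    if remainder == 0:
        # Already aligned, so the patch leaves the data untouched.
        return data
    pad_len = block_size - remainder
    # PKCS#7: every padding byte stores the number of bytes added.
    return data + bytes([pad_len]) * pad_len


aligned = pad_if_unaligned(b"x" * 32)    # already a multiple of 16
unaligned = pad_if_unaligned(b"x" * 21)  # 21 % 16 == 5, so 11 bytes are added
print(len(aligned), len(unaligned))      # prints "32 32"
```

Input whose length is not a multiple of 16 is what triggers the `ValueError` quoted above. After this step the length check passes.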

Again, lacking a reproducible test case, I'm not sure how useful this is, but I wanted to share my findings in case someone with access to Acrobat Sign can generate a file that demonstrates the same behavior.

bchandos avatar Aug 10 '22 18:08 bchandos

Thank you for sharing :heart:

MartinThoma avatar Aug 10 '22 18:08 MartinThoma

I did find that the following code addition to _encryption.py alleviated this issue for my file:

So the mentioned lines fixed your problem? You could read the decrypted file properly?

MartinThoma avatar Aug 10 '22 18:08 MartinThoma

The code you use is something like this, I guess:

from PyPDF2 import PdfReader

reader = PdfReader("private-and-encrypted.pdf", password="example")
print(reader.pages[0].extract_text())

MartinThoma avatar Aug 10 '22 18:08 MartinThoma

So the mentioned lines fixed your problem? You could read the decrypted file properly?

That is correct. PdfFileMerger() can now successfully decrypt the file and merge it with others, which is my only need. (By which I mean, I have no want or need for encryption - it was a byproduct of a user-submitted file - and so I only care that PyPDF2 can process it and output a viewable PDF.)

bchandos avatar Aug 10 '22 18:08 bchandos

The code you use is something like this, I guess:

from PyPDF2 import PdfReader

reader = PdfReader("private-and-encrypted.pdf", password="example")
print(reader.pages[0].extract_text())

Here is my full test code, FWIW. Again, I'm sorry I can't provide the file - it contains PII and I don't have access to Acrobat Sign to try to generate an example. The file doesn't require a password and can be opened in desktop reader software. It was only PyPDF2 that was having a problem with it.

import io
import PyPDF2

file_list = ['test_pdfs/encrypted_pdf.pdf']

merger = PyPDF2.PdfFileMerger()
for f in file_list:
    with open(f, 'rb') as p:
        merger.append(fileobj=p)
output = io.BytesIO()
merger.write(output)
output.seek(0)
with open('test_output.pdf', 'wb') as o:
    o.write(output.read())

bchandos avatar Aug 10 '22 18:08 bchandos