pypdf
PyCryptoDome padding issue, AES encryption CBC mode
This is not a fully qualified bug report, because I lack a reproducible example for a number of reasons. However, I originally posted in issue 416 about a decryption issue I was experiencing with a file generated by the Acrobat Sign product. It is a v1.7 PDF with AES 128-bit encryption. That issue notes a merged fix (#1015).
I downloaded PyPDF2 2.10.0, which gave a new error about PyCryptoDome being missing, so I installed it (v3.15.0). Running my test case again, I received the following error:
ValueError: Data must be padded to 16 byte boundary in CBC mode
I know little about PDF specs, and even less about encryption. However, using the PyCryptoDome docs, I did find that the following code addition to _encryption.py alleviated this issue for my file:
86a87,89
> if len(data) % 16:
>     from Crypto.Util.Padding import pad
>     data = pad(data, 16)
Again, lacking a reproducible test case, I'm not sure how useful this is, but I wanted to share my findings in case someone with access to Acrobat Sign can generate a file that demonstrates the same behavior.
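For readers unfamiliar with the 16-byte requirement, the guard in the patch above can be sketched with the standard library only. This is an illustration, not PyPDF2's or PyCryptoDome's code: `pkcs7_pad` and `align_for_cbc` are hypothetical names, and `pkcs7_pad` is a stand-in for `Crypto.Util.Padding.pad`, whose default style is PKCS#7. It also assumes `data` is the byte string handed to the AES-CBC routine.

```python
BLOCK_SIZE = 16  # AES block size in bytes


def pkcs7_pad(data: bytes, block_size: int = BLOCK_SIZE) -> bytes:
    """Append PKCS#7 padding: N bytes, each with value N (1 <= N <= block_size)."""
    pad_len = block_size - len(data) % block_size
    return data + bytes([pad_len]) * pad_len


def align_for_cbc(data: bytes, block_size: int = BLOCK_SIZE) -> bytes:
    """Hypothetical helper mirroring the patch: pad only when the length is misaligned."""
    if len(data) % block_size:
        data = pkcs7_pad(data, block_size)
    return data


# A 20-byte input is extended to the next multiple of 16;
# an already aligned input is returned unchanged.
misaligned = align_for_cbc(b"x" * 20)
aligned = align_for_cbc(b"x" * 32)
```

If `data` is ciphertext, the appended bytes would decrypt to meaningless output in the final block while earlier blocks are unaffected, so this is probably best read as a workaround for a malformed stream rather than a general fix.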
Thank you for sharing :heart:
> I did find that the following code addition to _encryption.py alleviated this issue for my file:
So the mentioned lines fixed your problem? You could read the decrypted file properly?
The code you use is something like this, I guess:
from PyPDF2 import PdfReader

reader = PdfReader("private-and-encrypted.pdf", password="example")
# extract_text() is a page method, so select a page first
print(reader.pages[0].extract_text())
> So the mentioned lines fixed your problem? You could read the decrypted file properly?

That is correct. PdfFileMerger() can now successfully decrypt the file and merge it with others, which is my only need. (By which I mean, I have no want or need for encryption - it was a byproduct of a user-submitted file - and so I only care that PyPDF2 can process it and output a viewable PDF.)
> The code you use is something like this, I guess:
>
> from PyPDF2 import PdfReader
> reader = PdfReader("private-and-encrypted.pdf", password="example")
> print(reader.pages[0].extract_text())
Here is my full test code, FWIW. Again, I'm sorry I can't provide the file - it contains PII and I don't have access to Acrobat Sign to try to generate an example. The file doesn't require a password and can be opened in desktop reader software. It was only PyPDF2 that was having a problem with it.
import io

import PyPDF2

file_list = ['test_pdfs/encrypted_pdf.pdf']

merger = PyPDF2.PdfFileMerger()

# Append every input file to the merger
for f in file_list:
    with open(f, 'rb') as p:
        merger.append(fileobj=p)

# Write the merged PDF to an in-memory buffer, then rewind it
output = io.BytesIO()
merger.write(output)
output.seek(0)

# Save the buffer to disk
with open('test_output.pdf', 'wb') as o:
    o.write(output.read())