pdfrw PdfReader reader fails in decryption

Open arshad01 opened this issue 5 years ago • 1 comment

Hello

I am using pdfrw to read an encrypted file. The file does not need a password to open, and I can view it in Adobe Reader. Opening it with PdfReader raises an exception:

$ python
Python 2.7.10 (default, Jan 30 2019, 03:22:04) 
[GCC 4.4.7 20120313 (Red Hat 4.4.7-23)] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> import pdfrw
>>> pdfrw.PdfReader('Encrypted.pdf', decrypt=True, decompress=True)
[WARNING] tokens.py:221 Did not find PDF object (197, 0) (line=2076, col=1, token='startxref')
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/home/prometheus/pdfrw/lib/python2.7/site-packages/pdfrw/pdfreader.py", line 645, in __init__
    self._parse_encrypt_info(source, password, trailer)
  File "/home/prometheus/pdfrw/lib/python2.7/site-packages/pdfrw/pdfreader.py", line 499, in _parse_encrypt_info
    key = crypt.create_key(password, trailer)
  File "/home/prometheus/pdfrw/lib/python2.7/site-packages/pdfrw/crypt.py", line 31, in create_key
    key_size = int(doc.Encrypt.Length or 40) // 8
AttributeError: 'NoneType' object has no attribute 'Length'

The issue seems to be caused by pdfrw failing to find object (197, 0), even though it is present in the PDF file. Object (197, 0) contains the encryption details, so doc.Encrypt ends up as None and the lookup of Length fails.

Any help in solving this issue is greatly appreciated. Thanks

(Edit: Sample pdf can be downloaded from https://www.proofpoint.com/us/resources/white-papers/who-moved-my-data)

Sep 20 '19 04:09 arshad01

I have written a fix for this issue. Please check whether it is correct. Thanks.

Note: I could not run the unit tests successfully even without this change.

$ git diff
diff --git a/pdfrw/pdfreader.py b/pdfrw/pdfreader.py
index c2ae030..621fff4 100644
--- a/pdfrw/pdfreader.py
+++ b/pdfrw/pdfreader.py
@@ -614,8 +614,8 @@ class PdfReader(PdfDict):
             # Find all the xref tables/streams, and
             # then deal with them backwards.
             xref_list = []
+            source.obj_offsets = {}
             while 1:
-                source.obj_offsets = {}
                 trailer, is_stream = self.parsexref(source)
                 prev = trailer.Prev
                 if prev is None:
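
To illustrate why I think moving the reset out of the loop matters, here is a toy model. It is not pdfrw's actual code: the section contents, the offsets, and the helper name collect_offsets are made up for illustration. It assumes each pass of the loop adds one xref section's offsets to a shared table, newest section first.

```python
# Toy model (NOT pdfrw's real code): each xref section maps
# (object number, generation) -> byte offset. Sections are listed in the
# order the loop would visit them: newest first, then older ones via /Prev.
XREF_SECTIONS = [
    {(197, 0): 120, (1, 0): 900},  # visited first, holds the Encrypt object
    {(1, 0): 15, (2, 0): 60},      # visited second, reached through /Prev
]


def collect_offsets(sections, reset_inside_loop):
    """Merge all sections into one offset table (hypothetical helper)."""
    obj_offsets = {}
    for section in sections:
        if reset_inside_loop:
            # Behaviour before the patch: the table is emptied on every
            # pass, so only the last section visited survives.
            obj_offsets = {}
        for ref, offset in section.items():
            # Keep the first offset seen, so newer sections take priority.
            obj_offsets.setdefault(ref, offset)
    return obj_offsets


buggy = collect_offsets(XREF_SECTIONS, reset_inside_loop=True)
fixed = collect_offsets(XREF_SECTIONS, reset_inside_loop=False)

print((197, 0) in buggy)  # object from the first section is lost
print((197, 0) in fixed)  # object is kept once the reset is outside the loop
```

In this model the table built with the reset inside the loop only knows about the section visited last, which would match the "Did not find PDF object (197, 0)" warning if that object is listed in an earlier-visited section.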

Sep 23 '19 03:09 arshad01
