
TypeError: can only concatenate list (not "str") to list

Open · SaeedEY opened this issue 3 years ago · 4 comments

I tried to read a password-protected PDF with the password 'D)445416D(}+587207(EIz|5994276' and write it back out as a decrypted file, and this error occurred.

Environment

Which environment were you using when you encountered the problem?

$ python -m platform
Windows-10-***************

$ python -c "import PyPDF2;print(PyPDF2.__version__)"
2.1.0

Code

from PyPDF2 import PdfFileReader, PdfFileWriter

passw = 'D)445416D(}+587207(EIz|5994276'

def decrypt_pdf(input_path, output_path, password):
  """Copy every page of an encrypted PDF into a new, unencrypted file."""
  with open(input_path, 'rb') as input_file, \
    open(output_path, 'wb') as output_file:
    reader = PdfFileReader(input_file)
    # decrypt() takes the password as a str (or bytes)
    reader.decrypt(password)

    # A fresh writer produces an unencrypted file unless encrypt() is called
    writer = PdfFileWriter()

    for i in range(reader.getNumPages()):
      writer.addPage(reader.getPage(i))

    writer.write(output_file)

if __name__ == '__main__':
  # example usage:
  decrypt_pdf('input_ecrypted.pdf', 'output_decrypted.pdf', passw)
Traceback (most recent call last):
  File "........\\test.py", line 34, in <module>
    decrypt_pdf('input_ecrypted.pdf', 'output_decrypted.pdf', passw)
  File "........\\test.py", line 23, in decrypt_pdf
    reader.decrypt(list(password))
  File "..............\Python310\lib\site-packages\PyPDF2\_reader.py", line 1617, in decrypt
    return self._decrypt(password)
  File "..............\Python310\lib\site-packages\PyPDF2\_reader.py", line 1661, in _decrypt
    user_password, key = self._authenticate_user_password(password)
  File "..............\Python310\lib\site-packages\PyPDF2\_reader.py", line 1714, in _authenticate_user_password
    U, key = _alg35(
  File "..............\Python310\lib\site-packages\PyPDF2\_security.py", line 194, in _alg35
    key = _alg32(password, rev, keylen, owner_entry, p_entry, id1_entry)
  File "..............\Python310\lib\site-packages\PyPDF2\_security.py", line 65, in _alg32
    password_bytes = b_((str_(password) + str_(_encryption_padding))[:32])
TypeError: can only concatenate list (not "str") to list
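The traceback shows the script calling reader.decrypt(list(password)), so the last frame ends up evaluating a list plus a str, which Python refuses. The sketch below reproduces that expression and adds a hypothetical pad_password() helper (not PyPDF2 code) that checks the type first and pads to 32 bytes with the padding string from the PDF specification (Algorithm 2). It assumes a Latin-1-encodable password; note that this 32-byte padding only applies to the older revisions (R 2 to 4), not to AES-256.

```python
# Sketch: why the last traceback frame fails, plus a type-checked
# alternative. pad_password() is hypothetical, not PyPDF2's implementation.

# The 32-byte padding string defined by the PDF spec (Algorithm 2).
PADDING = bytes.fromhex(
    "28bf4e5e4e758a4164004e56fffa0108"
    "2e2e00b6d0683e802f0ca9fe6453697a"
)


def reproduce_error(password):
    """Mimic `list + str`, the expression that raised in _alg32."""
    try:
        list(password) + "padding"
    except TypeError as exc:
        return str(exc)
    return None


def pad_password(password):
    """Pad/truncate a password to 32 bytes; reject anything but str/bytes."""
    if isinstance(password, str):
        # Assumption: the password is Latin-1 encodable (true for ASCII).
        password = password.encode("latin-1")
    if not isinstance(password, bytes):
        raise TypeError(
            f"password must be str or bytes, not {type(password).__name__}"
        )
    return (password + PADDING)[:32]


if __name__ == "__main__":
    passw = 'D)445416D(}+587207(EIz|5994276'
    print(reproduce_error(passw))  # can only concatenate list (not "str") to list
    print(len(pad_password(passw)))  # 32
```

Rejecting the list up front turns a confusing error deep inside the key derivation into a clear message at the call site.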

PDF

I can't share the PDF, but here are its last five lines:

<</Root 2051 0 R/ID [<0cf81d43b578d44309d457f38834559d><a5c2eb78d7b35c3781681ab80b4b0235>]/Encrypt 2057 0 R/Info 1 0 R/Size 2058>>
%b05a3-dc78d-4ea54-0ea50-5.4.3
startxref
1506230
%%EOF
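The trailer points to the encryption dictionary through the indirect reference /Encrypt 2057 0 R, so searching the raw file for "2057 0 obj" should locate it (the spec does not allow the encryption dictionary inside a compressed object stream, so it should appear as plain text). A minimal regex-based sketch, assuming the file fits in memory; the helper names are hypothetical:

```python
import re

# The trailer line posted above, copied verbatim.
TRAILER = (
    "<</Root 2051 0 R/ID [<0cf81d43b578d44309d457f38834559d>"
    "<a5c2eb78d7b35c3781681ab80b4b0235>]/Encrypt 2057 0 R/Info 1 0 R/Size 2058>>"
)

# An indirect reference looks like "<object number> <generation> R".
ENCRYPT_REF = re.compile(rb"/Encrypt\s+(\d+)\s+(\d+)\s+R")


def find_encrypt_ref(trailer: bytes):
    """Return (object number, generation) of /Encrypt, or None."""
    match = ENCRYPT_REF.search(trailer)
    if match is None:
        return None
    return int(match.group(1)), int(match.group(2))


def find_object(data: bytes, num: int, gen: int) -> int:
    """Return the byte offset of 'num gen obj' in the raw file, or -1."""
    # The lookbehind stops "2057" from matching inside e.g. "12057".
    match = re.search(rb"(?<!\d)%d\s+%d\s+obj\b" % (num, gen), data)
    return match.start() if match else -1


if __name__ == "__main__":
    print(find_encrypt_ref(TRAILER.encode()))  # (2057, 0)
```

With the raw bytes of the file in data, printing data[offset:offset + 1000] for offset = find_object(data, 2057, 0) would show the dictionary, including any /V, /R and /CFM entries.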

SaeedEY, Jun 12 '22 08:06

Would you be so kind as to look for /CFM in the file?

I suspect you'll find something like this:

<< /CF << /StdCF << /AuthEvent /DocOpen /CFM /AESV3 /Length 32 >> >> /Filter /Standard /Length 256 /O <f8a1cd0f4a989b4114f4d1831d40eeede44885f7cd574b7d74b05ea74276253ad5d833ba2d2ddac3129cad731efcef60> /OE <583bce1f6132a4e73586f084652ff8214c099b779afde715aaf3e9daeb7f26c5> /P -4 /Perms <642a0991e390678578789948054269b6> /R 6 /StmF /StdCF /StrF /StdCF /U <6af059d739d8d9e68cef3e9439e45c02768a04d4f09707d91d83720c39271ccf6e91941dbd90f8af40c592db7800462f> /UE <8e2b29f2c6c540f25244286d2415e9aeaf2fb8eb16cb0ae9e8b66ca2f8bfc48f> /V 5 >>
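What matters in such a dictionary is mainly /V, /R and, for V 4 and V 5, /CFM. The sketch below pulls those out of the raw dictionary text with regular expressions (a rough shortcut, not a real PDF parser) and maps the revision to the scheme it implies; R 6, as in the example above, is the AES-256 scheme from PDF 2.0. The function name and the wording of the summaries are illustrative.

```python
import re

# Standard security handler revision (/R) -> scheme it implies.
# R 5 was a deprecated Adobe extension; R 6 is the PDF 2.0 scheme.
REVISIONS = {
    2: "RC4, 40-bit key",
    3: "RC4, key up to 128 bits",
    4: "RC4 or AES-128, depending on /CFM (/V2 or /AESV2)",
    5: "AES-256 (Adobe extension, deprecated)",
    6: "AES-256 (PDF 2.0)",
}


def describe_encryption(encrypt_dict: str) -> str:
    """Summarise /V, /R and /CFM found in a raw encryption dictionary."""

    def number(key):
        # Requires whitespace after the key, so "/V" won't match "/V2".
        m = re.search(r"/%s\s+(\d+)" % key, encrypt_dict)
        return int(m.group(1)) if m else None

    version, revision = number("V"), number("R")
    cfm = re.search(r"/CFM\s*/(\w+)", encrypt_dict)
    parts = [f"V={version}", f"R={revision}"]
    if cfm:
        parts.append(f"CFM={cfm.group(1)}")
    parts.append(REVISIONS.get(revision, "unknown revision"))
    return ", ".join(parts)
```

For the dictionary above this yields "V=5, R=6, CFM=AESV3, AES-256 (PDF 2.0)".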

There are a couple of encryption algorithms that we currently don't support. PR #749 is almost finished and will likely be merged this month. It adds more encryption/decryption support, but the newest algorithms will still be missing.

For now, you could instead remove the password with a tool that may support this encryption type: https://askubuntu.com/q/828720/10425 (QPDF would be my best guess, but I'm not sure whether it supports the latest algorithms).
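If QPDF can handle the file, its --password and --decrypt options can be driven from Python. The sketch below (hypothetical helper names; QPDF must be installed separately) builds the argument list instead of a shell string, which matters here because the password contains characters such as ( ) { } | + that a shell would interpret.

```python
import subprocess


def qpdf_decrypt_cmd(input_path, output_path, password):
    """Build the argv for `qpdf --password=... --decrypt in out`."""
    # No shell is involved, so the password reaches qpdf unchanged
    # and needs no quoting or escaping.
    return ["qpdf", f"--password={password}", "--decrypt", input_path, output_path]


def qpdf_decrypt(input_path, output_path, password):
    """Run qpdf; raises CalledProcessError if qpdf exits non-zero."""
    subprocess.run(qpdf_decrypt_cmd(input_path, output_path, password), check=True)
```

Usage would be qpdf_decrypt('input_ecrypted.pdf', 'output_decrypted.pdf', passw), using the names from the script above.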

MartinThoma, Jun 12 '22 09:06

Hi @MartinThoma, I couldn't find any of those keywords ("CF", "StdCF", "AuthEvent", "DocOpen", "AESV3", etc.) in this PDF file, but I do know a few details about the encryption that may help you improve PyPDF2:

  • CipherMode: CBC
  • PaddingMode: PKCS7
  • BlockSize: 128 bits
  • IV: 16 bytes
  • KeySize: 256 bits

Feel free to ask me any further questions that would help track down the problem.
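Those parameters are consistent with AES-256 in CBC mode, which is how AES is framed in PDF: each encrypted string or stream begins with its 16-byte IV, and the plaintext is padded to the 16-byte block size using the scheme from RFC 2898, which for 16-byte blocks is the same as PKCS#7. The sketch below shows only that framing; the AES decryption itself would need a third-party library (for example cryptography) and is left out. Function names are hypothetical.

```python
BLOCK_SIZE = 16  # AES block size in bytes (128 bits)


def split_iv(data: bytes):
    """Split an AES-encrypted PDF string/stream into (iv, ciphertext)."""
    if len(data) < BLOCK_SIZE or len(data) % BLOCK_SIZE:
        raise ValueError("data must be a non-empty multiple of 16 bytes")
    return data[:BLOCK_SIZE], data[BLOCK_SIZE:]


def pkcs7_pad(data: bytes) -> bytes:
    """Append n copies of byte n (1..16) so the length is a multiple of 16."""
    n = BLOCK_SIZE - len(data) % BLOCK_SIZE
    return data + bytes([n]) * n


def pkcs7_unpad(plaintext: bytes) -> bytes:
    """Strip PKCS#7 padding, validating it first."""
    if not plaintext or len(plaintext) % BLOCK_SIZE:
        raise ValueError("padded data must be a non-empty multiple of 16 bytes")
    n = plaintext[-1]
    if not 1 <= n <= BLOCK_SIZE or plaintext[-n:] != bytes([n]) * n:
        raise ValueError("invalid PKCS#7 padding")
    return plaintext[:-n]
```

Padding is always added, so input that is already a multiple of 16 bytes gains a full extra block; that is what makes the unpadding unambiguous.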

SaeedEY, Jun 12 '22 09:06

PR #749 cannot handle that. This encryption algorithm is defined by the PDF 2.0 specification, but I can't find the specification document, so I left it unimplemented in PR #749. Maybe I can figure it out later from the source code of other PDF tools.
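For context on why the AES-256 revisions need separate work: there, /U is 48 bytes (a 32-byte hash, an 8-byte validation salt, then an 8-byte key salt), and the password is UTF-8 encoded (after SASLprep normalisation) and truncated to 127 bytes instead of being padded to 32 bytes. The sketch below checks a user password the R 5 way, with a single SHA-256. R 6, the PDF 2.0 scheme discussed here, replaces that single hash with an iterated hash that also uses AES (what ISO 32000-2 calls Algorithm 2.B), so it is not shown. The function name is hypothetical and SASLprep is skipped.

```python
import hashlib
import hmac


def check_user_password_r5(password: str, u_entry: bytes) -> bool:
    """Sketch of the R 5 user-password test (NOT the R 6 algorithm).

    /U layout for R 5/6: 32-byte hash | 8-byte validation salt | 8-byte key salt.
    SASLprep normalisation of the password is skipped here.
    """
    if len(u_entry) < 48:
        raise ValueError("/U must be at least 48 bytes for R 5/6")
    pw = password.encode("utf-8")[:127]
    stored_hash, validation_salt = u_entry[:32], u_entry[32:40]
    candidate = hashlib.sha256(pw + validation_salt).digest()
    # Constant-time comparison of the two digests.
    return hmac.compare_digest(candidate, stored_hash)
```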

exiledkingcc, Jun 12 '22 15:06

Ah, damn. I think I'll set up a GitHub organization funding page. If people or companies start supporting PyPDF2 financially, we could simply buy the PDF 2.0 standard :-/

Let's see.

MartinThoma, Jun 12 '22 15:06

@MartinThoma this can be closed; it was fixed by #1015.

exiledkingcc, May 03 '23 09:05