AttributeError: 'NoneType' object has no attribute 'encode' with load_file

Open · umaplehurst opened this issue 1 year ago · 1 comment

Bug Report

Since v0.12.0 I seem to get this sort of backtrace when loading certain .pdf files:

  File "C:\Users\User\AppData\Local\Programs\Python\Python310\lib\site-packages\py_pdf_parser\loaders.py", line 41, in load_file
    return load(in_file, pdf_file_path=path_to_file, la_params=la_params, **kwargs)
  File "C:\Users\User\AppData\Local\Programs\Python\Python310\lib\site-packages\py_pdf_parser\loaders.py", line 75, in load
    for page in extract_pages(
  File "C:\Users\User\AppData\Local\Programs\Python\Python310\lib\site-packages\pdfminer\high_level.py", line 197, in extract_pages
    for page in PDFPage.get_pages(
  File "C:\Users\User\AppData\Local\Programs\Python\Python310\lib\site-packages\pdfminer\pdfpage.py", line 151, in get_pages
    doc = PDFDocument(parser, password=password, caching=caching)
  File "C:\Users\User\AppData\Local\Programs\Python\Python310\lib\site-packages\pdfminer\pdfdocument.py", line 744, in __init__
    self._initialize_password(password)
  File "C:\Users\User\AppData\Local\Programs\Python\Python310\lib\site-packages\pdfminer\pdfdocument.py", line 771, in _initialize_password
    handler = factory(docid, param, password)
  File "C:\Users\User\AppData\Local\Programs\Python\Python310\lib\site-packages\pdfminer\pdfdocument.py", line 358, in __init__
    self.init()
  File "C:\Users\User\AppData\Local\Programs\Python\Python310\lib\site-packages\pdfminer\pdfdocument.py", line 366, in init
    self.init_key()
  File "C:\Users\User\AppData\Local\Programs\Python\Python310\lib\site-packages\pdfminer\pdfdocument.py", line 379, in init_key
    self.key = self.authenticate(self.password)
  File "C:\Users\User\AppData\Local\Programs\Python\Python310\lib\site-packages\pdfminer\pdfdocument.py", line 428, in authenticate
    password_bytes = password.encode("latin1")
AttributeError: 'NoneType' object has no attribute 'encode'

I'm not sure why this only affects certain files. My guess is that the failing path is only reached when `if "Encrypt" in trailer:` is true in pdfdocument.py of pdfminer.six, which is the case only for some PDFs. Versions before v0.12.0 are fine. The problem seems to be the `password: str = None` parameter that was added to `load(...)` in py_pdf_parser/loaders.py as part of 02f92cef2905a9d05783f4cfbf90598e2e60236a. I guess this needs to be changed to `password: str = ""` to match the default in pdfminer.six (see `get_pages` in pdfpage.py), and then everything should be fine again.
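To illustrate the failure mode without needing an encrypted PDF, here is a minimal sketch. It does not use py_pdf_parser or pdfminer.six at all: `authenticate`, `load_buggy` and `load_fixed` are hypothetical stand-ins, and only the `password.encode("latin1")` line is taken from the traceback above.

```python
from typing import Optional


def authenticate(password: str) -> bytes:
    # Stand-in for the failing line in pdfminer's pdfdocument.py:
    # calling .encode on None raises AttributeError.
    return password.encode("latin1")


def load_buggy(password: Optional[str] = None) -> bytes:
    # Hypothetical loader mirroring the reported default of None,
    # which is forwarded unchanged.
    return authenticate(password)


def load_fixed(password: str = "") -> bytes:
    # Hypothetical loader using "" as the default, as suggested above.
    return authenticate(password)


if __name__ == "__main__":
    print(load_fixed())  # empty password encodes without error
    try:
        load_buggy()
    except AttributeError as exc:
        print(f"AttributeError: {exc}")
```

Since the traceback shows `load_file` forwarding `**kwargs` to `load`, passing `password=""` explicitly to `load_file` may work as a stopgap until the default is changed, though I have only reasoned about that from the traceback.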

umaplehurst, Sep 04 '24 00:09