
TypeError: unsupported operand type(s) for -: 'float' and 'NoneType'

Open loganathanspr opened this issue 1 year ago • 1 comment

Describe the bug

I am extracting annotations from a PDF file, and accessing .annots raises the TypeError above. When I manually edited each annotation (just adding or deleting one extra character), the error went away. I suspect the original text encoding of the annotations differs from what pdfplumber expects. Does pdfplumber make any strict assumptions about text encoding?

Code to reproduce the problem

import pdfplumber


def get_pdf_annotations(pdf_path: str):
  """Get all annotations (by page) for a pdf file.

  Args:
    pdf_path (str): Path to pdf file.

  Returns:
    List of annotations: List index corresponds to page numbers (starting from 0)
    and each list item is a list of annotations found for that page.
  """
  annots_all_pages = []
  with pdfplumber.open(pdf_path) as pdf:
    pages = pdf.pages
    for p in pages:
      page_annots = []
      texts = []
      colors = []
      annotations = p.annots
      # ...
      # ....
  return annots_all_pages
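
To narrow down which page triggers the error, a minimal sketch along these lines could wrap the .annots access page by page. The helper name find_failing_annot_pages is hypothetical and not part of pdfplumber; it only assumes that each page object exposes an annots attribute, as in the snippet above.

```python
def find_failing_annot_pages(pages):
    """Split pages into those whose annotations load and those that raise.

    Hypothetical helper, not part of pdfplumber. `pages` is any iterable of
    objects exposing an `.annots` attribute, e.g. `pdf.pages`.
    Returns two dicts keyed by page index (starting from 0).
    """
    ok, failed = {}, {}
    for index, page in enumerate(pages):
        try:
            # Accessing .annots is where the TypeError surfaces.
            ok[index] = list(page.annots)
        except TypeError as exc:
            # Record the message instead of aborting the whole run.
            failed[index] = str(exc)
    return ok, failed
```

It could be called as `ok, failed = find_failing_annot_pages(pdf.pages)` inside the `with pdfplumber.open(pdf_path) as pdf:` block; the keys of `failed` would then point at the pages worth inspecting or sharing.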

Screenshots

[Image: Screenshot 2022-09-07 at 14 16 50]

Environment

  • pdfplumber version: 0.7.4
  • pdfminer.six version: 20220524
  • Python version: 3.8, 3.9
  • OS: Mac, Linux

loganathanspr, Sep 07 '22 12:09

Thanks for flagging, @loganathanspr! Looking at the stacktrace, my best guess is that the annotation has an undefined bounding box, which would explain the error on line 167, where the stacktrace points. But it's difficult to know for sure, or to test a fix, without seeing the actual PDF. Are you able to share it?

jsvine, Sep 07 '22 12:09
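
The undefined-bounding-box guess is consistent with the error message: in Python, subtracting None from a float raises exactly this TypeError. The sketch below reproduces the message with a made-up page height and a bounding box whose upper y-coordinate is missing. The function name to_top and all values are illustrative assumptions, not pdfplumber's actual code.

```python
def to_top(page_height, y1):
    """Convert a PDF y-coordinate (measured from the bottom) to a top offset."""
    return page_height - y1


# Hypothetical bounding box (x0, y0, x1, y1) with an undefined y1.
rect = (72.0, 700.0, 200.0, None)

try:
    to_top(792.0, rect[3])
except TypeError as exc:
    message = str(exc)

print(message)
# prints: unsupported operand type(s) for -: 'float' and 'NoneType'
```

If this is what happens, the text encoding of the annotation would not be the cause; a coordinate that resolves to None would be.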