pdfminer.six

Getting KeyError: 'N' in extract_pages when PDF file contains a note

Open suruchi-psi opened this issue 10 months ago • 2 comments

  • PDFMiner fails when the PDF contains notes. The traceback is below:
  File "/workspace/treeClassifier.py", line 56, in extract_features
    for page_number, page_layout in enumerate(extract_pages(pdf_file_obj)):
  File "/layers/google.python.pip/pip/lib/python3.9/site-packages/pdfminer/high_level.py", line 211, in extract_pages
    interpreter.process_page(page)
  File "/layers/google.python.pip/pip/lib/python3.9/site-packages/pdfminer/pdfinterp.py", line 997, in process_page
    self.render_contents(page.resources, page.contents, ctm=ctm)
  File "/layers/google.python.pip/pip/lib/python3.9/site-packages/pdfminer/pdfinterp.py", line 1014, in render_contents
    self.init_resources(resources)
  File "/layers/google.python.pip/pip/lib/python3.9/site-packages/pdfminer/pdfinterp.py", line 387, in init_resources
    colorspace = get_colorspace(resolve1(spec))
  File "/layers/google.python.pip/pip/lib/python3.9/site-packages/pdfminer/pdfinterp.py", line 370, in get_colorspace
    return PDFColorSpace(name, stream_value(spec[1])["N"])
  File "/layers/google.python.pip/pip/lib/python3.9/site-packages/pdfminer/pdftypes.py", line 285, in __getitem__
    return self.attrs[name]
KeyError: 'N' 

suruchi-psi avatar Feb 07 '25 07:02 suruchi-psi

Can you share the code / command and the PDF you are using?

pietermarsman avatar Apr 03 '25 17:04 pietermarsman

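This isn't related to the notes; the PDF is invalid or corrupted. An ICCBased colour space is required to have the form [/ICCBased stream], and the stream must contain an N entry (the number of colour components) in its stream dictionary. The traceback shows get_colorspace reading that entry with stream_value(spec[1])["N"], which raises KeyError when the entry is missing.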

Given that colour spaces are not really used by pdfminer anyway, we could probably just catch the exception in this case.

dhdaines avatar Apr 09 '25 18:04 dhdaines
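
The "catch the exception" idea above could look roughly like the sketch below. This is not pdfminer.six code: `icc_component_count` is a hypothetical helper, a plain dict stands in for the ICC stream's dictionary, and the fallback of 3 components is an assumption chosen for illustration, not something the PDF specification prescribes.

```python
# Hypothetical helper, not part of pdfminer.six: a tolerant version of the
# lookup that fails in the traceback, stream_value(spec[1])["N"].

def icc_component_count(stream_attrs, default=3):
    """Return the ICCBased N entry, or a fallback when it is missing.

    stream_attrs stands in for the ICC stream's dictionary.
    The default of 3 is an assumed, RGB-like guess.
    """
    try:
        return stream_attrs["N"]
    except KeyError:
        # Malformed PDF: the required N entry is absent, so fall back
        # instead of aborting the whole page.
        return default


valid = icc_component_count({"N": 4, "Length": 1024})   # well-formed stream
broken = icc_component_count({"Length": 1024})          # N entry missing
```

Whether a fallback value or skipping the colour space entirely is the better choice would be up to the maintainers; the point is only that a missing N need not stop page extraction.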