pdfminer.six

KeyError: 'JBIG2Globals'

Open paucazou opened this issue 2 years ago • 2 comments

  • A description of the bug: While trying to extract images from a one-page PDF, I hit a KeyError. The file opens fine in PDF viewers such as Okular and Evince.

  • Steps to reproduce the bug: This is the command I ran (it is my first time using pdfminer.six): pdf2txt 64.pdf -n --output-dir cats

64.pdf

  • If relevant, include the output and/or error stacktrace.
Traceback (most recent call last):
  File "/home/philippe/.local/bin/pdf2txt.py", line 313, in <module>
    sys.exit(main())
  File "/home/philippe/.local/bin/pdf2txt.py", line 307, in main
    outfp = extract_text(**vars(parsed_args))
  File "/home/philippe/.local/bin/pdf2txt.py", line 62, in extract_text
    pdfminer.high_level.extract_text_to_fp(fp, **locals())
  File "/home/philippe/.local/lib/python3.9/site-packages/pdfminer/high_level.py", line 121, in extract_text_to_fp
    interpreter.process_page(page)
  File "/home/philippe/.local/lib/python3.9/site-packages/pdfminer/pdfinterp.py", line 992, in process_page
    self.device.end_page(page)
  File "/home/philippe/.local/lib/python3.9/site-packages/pdfminer/converter.py", line 80, in end_page
    self.receive_layout(self.cur_item)
  File "/home/philippe/.local/lib/python3.9/site-packages/pdfminer/converter.py", line 322, in receive_layout
    render(ltpage)
  File "/home/philippe/.local/lib/python3.9/site-packages/pdfminer/converter.py", line 311, in render
    render(child)
  File "/home/philippe/.local/lib/python3.9/site-packages/pdfminer/converter.py", line 311, in render
    render(child)
  File "/home/philippe/.local/lib/python3.9/site-packages/pdfminer/converter.py", line 318, in render
    self.imagewriter.export_image(item)
  File "/home/philippe/.local/lib/python3.9/site-packages/pdfminer/image.py", line 131, in export_image
    global_streams = self.jbig2_global(image)
  File "/home/philippe/.local/lib/python3.9/site-packages/pdfminer/image.py", line 191, in jbig2_global
    global_streams.append(params["JBIG2Globals"].resolve())
KeyError: 'JBIG2Globals'

Thanks in advance!

paucazou, Apr 01 '22 16:04

I can replicate this with:

PYTHONPATH=. tools/pdf2txt.py 64.pdf --output-dir images

Needs a fix.

pietermarsman, Apr 04 '22 20:04

I ran into the same issue while parsing some PDF files. As the stack trace above shows, the error occurs on line 191 of image.py (line 186 in my version), at "global_streams.append(params["JBIG2Globals"].resolve())", when "params" has no "JBIG2Globals" key. In fact, "JBIG2Globals" does not appear anywhere else in the code.

A workaround I found is to guard the offending line with "if "JBIG2Globals" in params:". It is a patch rather than a real fix, but the code no longer crashes.
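
For illustration, here is a minimal, self-contained sketch of that guard. It is not the actual pdfminer.six code: the function name collect_jbig2_globals, the FakeRef class, and the use of plain strings as filter names are all assumptions made so the example runs without the library. The only parts taken from the traceback are the "JBIG2Globals" key and the ".resolve()" call. The PDF specification lists JBIG2Globals as an optional decode parameter, so a stream without it is presumably valid.

```python
class FakeRef:
    """Hypothetical stand-in for an indirect PDF object reference."""

    def __init__(self, target):
        self.target = target

    def resolve(self):
        # Mirrors the .resolve() call seen in the traceback.
        return self.target


def collect_jbig2_globals(filters, jbig2_names=("JBIG2Decode",)):
    """Collect JBIG2 global segments from (filter_name, params) pairs.

    Filters whose params are empty or lack "JBIG2Globals" are skipped
    instead of raising KeyError.
    """
    global_streams = []
    for filter_name, params in filters:
        if filter_name not in jbig2_names:
            continue
        # The guard from the workaround: only look up the key if present.
        if params and "JBIG2Globals" in params:
            global_streams.append(params["JBIG2Globals"].resolve())
    return global_streams


if __name__ == "__main__":
    filters = [
        ("JBIG2Decode", {}),  # no globals: the case that crashed
        ("JBIG2Decode", {"JBIG2Globals": FakeRef(b"segment")}),
    ]
    print(collect_jbig2_globals(filters))
```

With this guard, an image without global segments simply yields an empty list. Whether the rest of the export path handles that empty list correctly is a separate question, which is why this remains a patch rather than a fix.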

jarek-tuszynski, Jun 24 '22 12:06