pdfminer.six
KeyError: 'JBIG2Globals'
- A description of the bug: while trying to extract images from a one-page PDF, I got a KeyError. The file opens fine in PDF viewers such as Okular or Evince.
- Steps to reproduce the bug: this is the first time I have tried pdfminer.six, and the command I ran was:
pdf2txt 64.pdf -n --output-dir cats
- If relevant, include the output and/or error stacktrace.
Traceback (most recent call last):
  File "/home/philippe/.local/bin/pdf2txt.py", line 313, in <module>
    sys.exit(main())
  File "/home/philippe/.local/bin/pdf2txt.py", line 307, in main
    outfp = extract_text(**vars(parsed_args))
  File "/home/philippe/.local/bin/pdf2txt.py", line 62, in extract_text
    pdfminer.high_level.extract_text_to_fp(fp, **locals())
  File "/home/philippe/.local/lib/python3.9/site-packages/pdfminer/high_level.py", line 121, in extract_text_to_fp
    interpreter.process_page(page)
  File "/home/philippe/.local/lib/python3.9/site-packages/pdfminer/pdfinterp.py", line 992, in process_page
    self.device.end_page(page)
  File "/home/philippe/.local/lib/python3.9/site-packages/pdfminer/converter.py", line 80, in end_page
    self.receive_layout(self.cur_item)
  File "/home/philippe/.local/lib/python3.9/site-packages/pdfminer/converter.py", line 322, in receive_layout
    render(ltpage)
  File "/home/philippe/.local/lib/python3.9/site-packages/pdfminer/converter.py", line 311, in render
    render(child)
  File "/home/philippe/.local/lib/python3.9/site-packages/pdfminer/converter.py", line 311, in render
    render(child)
  File "/home/philippe/.local/lib/python3.9/site-packages/pdfminer/converter.py", line 318, in render
    self.imagewriter.export_image(item)
  File "/home/philippe/.local/lib/python3.9/site-packages/pdfminer/image.py", line 131, in export_image
    global_streams = self.jbig2_global(image)
  File "/home/philippe/.local/lib/python3.9/site-packages/pdfminer/image.py", line 191, in jbig2_global
    global_streams.append(params["JBIG2Globals"].resolve())
KeyError: 'JBIG2Globals'
Thanks in advance!
I can replicate this with:
PYTHONPATH=. tools/pdf2txt.py 64.pdf --output-dir images
Needs a fix.
I ran into the same issue while parsing some PDF files. As the stack trace above shows, the error happens at line 191 of image.py (line 186 in my version), on global_streams.append(params["JBIG2Globals"].resolve()), when params has no "JBIG2Globals" key. In fact, "JBIG2Globals" does not appear anywhere else in the code.
The workaround I found was to guard the offending line with if "JBIG2Globals" in params:. It is a patch rather than a proper fix, but the code no longer crashes.
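For illustration, here is a minimal sketch of that guard written as a standalone function so it runs without pdfminer installed. The names collect_jbig2_globals, StubRef and JBIG2_FILTER_NAMES are hypothetical, and the (filter_name, params) pairs only approximate how pdfminer represents a stream's filters (the real code uses PSLiteral filter names and object references). Since the PDF specification lists JBIG2Globals as an optional parameter of the JBIG2Decode filter, skipping it when absent is arguably the correct behavior and not only a patch.

```python
from typing import Any, Iterable, List, Tuple

# Hypothetical stand-ins (assumption): plain strings replace pdfminer's
# PSLiteral filter names, and StubRef replaces an indirect object reference.
JBIG2_FILTER_NAMES = {"JBIG2Decode"}


class StubRef:
    """Minimal stand-in for an object reference with a resolve() method."""

    def __init__(self, value: Any) -> None:
        self.value = value

    def resolve(self) -> Any:
        return self.value


def collect_jbig2_globals(filters: Iterable[Tuple[str, dict]]) -> List[Any]:
    """Return resolved JBIG2Globals streams, skipping filters that omit them."""
    global_streams = []
    for filter_name, params in filters:
        # JBIG2Globals is optional, so test for the key before indexing
        # instead of letting params["JBIG2Globals"] raise KeyError.
        if filter_name in JBIG2_FILTER_NAMES and "JBIG2Globals" in params:
            global_streams.append(params["JBIG2Globals"].resolve())
    return global_streams


# An image whose JBIG2Decode filter has no globals stream no longer crashes.
print(collect_jbig2_globals([("JBIG2Decode", {})]))
```

With the guard, an image that has no globals stream is simply exported without global segments, which matches what the file declares.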