Getting error `SEVERE: Cannot read JBIG2 image: jbig2-imageio is not installed`

Open • pgarz opened this issue 4 years ago • 1 comment

Describe the bug

I get the following error and stack trace when running pdftotree on a PDF that contains scientific chemical information:

SEVERE: Cannot read JBIG2 image: jbig2-imageio is not installed
[DEBUG] pdftotree.TreeExtract - Tabula recognized 0 table(s).
Traceback (most recent call last):
  File "/opt/anaconda3/envs/noble_app_env/bin/pdftotree", line 94, in <module>
    args.visualize,
  File "/opt/anaconda3/envs/noble_app_env/lib/python3.7/site-packages/pdftotree/core.py", line 66, in parse
    pdf_html = extractor.get_html_tree()
  File "/opt/anaconda3/envs/noble_app_env/lib/python3.7/site-packages/pdftotree/TreeExtract.py", line 319, in get_html_tree
    page.appendChild(table_element)
  File "/opt/anaconda3/envs/noble_app_env/lib/python3.7/xml/dom/minidom.py", line 114, in appendChild
    if node.nodeType == self.DOCUMENT_FRAGMENT_NODE:
AttributeError: 'NoneType' object has no attribute 'nodeType'
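
For anyone reading the traceback: the last frame suggests that `table_element` was `None` when it was passed to `appendChild`, and `xml.dom.minidom` does not check for that. The snippet below is a minimal sketch that reproduces the same `AttributeError` outside pdftotree. The `append_if_present` helper is hypothetical and only illustrates one way a caller could guard against a missing element; it is not pdftotree's code or its fix.

```python
from xml.dom import minidom

# Build a tiny document with a single container element, standing in
# for the "page" node seen in the traceback.
doc = minidom.Document()
page = doc.createElement("div")
doc.appendChild(page)


def append_if_present(parent, child):
    """Hypothetical guard: append child unless it is None.

    Returns True if the child was appended, False if it was skipped.
    """
    if child is None:
        return False
    parent.appendChild(child)
    return True


# Passing None straight to appendChild fails on the nodeType lookup,
# which matches the last line of the traceback above.
try:
    page.appendChild(None)
    error_message = None
except AttributeError as exc:
    error_message = str(exc)

print(error_message)
```

With a guard like this the missing table would be skipped instead of crashing the run, although the table itself would still be absent from the output.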

I've installed the latest Java version for Mac OS X. pdftotree seems to work just fine on simple PDFs. I also haven't been able to figure out how to install jbig2-imageio manually, and I'm not familiar with how to add that JAR file to the pdftotree installation.

To Reproduce

Steps to reproduce the behavior:

  1. Install the Java JDK for Mac OS X
  2. Install ImageMagick with brew
  3. Attempt to run hOCR extraction with pdftotree on a file with chemical molecule images
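
To narrow down which files trigger the problem, it may help to check whether a PDF uses JBIG2-compressed images at all. The sketch below is a rough heuristic under stated assumptions: it searches the raw file bytes for the PDF filter name `/JBIG2Decode`. The function names are hypothetical, and the check can miss filters declared inside compressed object streams or written with escaped characters, so a negative result is not conclusive.

```python
from pathlib import Path


def mentions_jbig2(pdf_bytes: bytes) -> bool:
    """Return True if the raw bytes contain the /JBIG2Decode filter name."""
    return b"/JBIG2Decode" in pdf_bytes


def file_mentions_jbig2(path: str) -> bool:
    """Read a PDF from disk and apply the same raw-byte heuristic."""
    return mentions_jbig2(Path(path).read_bytes())
```

Calling `file_mentions_jbig2` on a simple PDF that works and on the failing one would show whether the JBIG2 filter is what distinguishes them.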

Expected behavior

The command should execute successfully and generate the proper hOCR output.

Environment (please complete the following information):

  • OS: Mac OS X 10.15
  • pdftotree Version: [e.g. v0.5.0]
  • pdfminer.six Version: [e.g. 20201018]

pgarz, May 27 '21 00:05

Same here, any updates on this?

redbrain, Nov 30 '22 23:11