Error in TableDetection2 script: FileNotFoundError: [Errno 2] No such file or directory: 'java': 'java'

Open jfilter opened this issue 5 years ago • 10 comments

To reproduce: Run the V1.1.0 Docker image and try to extract tables with TableDetection2 enabled.

parsr_1       |   File "/usr/local/lib/python3.7/dist-packages/tabula/io.py", line 85, in _run
parsr_1       |     check=True,
parsr_1       |   File "/usr/lib/python3.7/subprocess.py", line 472, in run
parsr_1       |     with Popen(*popenargs, **kwargs) as process:
parsr_1       |   File "/usr/lib/python3.7/subprocess.py", line 775, in __init__
parsr_1       |     restore_signals, start_new_session)
parsr_1       |   File "/usr/lib/python3.7/subprocess.py", line 1522, in _execute_child
parsr_1       |     raise child_exception_type(errno_num, err_msg, err_filename)
parsr_1       | FileNotFoundError: [Errno 2] No such file or directory: 'java': 'java'
parsr_1       |
parsr_1       | During handling of the above exception, another exception occurred:
parsr_1       |
parsr_1       | Traceback (most recent call last):
parsr_1       |   File "/opt/app-root/src/dist/assets/TableDetection2Script.py", line 212, in <module>
parsr_1       |     main()
parsr_1       |   File "/opt/app-root/src/dist/assets/TableDetection2Script.py", line 188, in main
parsr_1       |     tables2 = tabula.read_pdf(pdf_file, stream=True, pages='all', output_format="json")
parsr_1       |   File "/usr/local/lib/python3.7/dist-packages/tabula/io.py", line 322, in read_pdf
parsr_1       |     output = _run(java_options, kwargs, path, encoding)
parsr_1       |   File "/usr/local/lib/python3.7/dist-packages/tabula/io.py", line 91, in _run
parsr_1       |     raise JavaNotFoundError(JAVA_NOT_FOUND_ERROR)
parsr_1       | tabula.errors.JavaNotFoundError: `java` command is not found from this Python process.Please ensure Java is installed and PATH is set for `java`
parsr_1       |

jfilter, Aug 09 '20 16:08

Hello @jfilter,

Could you provide us with the file you tried to parse?

dafelix42, Aug 18 '20 16:08

It happens for me with all files that don't contain a table (and thus table1 does not find a table). Here is an example. (It's a letter, but it's public information.) My config:

{
  "version": 0.9,
  "extractor": {
    "pdf": "pdfminer",
    "ocr": "tesseract",
    "language": ["deu"]
  },
  "cleaner": [
    "drawing-detection",
    [
      "image-detection",
      {
        "ocrImages": false
      }
    ],
    "out-of-page-removal",
    [
      "whitespace-removal",
      {
        "minWidth": 0
      }
    ],
    [
      "redundancy-detection",
      {
        "minOverlap": 0.5
      }
    ],
    [
      "table-detection",
      {
        "runConfig": [
          {
            "pages": [],
            "flavor": "lattice"
          }
        ]
      }
    ],
    [
      "table-detection-2",
      {
        "runConfig": [
          {
            "pages": []
          }
        ]
      }
    ],
    [
      "header-footer-detection",
      {
        "ignorePages": [],
        "maxMarginPercentage": 15
      }
    ],
    "words-to-line-new",
    [
      "reading-order-detection",
      {
        "minVerticalGapWidth": 5,
        "minColumnWidthInPagePercent": 15
      }
    ],
    [
      "lines-to-paragraph",
      {
        "tolerance": 0.25
      }
    ],
    "page-number-detection",
    "hierarchy-detection"
  ],
  "output": {
    "granularity": "word",
    "includeMarginals": true,
    "includeDrawings": true,
    "formats": {
      "json": true,
      "text": false,
      "csv": true,
      "markdown": false,
      "pdf": false,
      "simpleJson": false
    }
  }
}

00014_012720_Stellungnahme_BV-Augen%C3%A4rzte_RefE__JVEG-%C3%84ndG.pdf

jfilter, Aug 18 '20 16:08

Ok, thanks. Which OS are you using?

dafelix42, Aug 19 '20 09:08

I run it with the Docker image on Ubuntu and macOS.

jfilter, Aug 19 '20 09:08

I see exactly the same. It looks like there is no JDK in the Parsr base Docker image.

NadiaRom, Feb 06 '21 23:02

+1 to what @NadiaRom said: Java is not installed in the image.

MattAlp, Feb 27 '21 03:02
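
For anyone who wants to confirm this from inside the container, here is a minimal sketch that checks whether a `java` launcher is visible on the `PATH` of the Python process, which is the condition the tabula-py error message above complains about. The helper names `find_java` and `java_version` are made up for illustration; only `shutil.which` and `subprocess.run` from the standard library are used.

```python
import shutil
import subprocess
from typing import Optional


def find_java(executable: str = "java") -> Optional[str]:
    """Return the absolute path of the Java launcher, or None if it is not on PATH."""
    return shutil.which(executable)


def java_version(java_path: str) -> str:
    """Return the first line printed by `java -version` (it normally writes to stderr)."""
    result = subprocess.run(
        [java_path, "-version"], capture_output=True, text=True, check=False
    )
    output = result.stderr or result.stdout
    return output.splitlines()[0] if output else ""


if __name__ == "__main__":
    path = find_java()
    if path is None:
        # Same condition that leads to FileNotFoundError: 'java' in the traceback.
        print("java not found on PATH; tabula-py cannot start its Java process")
    else:
        print(f"java found at {path}: {java_version(path)}")
```

If this prints the "not found" message when run with the container's `python3`, the image has no Java runtime on `PATH`, which matches the traceback.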

Same here with the newest docker container when going through the official "Jupyter Notebook Demo" tutorial:

[2021-05-21T10:59:31] INFO (parsr-api/7 on d11c50a15136): executing command: python3 /opt/app-root/src/dist/assets/TableDetection2Script.py /tmp/cc7fc6b8253399c96cbef5f0a7107a.pdf all
[2021-05-21T10:59:33] INFO (parsr-api/7 on d11c50a15136): executing command error: Traceback (most recent call last):
  File "/usr/local/lib/python3.7/dist-packages/tabula/io.py", line 85, in _run
    check=True,
  File "/usr/lib/python3.7/subprocess.py", line 472, in run
    with Popen(*popenargs, **kwargs) as process:
  File "/usr/lib/python3.7/subprocess.py", line 775, in __init__
    restore_signals, start_new_session)
  File "/usr/lib/python3.7/subprocess.py", line 1522, in _execute_child
    raise child_exception_type(errno_num, err_msg, err_filename)
FileNotFoundError: [Errno 2] No such file or directory: 'java': 'java'
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
  File "/opt/app-root/src/dist/assets/TableDetection2Script.py", line 212, in <module>
    main()
  File "/opt/app-root/src/dist/assets/TableDetection2Script.py", line 188, in main
    tables2 = tabula.read_pdf(pdf_file, stream=True, pages='all', output_format="json")
  File "/usr/local/lib/python3.7/dist-packages/tabula/io.py", line 322, in read_pdf
    output = _run(java_options, kwargs, path, encoding)
  File "/usr/local/lib/python3.7/dist-packages/tabula/io.py", line 91, in _run
    raise JavaNotFoundError(JAVA_NOT_FOUND_ERROR)
tabula.errors.JavaNotFoundError: `java` command is not found from this Python process.Please ensure Java is installed and PATH is set for `java`

AndBu, May 21 '21 11:05

Same here on Windows using the latest Docker image from Docker Hub. It fails at exactly the same place in the process.

jurijsk, Oct 13 '21 08:10

What is the best workaround for this? Should we modify the docker image to add a layer for installing/setting the path to Java?

jbrry, Feb 14 '23 17:02

@jbrry: If you remove (comment out) the 'table-detection-2' part of the serverConfig.json file, you won't see this error.

sivagabbi-DY, Mar 10 '23 04:03
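
Since JSON has no comment syntax, "commenting out" in practice means dropping the entry from the `cleaner` list. A minimal sketch of that, assuming the config has the same shape as the one posted earlier in this thread (each cleaner entry is either a bare module name or a `[name, options]` pair). The helper names `module_name` and `without_module` are hypothetical and not part of Parsr.

```python
import json
from typing import Any, Dict


def module_name(entry: Any) -> str:
    """A cleaner entry is either a bare name or a [name, options] pair."""
    if isinstance(entry, str):
        return entry
    return entry[0]


def without_module(config: Dict[str, Any], name: str) -> Dict[str, Any]:
    """Return a shallow copy of config whose cleaner list omits the named module."""
    cleaner = [e for e in config.get("cleaner", []) if module_name(e) != name]
    return {**config, "cleaner": cleaner}


if __name__ == "__main__":
    # Trimmed-down version of the config posted above, for illustration only.
    config = {
        "cleaner": [
            "drawing-detection",
            ["table-detection", {"runConfig": [{"pages": [], "flavor": "lattice"}]}],
            ["table-detection-2", {"runConfig": [{"pages": []}]}],
            "words-to-line-new",
        ]
    }
    print(json.dumps(without_module(config, "table-detection-2"), indent=2))
```

This only avoids the error by skipping the module that calls tabula; it does not make Java available in the container.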