Parsr

Java issue showing up during processing

Open idealley opened this issue 2 years ago • 7 comments

Summary: When processing a PDF, I see an error in the logs.

Steps To Reproduce: Upload a PDF that consists of a scanned image and turn OCR on.

Expected behavior: A clear and concise description of what you expected to happen.

Environment: Provided Docker container

 executing command error: Traceback (most recent call last):
  File "/usr/local/lib/python3.7/dist-packages/tabula/io.py", line 85, in _run
    check=True,
  File "/usr/lib/python3.7/subprocess.py", line 472, in run
    with Popen(*popenargs, **kwargs) as process:
  File "/usr/lib/python3.7/subprocess.py", line 775, in __init__
    restore_signals, start_new_session)
  File "/usr/lib/python3.7/subprocess.py", line 1522, in _execute_child
    raise child_exception_type(errno_num, err_msg, err_filename)
FileNotFoundError: [Errno 2] No such file or directory: 'java': 'java'

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/opt/app-root/src/dist/assets/TableDetection2Script.py", line 212, in <module>
    main()
  File "/opt/app-root/src/dist/assets/TableDetection2Script.py", line 188, in main
    tables2 = tabula.read_pdf(pdf_file, stream=True, pages='all', output_format="json")
  File "/usr/local/lib/python3.7/dist-packages/tabula/io.py", line 322, in read_pdf
    output = _run(java_options, kwargs, path, encoding)
  File "/usr/local/lib/python3.7/dist-packages/tabula/io.py", line 91, in _run
    raise JavaNotFoundError(JAVA_NOT_FOUND_ERROR)
tabula.errors.JavaNotFoundError: `java` command is not found from this Python process.Please ensure Java is installed and PATH is set for `java`
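
The two stacked tracebacks above ("During handling of the above exception, another exception occurred") are what Python prints when a new exception is raised inside an `except` block. A minimal sketch of that pattern follows. `JavaMissingError` and `run_tool` are hypothetical stand-ins for illustration only, not the actual tabula-py code:

```python
import subprocess


class JavaMissingError(Exception):
    """Hypothetical stand-in for tabula.errors.JavaNotFoundError."""


def run_tool(command="java"):
    """Run `<command> -version`, wrapping a missing executable in a clearer error."""
    try:
        # subprocess.run raises FileNotFoundError when the executable
        # cannot be located on the PATH of this Python process.
        return subprocess.run(
            [command, "-version"],
            stdout=subprocess.PIPE,
            stderr=subprocess.PIPE,
            check=True,
        )
    except FileNotFoundError:
        # Raising inside the except block chains the exceptions implicitly:
        # the original FileNotFoundError is stored on __context__, and the
        # interpreter prints both tracebacks, as in the log above.
        raise JavaMissingError(
            "`{}` command is not found from this Python process".format(command)
        )
```

Calling `run_tool()` in an environment without Java should produce the same two-part traceback shape, so the first traceback is the root cause (no `java` executable) and the second is only the library's friendlier wrapper.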

idealley, Jan 25 '22 22:01

It requires Java to be pre-installed on the system, with the directory containing the java binary added to your $PATH environment variable.
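
A quick way to check this from inside the container is to ask Python where it would find `java`. This is a minimal sketch using only the standard library; the helper names `find_java` and `path_entries` are made up for illustration and are not part of Parsr or tabula-py:

```python
import os
import shutil


def find_java(command="java"):
    """Return the full path of `command` if it is on PATH, else None."""
    return shutil.which(command)


def path_entries():
    """Return the non-empty directories listed in the PATH variable."""
    return [p for p in os.environ.get("PATH", "").split(os.pathsep) if p]


if __name__ == "__main__":
    location = find_java()
    if location is None:
        # Same situation as in the traceback: no `java` executable is visible.
        print("java not found; directories searched:", path_entries())
    else:
        print("java found at", location)
```

If this reports that java is not found when run with the same Python interpreter that executes the table detection script, installing a Java runtime in the image (or extending PATH) is the likely fix.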

ajaykumarbharaj, Feb 04 '22 06:02

Can someone help me understand why there is a Java dependency when Java is not installed in the Parsr image? Thanks!

csmizzle, Oct 10 '22 17:10

I am with @csmizzle. Shouldn't the Docker image include Java if it's necessary?

turian, May 05 '23 21:05

I also experience this issue when running the docker container:

parsr-parsr-1 | [2023-05-06T18:58:09] INFO (parsr-api/8 on 3ba7089dce28): executing command error: Traceback (most recent call last):
parsr-parsr-1 |   File "/usr/local/lib/python3.7/dist-packages/tabula/io.py", line 85, in _run
parsr-parsr-1 |     check=True,
parsr-parsr-1 |   File "/usr/lib/python3.7/subprocess.py", line 472, in run
parsr-parsr-1 |     with Popen(*popenargs, **kwargs) as process:
parsr-parsr-1 |   File "/usr/lib/python3.7/subprocess.py", line 775, in __init__
parsr-parsr-1 |     restore_signals, start_new_session)
parsr-parsr-1 |   File "/usr/lib/python3.7/subprocess.py", line 1522, in _execute_child
parsr-parsr-1 |     raise child_exception_type(errno_num, err_msg, err_filename)
parsr-parsr-1 | FileNotFoundError: [Errno 2] No such file or directory: 'java': 'java'
parsr-parsr-1 |
parsr-parsr-1 | During handling of the above exception, another exception occurred:
parsr-parsr-1 |
parsr-parsr-1 | Traceback (most recent call last):
parsr-parsr-1 |   File "/opt/app-root/src/dist/assets/TableDetection2Script.py", line 212, in <module>
parsr-parsr-1 |     main()
parsr-parsr-1 |   File "/opt/app-root/src/dist/assets/TableDetection2Script.py", line 188, in main
parsr-parsr-1 |     tables2 = tabula.read_pdf(pdf_file, stream=True, pages='all', output_format="json")
parsr-parsr-1 |   File "/usr/local/lib/python3.7/dist-packages/tabula/io.py", line 322, in read_pdf
parsr-parsr-1 |     output = _run(java_options, kwargs, path, encoding)
parsr-parsr-1 |   File "/usr/local/lib/python3.7/dist-packages/tabula/io.py", line 91, in _run
parsr-parsr-1 |     raise JavaNotFoundError(JAVA_NOT_FOUND_ERROR)
parsr-parsr-1 | tabula.errors.JavaNotFoundError: `java` command is not found from this Python process.Please ensure Java is installed and PATH is set for `java`
parsr-parsr-1 |
parsr-parsr-1 | [2023-05-06T18:58:10] INFO (parsr-api/8 on 3ba7089dce28): 0 tables found on document.

michaelwechner, May 06 '23 19:05

I'll work on this tonight/tmrw AM and open a PR.

csmizzle, May 06 '23 19:05

Still working on this, everyone. I'm having trouble with the client as well.

csmizzle, May 31 '23 01:05

Any info on this issue? Thanks!

stefanknegt, Feb 06 '24 15:02