Parsr
Java issue showing up during processing
Summary: While processing a PDF, I see an error in the logs.
Steps To Reproduce: Upload a PDF that is an image and turn OCR on.
Environment: Provided Docker container
executing command error: Traceback (most recent call last):
File "/usr/local/lib/python3.7/dist-packages/tabula/io.py", line 85, in _run
check=True,
File "/usr/lib/python3.7/subprocess.py", line 472, in run
with Popen(*popenargs, **kwargs) as process:
File "/usr/lib/python3.7/subprocess.py", line 775, in __init__
restore_signals, start_new_session)
File "/usr/lib/python3.7/subprocess.py", line 1522, in _execute_child
raise child_exception_type(errno_num, err_msg, err_filename)
FileNotFoundError: [Errno 2] No such file or directory: 'java': 'java'
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "/opt/app-root/src/dist/assets/TableDetection2Script.py", line 212, in <module>
main()
File "/opt/app-root/src/dist/assets/TableDetection2Script.py", line 188, in main
tables2 = tabula.read_pdf(pdf_file, stream=True, pages='all', output_format="json")
File "/usr/local/lib/python3.7/dist-packages/tabula/io.py", line 322, in read_pdf
output = _run(java_options, kwargs, path, encoding)
File "/usr/local/lib/python3.7/dist-packages/tabula/io.py", line 91, in _run
raise JavaNotFoundError(JAVA_NOT_FOUND_ERROR)
tabula.errors.JavaNotFoundError: `java` command is not found from this Python process.Please ensure Java is installed and PATH is set for `java`
It requires Java to be pre-installed on the system, with the path to Java's bin directory added to your $PATH environment variable.
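A quick way to confirm this inside the container is to check whether `java` resolves on the PATH of the Python process. The traceback shows `subprocess` failing with `FileNotFoundError: ... 'java'`, so that lookup is exactly what fails. This is a minimal standard-library sketch; `find_java`, `require_java`, and `java_version` are hypothetical helper names, not part of Parsr or tabula-py:

```python
import shutil
import subprocess
from typing import Optional


def find_java(search_path: Optional[str] = None) -> Optional[str]:
    """Return the full path of the `java` executable, or None if not found.

    When search_path is None, shutil.which uses this process's PATH, the
    same PATH a subprocess call to `java` would see.
    """
    return shutil.which("java", path=search_path)


def require_java(search_path: Optional[str] = None) -> str:
    """Fail fast with a readable message instead of a nested traceback."""
    java = find_java(search_path)
    if java is None:
        raise RuntimeError(
            "`java` was not found on PATH; install a Java runtime in the "
            "image or add its bin directory to PATH"
        )
    return java


def java_version(java: str) -> str:
    """Return the first line of `java -version` (Java prints it on stderr)."""
    result = subprocess.run(
        [java, "-version"], capture_output=True, text=True, check=True
    )
    output = (result.stderr or result.stdout).strip()
    return output.splitlines()[0] if output else ""


if __name__ == "__main__":
    try:
        path = require_java()
        print("java found at", path, "-", java_version(path))
    except RuntimeError as err:
        print(err)
```

Running this (for example via `docker exec` into the container) should print the "not found" message in an image without Java, which matches the `JavaNotFoundError` above.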
Can someone help me understand why there is a Java dependency, and why Java is not installed in the Parsr image? Thanks!
I agree with @csmizzle. Shouldn't the Docker image include Java if it's necessary?
I also experience this issue when running the docker container:
parsr-parsr-1 | [2023-05-06T18:58:09] INFO (parsr-api/8 on 3ba7089dce28): executing command error: Traceback (most recent call last):
parsr-parsr-1 | File "/usr/local/lib/python3.7/dist-packages/tabula/io.py", line 85, in _run
parsr-parsr-1 | check=True,
parsr-parsr-1 | File "/usr/lib/python3.7/subprocess.py", line 472, in run
parsr-parsr-1 | with Popen(*popenargs, **kwargs) as process:
parsr-parsr-1 | File "/usr/lib/python3.7/subprocess.py", line 775, in __init__
parsr-parsr-1 | restore_signals, start_new_session)
parsr-parsr-1 | File "/usr/lib/python3.7/subprocess.py", line 1522, in _execute_child
parsr-parsr-1 | raise child_exception_type(errno_num, err_msg, err_filename)
parsr-parsr-1 | FileNotFoundError: [Errno 2] No such file or directory: 'java': 'java'
parsr-parsr-1 |
parsr-parsr-1 | During handling of the above exception, another exception occurred:
parsr-parsr-1 |
parsr-parsr-1 | Traceback (most recent call last):
parsr-parsr-1 | File "/opt/app-root/src/dist/assets/TableDetection2Script.py", line 212, in java
command is not found from this Python process.Please ensure Java is installed and PATH is set for java
parsr-parsr-1 |
parsr-parsr-1 | [2023-05-06T18:58:10] INFO (parsr-api/8 on 3ba7089dce28): 0 tables found on document.
I'll work on this tonight or tomorrow morning and open a PR.
Working on this, everyone. I'm having trouble with the client as well.
Any info on this issue? Thanks!