[Bug]: Unable to run java; is it installed?
Is there an existing issue for the same bug?
- [X] I have checked the existing issues.
Branch name
main
Commit ID
791afbb
Other environment information
Ubuntu 22.04.5 LTS - docker.
Actual behavior
When trying to import documents I get errors about Java not being installed.
raise RuntimeError("Unable to start Tika server.")
RuntimeError: Unable to start Tika server.
2024-10-18 03:39:32,511 [MainThread ] [ERROR] Unable to run java; is it installed?
[ERROR] [2024-10-18 03:39:32,511] [tika.startServer] [line:669]: Unable to run java; is it installed?
2024-10-18 03:39:32,512 [MainThread ] [ERROR] Failed to receive startup confirmation from startServer.
[ERROR] [2024-10-18 03:39:32,512] [tika.checkTikaServer] [line:601]: Failed to receive startup confirmation from startServer.
Expected behavior
The docx files get imported.
Steps to reproduce
Started the server with the default config, with the ports changed to 81 and 444. Tried to import docx files; every import failed with "unable to start Tika server".
Additional information
No response
Do you mean in docker images? Yes, it is.
The title of the bug is the error message in the container log…
The problem seems to be downloading Tika when I first import documents. It logs that it is requesting from http://search.maven.org/remotecontent?filepath=org/apache/tika/tika-server-standard/2.6.0/tika-server-standard-2.6.0.jar
I am based in China, so I think the request is just timing out. Is there a way to force the whole project to use a proxy to avoid this? The import never starts, presumably because that request times out. I do not get an error right away; nothing happens for a while, and then I get errors about being unable to start Tika.
2024-10-26 19:22:35,789 [MainThread ] [INFO ] Retrieving http://search.maven.org/remotecontent?filepath=org/apache/tika/tika-server-standard/2.6.0/tika-server-standard-2.6.0.jar to /tmp/tika-server.jar.
[INFO] [2024-10-26 19:22:35,789] [tika.getRemoteJar] [line:802]: Retrieving http://search.maven.org/remotecontent?filepath=org/apache/tika/tika-server-standard/2.6.0/tika-server-standard-2.6.0.jar to /tmp/tika-server.jar.
This seems to never finish. I've tried setting a proxy in docker systemctl and also in and it still doesn't seem to work. Anyone in China should have the same issues with it being a 10 minute+ download...
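One way to narrow this down is to check whether the proxy settings are actually visible to Python inside the container. A proxy configured for the Docker daemon applies to image pulls and is not automatically passed to processes in a running container. The following is a minimal sketch, assuming the Tika download goes through Python's standard urllib (not confirmed in this thread); the helper names `visible_proxies` and `report` are made up for illustration.

```python
import urllib.request


def visible_proxies():
    """Return the proxies urllib would read from *_proxy environment variables."""
    return urllib.request.getproxies_environment()


def report(proxies):
    """Format the proxy mapping for printing, one scheme per line."""
    if not proxies:
        return "No proxy variables are visible to this process."
    return "\n".join(f"{scheme}: {url}" for scheme, url in sorted(proxies.items()))


if __name__ == "__main__":
    # Run this inside the container, not on the host.
    print(report(visible_proxies()))
```

If no proxy shows up when this runs inside the container, the variables would have to be passed to the container itself rather than only to the Docker daemon; where exactly to set them depends on your setup.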
Even with RAGFLOW_IMAGE=infiniflow/ragflow:v0.12.0 set, it still tries to download Tika :(
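A possible way to avoid the download entirely is to fetch the jar once by other means and point tika-python at the local copy. The sketch below assumes the `TIKA_SERVER_JAR` and `TIKA_PATH` environment variables described in the tika-python README behave as documented, and that they are read when the `tika` module is imported; the jar location `/opt/tika/tika-server.jar` and the helper `tika_env_for_local_jar` are hypothetical. Depending on the tika-python version, a checksum file next to the jar may also be expected; that is not verified here.

```python
import os
from pathlib import Path


def tika_env_for_local_jar(jar_path):
    """Build environment settings that point tika-python at a local jar.

    Assumption: tika-python honours TIKA_SERVER_JAR (a URL, here file://)
    and TIKA_PATH (the directory holding the jar).
    """
    jar = Path(jar_path).resolve()
    if not jar.is_file():
        raise FileNotFoundError(f"No jar found at {jar}")
    return {
        "TIKA_SERVER_JAR": jar.as_uri(),
        "TIKA_PATH": str(jar.parent),
    }


if __name__ == "__main__":
    # Hypothetical location of a jar downloaded ahead of time.
    settings = tika_env_for_local_jar("/opt/tika/tika-server.jar")
    # Set these before anything imports the tika module.
    os.environ.update(settings)
    print(settings)
```

This only removes the download step. Java still has to be present in the image for the Tika server to start.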
I configured a proxy for Ubuntu and then executed docker pull infiniflow/ragflow:dev, which allowed me to successfully pull all the images.
Experiencing the same, using the Docker image as well. This looks like the same issue described here (no JDK/JRE installed):
- https://github.com/chrismattmann/tika-python/issues/353
When I get around to it, will build my own image and report back.
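To confirm the "no JDK/JRE installed" theory before rebuilding anything, a quick check from inside the container could look like the sketch below. It only tests whether a `java` executable is on PATH and can be run; the function name `java_status` is made up for illustration, and this is not how tika-python itself performs the check.

```python
import shutil
import subprocess


def java_status(executable="java"):
    """Return (ok, message) describing whether the given executable can run."""
    path = shutil.which(executable)
    if path is None:
        return False, f"{executable} not found on PATH"
    try:
        result = subprocess.run(
            [path, "-version"], capture_output=True, text=True, timeout=30
        )
    except (OSError, subprocess.TimeoutExpired) as exc:
        return False, f"{path} could not be run: {exc}"
    # java -version normally writes its banner to stderr.
    output = (result.stderr or result.stdout).strip()
    if result.returncode != 0:
        return False, f"{path} exited with code {result.returncode}: {output}"
    first_line = output.splitlines()[0] if output else "(no version output)"
    return True, f"{path}: {first_line}"


if __name__ == "__main__":
    ok, message = java_status()
    print("OK" if ok else "FAIL", message)
```

A "not found on PATH" result would match the "Unable to run java; is it installed?" log line above.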
@JayCroghan @jjmata The fix is in main, but an image containing it has not been released yet. Please build the image yourself, or wait for v0.14.
Thank you. I got it working by trying one of the more “full” images meant for my situation. Thanks for the fix though!