[Bug]: Unable to run java; is it installed?

Open JayCroghan opened this issue 1 year ago • 5 comments

Is there an existing issue for the same bug?

  • [X] I have checked the existing issues.

Branch name

main

Commit ID

791afbb

Other environment information

Ubuntu 22.04.5 LTS - docker.

Actual behavior

When trying to import documents I get errors about Java not being installed.

    raise RuntimeError("Unable to start Tika server.")
RuntimeError: Unable to start Tika server.
2024-10-18 03:39:32,511 [MainThread  ] [ERROR]  Unable to run java; is it installed?
[ERROR] [2024-10-18 03:39:32,511] [tika.startServer] [line:669]: Unable to run java; is it installed?
2024-10-18 03:39:32,512 [MainThread  ] [ERROR]  Failed to receive startup confirmation from startServer.
[ERROR] [2024-10-18 03:39:32,512] [tika.checkTikaServer] [line:601]: Failed to receive startup confirmation from startServer.

Expected behavior

The docx files get imported.

Steps to reproduce

Started the server with the default config, except that the ports were changed to 81 and 444. Tried to import docx files; every one failed with "unable to start Tika server".

Additional information

No response

JayCroghan, Oct 17 '24 19:10

Do you mean in the Docker images? Yes, Java is installed there.

KevinHuSh, Oct 18 '24 01:10

The title of the bug is the error message in the container log…

JayCroghan, Oct 18 '24 06:10

The problem seems to be downloading Tika when I first import documents. It logs that it is requesting from http://search.maven.org/remotecontent?filepath=org/apache/tika/tika-server-standard/2.6.0/tika-server-standard-2.6.0.jar

I am based in China, so I think the request is just timing out. Is there a way to force the whole project to use a proxy to avoid this? The files never get imported because that request seems to time out. There is no immediate error; nothing happens for a while, and then I get the errors about being unable to start Tika.

JayCroghan, Oct 26 '24 10:10


2024-10-26 19:22:35,789 [MainThread  ] [INFO ]  Retrieving http://search.maven.org/remotecontent?filepath=org/apache/tika/tika-server-standard/2.6.0/tika-server-standard-2.6.0.jar to /tmp/tika-server.jar.
[INFO] [2024-10-26 19:22:35,789] [tika.getRemoteJar] [line:802]: Retrieving http://search.maven.org/remotecontent?filepath=org/apache/tika/tika-server-standard/2.6.0/tika-server-standard-2.6.0.jar to /tmp/tika-server.jar.

This never seems to finish. I've tried setting a proxy in docker systemctl and also in [...], and it still doesn't seem to work. Anyone in China should have the same issue, with this being a 10 minute+ download...

JayCroghan, Oct 26 '24 11:10
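
One possible workaround for the stalled download described above is to fetch the jar once through a proxy and point tika-python at the local copy. The following is only a sketch under several assumptions: the helper names and the proxy address are hypothetical, `TIKA_SERVER_JAR` is the environment variable that tika-python documents for overriding the jar URL, and the idea that tika-python checks the jar against a `.md5` file next to it should be verified against the tika-python version actually installed in the image.

```python
import hashlib
import os
import shutil
import urllib.request
from pathlib import Path

# URL copied from the log lines in this thread.
TIKA_JAR_URL = (
    "http://search.maven.org/remotecontent?filepath=org/apache/tika/"
    "tika-server-standard/2.6.0/tika-server-standard-2.6.0.jar"
)


def download_via_proxy(url, dest, proxy=None, timeout=60):
    """Download url to dest, optionally through a proxy (hypothetical helper)."""
    handlers = []
    if proxy:
        # Route both http and https requests through the given proxy.
        handlers.append(urllib.request.ProxyHandler({"http": proxy, "https": proxy}))
    opener = urllib.request.build_opener(*handlers)
    with opener.open(url, timeout=timeout) as response, open(dest, "wb") as out:
        shutil.copyfileobj(response, out)
    return dest


def write_md5_sidecar(jar_path):
    """Write '<jar>.md5' holding the hex MD5 digest of the jar.

    Assumption: tika-python compares the jar against such a file.
    """
    digest = hashlib.md5(Path(jar_path).read_bytes()).hexdigest()
    sidecar = str(jar_path) + ".md5"
    Path(sidecar).write_text(digest)
    return sidecar


def tika_env_for_local_jar(jar_path):
    """Build the environment setting that points tika-python at a local jar."""
    jar = Path(os.path.abspath(jar_path))
    return {"TIKA_SERVER_JAR": jar.as_uri()}


# Example usage (needs network access; the proxy address is made up):
# jar = download_via_proxy(TIKA_JAR_URL, "/tmp/tika-server-standard-2.6.0.jar",
#                          proxy="http://127.0.0.1:7890")
# write_md5_sidecar(jar)
# os.environ.update(tika_env_for_local_jar(jar))
```

The environment variable has to be set in the process that imports tika, so in a Docker setup it would go into the container's environment rather than the host shell.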

Even with RAGFLOW_IMAGE=infiniflow/ragflow:v0.12.0 set, it still tries to download Tika :(

JayCroghan, Oct 26 '24 12:10

I configured a proxy for Ubuntu and then executed docker pull infiniflow/ragflow:dev, which allowed me to successfully pull all the images.

Feiue, Oct 28 '24 08:10

Experiencing the same, using Docker image as well. Looks to be the same issue described here (no JDK/JRE installed):

  • https://github.com/chrismattmann/tika-python/issues/353

When I get around to it, will build my own image and report back.

jjmata, Oct 31 '24 12:10
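
To tell a missing JDK/JRE apart from the download problem, it may help to check from inside the container whether a `java` binary is on the PATH at all. This is a minimal sketch, not part of RAGFlow or tika-python; the function name is hypothetical and it would need to be run with the Python interpreter inside the affected container.

```python
import shutil
import subprocess


def java_version(binary="java"):
    """Return the first line of `<binary> -version`, or None if it cannot be run."""
    path = shutil.which(binary)
    if path is None:
        # No such executable on PATH.
        return None
    try:
        result = subprocess.run(
            [path, "-version"], capture_output=True, text=True, timeout=30
        )
    except (OSError, subprocess.TimeoutExpired):
        return None
    # Java prints its version banner to stderr rather than stdout.
    banner = (result.stderr or result.stdout).strip()
    return banner.splitlines()[0] if banner else None


if __name__ == "__main__":
    found = java_version()
    print(found if found else "java not found on PATH")
```

If this reports that Java is missing, the "Unable to run java" error is probably literal; if a version is printed, the stalled jar download is the more likely cause.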

@JayCroghan @jjmata The fix is in main, but an image containing it has not been released yet. Please build the image yourself, or wait for v0.14.

yuzhichang, Nov 02 '24 14:11

Thank you. I got it working by trying one of the more “full” images meant for my situation. Thanks for the fix though!

JayCroghan, Nov 02 '24 14:11