tika-python
RuntimeError: Unable to start Tika server.
I created a function that parses a PDF file with Tika inside a service. When I tried to dockerize it, it raised this error:
    parse_pdf(tmp_path)
  File "/app/process.py", line 90, in parse_pdf
    data = parser.from_file('document-page' + str(i) + '.pdf', headers=headers)
  File "/usr/local/lib/python3.8/site-packages/tika/parser.py", line 40, in from_file
    output = parse1(service, filename, serverEndpoint, headers=headers, config_path=config_path, requestOptions=requestOptions)
  File "/usr/local/lib/python3.8/site-packages/tika/tika.py", line 336, in parse1
    status, response = callServer('put', serverEndpoint, service, f,
  File "/usr/local/lib/python3.8/site-packages/tika/tika.py", line 531, in callServer
    serverEndpoint = checkTikaServer(scheme, serverHost, port, tikaServerJar, classpath, config_path)
  File "/usr/local/lib/python3.8/site-packages/tika/tika.py", line 601, in checkTikaServer
    raise RuntimeError("Unable to start Tika server.")
RuntimeError: Unable to start Tika server.
I couldn't fix this error. I am using tika==1.24 with the base image FROM tiangolo/uvicorn-gunicorn-fastapi:python3.9.
From the tika-python README: "To use this library, you need to have Java 7+ installed on your system as tika-python starts up the Tika REST server in the background."
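Because the server is started behind the scenes, a missing Java install only shows up as the generic "Unable to start Tika server." error. A minimal sketch of a fail-fast check follows; it assumes tika-python launches the server with a `java` executable found on `PATH` (its default, as far as I know). `require_java` is a hypothetical helper, not part of tika-python, and the `which` parameter exists only so it can be tested without Java installed.

```python
import shutil
from typing import Callable, Optional


def require_java(which: Callable[[str], Optional[str]] = shutil.which) -> str:
    """Return the path of the `java` executable, or raise a clear error.

    Hypothetical helper: call it before the first tika parser call so a
    missing Java install is reported directly instead of as
    "Unable to start Tika server.".
    """
    java_path = which("java")  # shutil.which returns None when not on PATH
    if java_path is None:
        raise RuntimeError(
            "No 'java' executable found on PATH; tika-python needs Java "
            "to start the Tika server. Install a JRE/JDK in the image."
        )
    return java_path
```

Calling `require_java()` once when `process.py` is loaded would make the container fail at startup with this message rather than on the first PDF.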
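You need to install Java inside the container. Run apt-get update in the same RUN step; otherwise apt-get usually cannot find the package, because the Python base images ship without package lists:
RUN apt-get update && apt-get install -y default-jdk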
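To confirm that the installed Java meets the README's "Java 7+" requirement, the service could run `java -version` and parse the major version. This is a sketch with hypothetical helper names. Note that `java -version` writes to stderr. Java 8 and older report versions as `1.x`, while newer releases report `11.0.12`, `17`, and so on.

```python
import re
import subprocess


def parse_java_major(version_output: str) -> int:
    """Extract the major Java version from `java -version` output."""
    match = re.search(r'version "([^"]+)"', version_output)
    if match is None:
        raise ValueError(f"Unrecognised java -version output: {version_output!r}")
    numbers = re.match(r"(\d+)(?:\.(\d+))?", match.group(1))
    if numbers is None:
        raise ValueError(f"Unrecognised Java version string: {match.group(1)!r}")
    major, minor = numbers.group(1), numbers.group(2)
    # Legacy scheme: "1.8.0_292" means Java 8.
    if major == "1" and minor is not None:
        return int(minor)
    return int(major)


def java_major_version() -> int:
    """Run `java -version` and return the major version it reports."""
    # java -version prints its banner to stderr, not stdout.
    result = subprocess.run(
        ["java", "-version"], capture_output=True, text=True, check=True
    )
    return parse_java_major(result.stderr)
```

For example, the service could compare `java_major_version()` against 7 at startup and refuse to start if it is too old.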
Correct, @Horasachy.