paperless-ng
[BUG] Could not parse Excel files with tika server at http://tika:9998:
Describe the bug I am installing paperless-ng fresh using Docker on a Raspberry Pi 3B+, but even after multiple reinstalls, both from docker-compose and through Portainer, I can't upload Office files and parse them.
To Reproduce Steps to reproduce the behavior:
- Install paperless-ng through Docker
- Click on 'upload'
- Select any Microsoft Office file (doc/excel)
- See error
Could not parse /tmp/paperless/paperless-upload-4__50jys with tika server at http://tika:9998: HTTPConnectionPool(host='tika', port=9998): Max retries exceeded with url: /rmeta/text (Caused by NewConnectionError('<urllib3.connection.HTTPConnection object at 0x71b69178>: Failed to establish a new connection: [Errno -2] Name or service not known'))
Expected behavior The file should be uploaded and OCRed.
Screenshots .
Webserver logs
15:16:38 [Q] INFO Process-1:5 ready for work at 107
15:16:39 [Q] ERROR Failed [acknowledgement sample 03.docx] - acknowledgement sample 03.docx: Error while consuming document acknowledgement sample 03.docx: Could not parse /tmp/paperless/paperless-upload-4__50jys with tika server at http://tika:9998: HTTPConnectionPool(host='tika', port=9998): Max retries exceeded with url: /rmeta/text (Caused by NewConnectionError('<urllib3.connection.HTTPConnection object at 0x71b69178>: Failed to establish a new connection: [Errno -2] Name or service not known')) : Traceback (most recent call last):
File "/usr/local/lib/python3.9/site-packages/urllib3/connection.py", line 169, in _new_conn
conn = connection.create_connection(
File "/usr/local/lib/python3.9/site-packages/urllib3/util/connection.py", line 73, in create_connection
for res in socket.getaddrinfo(host, port, family, socket.SOCK_STREAM):
File "/usr/local/lib/python3.9/socket.py", line 953, in getaddrinfo
for res in _socket.getaddrinfo(host, port, family, type, proto, flags):
socket.gaierror: [Errno -2] Name or service not known
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "/usr/local/lib/python3.9/site-packages/urllib3/connectionpool.py", line 699, in urlopen
httplib_response = self._make_request(
File "/usr/local/lib/python3.9/site-packages/urllib3/connectionpool.py", line 394, in _make_request
conn.request(method, url, **httplib_request_kw)
File "/usr/local/lib/python3.9/site-packages/urllib3/connection.py", line 234, in request
super(HTTPConnection, self).request(method, url, body=body, headers=headers)
File "/usr/local/lib/python3.9/http/client.py", line 1257, in request
self._send_request(method, url, body, headers, encode_chunked)
File "/usr/local/lib/python3.9/http/client.py", line 1303, in _send_request
self.endheaders(body, encode_chunked=encode_chunked)
File "/usr/local/lib/python3.9/http/client.py", line 1252, in endheaders
self._send_output(message_body, encode_chunked=encode_chunked)
File "/usr/local/lib/python3.9/http/client.py", line 1012, in _send_output
self.send(msg)
File "/usr/local/lib/python3.9/http/client.py", line 952, in send
self.connect()
File "/usr/local/lib/python3.9/site-packages/urllib3/connection.py", line 200, in connect
conn = self._new_conn()
File "/usr/local/lib/python3.9/site-packages/urllib3/connection.py", line 181, in _new_conn
raise NewConnectionError(
urllib3.exceptions.NewConnectionError: <urllib3.connection.HTTPConnection object at 0x71b69178>: Failed to establish a new connection: [Errno -2] Name or service not known
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "/usr/local/lib/python3.9/site-packages/requests/adapters.py", line 439, in send
resp = conn.urlopen(
File "/usr/local/lib/python3.9/site-packages/urllib3/connectionpool.py", line 755, in urlopen
retries = retries.increment(
File "/usr/local/lib/python3.9/site-packages/urllib3/util/retry.py", line 574, in increment
raise MaxRetryError(_pool, url, error or ResponseError(cause))
urllib3.exceptions.MaxRetryError: HTTPConnectionPool(host='tika', port=9998): Max retries exceeded with url: /rmeta/text (Caused by NewConnectionError('<urllib3.connection.HTTPConnection object at 0x71b69178>: Failed to establish a new connection: [Errno -2] Name or service not known'))
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "/usr/src/paperless/src/paperless_tika/parsers.py", line 49, in parse
parsed = parser.from_file(document_path, tika_server)
File "/usr/local/lib/python3.9/site-packages/tika/parser.py", line 40, in from_file
output = parse1(service, filename, serverEndpoint, headers=headers, config_path=config_path, requestOptions=requestOptions)
File "/usr/local/lib/python3.9/site-packages/tika/tika.py", line 336, in parse1
status, response = callServer('put', serverEndpoint, service, f,
File "/usr/local/lib/python3.9/site-packages/tika/tika.py", line 554, in callServer
resp = verbFn(serviceUrl, encodedData, **effectiveRequestOptions)
File "/usr/local/lib/python3.9/site-packages/requests/api.py", line 132, in put
return request('put', url, data=data, **kwargs)
File "/usr/local/lib/python3.9/site-packages/requests/api.py", line 61, in request
return session.request(method=method, url=url, **kwargs)
File "/usr/local/lib/python3.9/site-packages/requests/sessions.py", line 542, in request
resp = self.send(prep, **send_kwargs)
File "/usr/local/lib/python3.9/site-packages/requests/sessions.py", line 655, in send
r = adapter.send(request, **kwargs)
File "/usr/local/lib/python3.9/site-packages/requests/adapters.py", line 516, in send
raise ConnectionError(e, request=request)
requests.exceptions.ConnectionError: HTTPConnectionPool(host='tika', port=9998): Max retries exceeded with url: /rmeta/text (Caused by NewConnectionError('<urllib3.connection.HTTPConnection object at 0x71b69178>: Failed to establish a new connection: [Errno -2] Name or service not known'))
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "/usr/local/lib/python3.9/site-packages/asgiref/sync.py", line 288, in main_wrap
raise exc_info[1]
File "/usr/src/paperless/src/documents/consumer.py", line 248, in try_consume_file
document_parser.parse(self.path, mime_type, self.filename)
File "/usr/src/paperless/src/paperless_tika/parsers.py", line 51, in parse
raise ParseError(
documents.parsers.ParseError: Could not parse /tmp/paperless/paperless-upload-4__50jys with tika server at http://tika:9998: HTTPConnectionPool(host='tika', port=9998): Max retries exceeded with url: /rmeta/text (Caused by NewConnectionError('<urllib3.connection.HTTPConnection object at 0x71b69178>: Failed to establish a new connection: [Errno -2] Name or service not known'))
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "/usr/local/lib/python3.9/site-packages/django_q/cluster.py", line 432, in worker
res = f(*task["args"], **task["kwargs"])
File "/usr/src/paperless/src/documents/tasks.py", line 74, in consume_file
document = Consumer().try_consume_file(
File "/usr/src/paperless/src/documents/consumer.py", line 266, in try_consume_file
self._fail(
File "/usr/src/paperless/src/documents/consumer.py", line 70, in _fail
raise ConsumerError(f"{self.filename}: {log_message or message}")
documents.consumer.ConsumerError: acknowledgement sample 03.docx: Error while consuming document acknowledgement sample 03.docx: Could not parse /tmp/paperless/paperless-upload-4__50jys with tika server at http://tika:9998: HTTPConnectionPool(host='tika', port=9998): Max retries exceeded with url: /rmeta/text (Caused by NewConnectionError('<urllib3.connection.HTTPConnection object at 0x71b69178>: Failed to establish a new connection: [Errno -2] Name or service not known'))
Relevant information
- Host OS of the machine running paperless: Raspbian GNU/Linux 10 (buster)
- Raspberry Pi
- Browser Firefox
- Version 1.5.0
- Installation method: docker
docker-compose
version: "3.4"
services:
  broker:
    image: redis:6.0
    restart: unless-stopped
  db:
    image: postgres:13
    restart: unless-stopped
    volumes:
      - pgdata:/var/lib/postgresql/data
    environment:
      POSTGRES_DB: paperless
      POSTGRES_USER: paperless
      POSTGRES_PASSWORD: paperless
  webserver:
    image: jonaswinkler/paperless-ng:latest
    restart: unless-stopped
    depends_on:
      - db
      - broker
      - gotenberg
      - tika
    ports:
      - 8010:8000
    healthcheck:
      test: ["CMD", "curl", "-f", "http://localhost:8000"]
      interval: 30s
      timeout: 10s
      retries: 5
    volumes:
      - data:/usr/src/paperless/data
      - media:/usr/src/paperless/media
      - ./export:/usr/src/paperless/export
      - ./consume:/usr/src/paperless/consume
    environment:
      PAPERLESS_REDIS: redis://broker:6379
      PAPERLESS_DBHOST: db
      PAPERLESS_TIKA_ENABLED: 1
      PAPERLESS_TIKA_GOTENBERG_ENDPOINT: http://gotenberg:3000
      PAPERLESS_TIKA_ENDPOINT: http://tika:9998
  gotenberg:
    image: thecodingmachine/gotenberg
    restart: unless-stopped
    environment:
      DISABLE_GOOGLE_CHROME: 1
  tika:
    image: apache/tika
    restart: unless-stopped
volumes:
  data:
  media:
  pgdata:
The error you're seeing occurs because the paperless container cannot resolve the hostname tika ("Name or service not known"), so it never reaches the Tika server.
Try adding container_name: tika to your docker-compose.yml (right below image: apache/tika) and see if the problem is solved.
Do the same for gotenberg to be on the safe side. I know that the hostname should be set to the service name, but maybe in this case it isn't.
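To tell a name-resolution failure (as in the traceback above) apart from a server that resolves but refuses connections, a small check like the following could be run with Python inside the webserver container. This is only a sketch: the function name diagnose_endpoint and its return strings are made up for illustration and are not part of paperless-ng.

```python
import socket


def diagnose_endpoint(host, port, timeout=3.0):
    """Classify reachability of host:port.

    Returns 'dns-failure' if the name cannot be resolved,
    'connect-failure' if it resolves but no TCP connection succeeds,
    and 'ok' if a TCP connection can be opened.
    (Hypothetical helper, not part of paperless-ng.)
    """
    try:
        # Same lookup that fails in the traceback (socket.gaierror).
        infos = socket.getaddrinfo(host, port, type=socket.SOCK_STREAM)
    except socket.gaierror:
        return "dns-failure"

    # The name resolved; try each returned address in turn.
    for family, socktype, proto, _canonname, sockaddr in infos:
        try:
            with socket.socket(family, socktype, proto) as sock:
                sock.settimeout(timeout)
                sock.connect(sockaddr)
                return "ok"
        except OSError:
            continue
    return "connect-failure"
```

Calling diagnose_endpoint("tika", 9998) from the webserver container and getting "dns-failure" would suggest that no container is answering to the name tika at all, for example because the tika container is not running, rather than that the Tika server is up but misbehaving.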
I have the same problem, and that fix doesn't solve it. Unfortunately, this makes paperless-ng useless for me, as I can't search Office documents.
Had the same issue; the solution in #1594 worked for me!
Hmm, didn't work for me:
Could not parse /tmp/paperless/paperless-upload-_v49baln with tika server at http://tika:9998: HTTPConnectionPool(host='tika', port=9998): Max retries exceeded with url: /rmeta/text (Caused by NewConnectionError('<urllib3.connection.HTTPConnection object at 0xb1b6a178>: Failed to establish a new connection: [Errno -2] Name or service not known'))
@grutzifix You're probably running the apache/tika image on a non-amd64 architecture (see #1354). I had the same problem on my RPi4 with arm64. Try changing the tika image in your docker-compose.yml to
image: abhilesh7/apache-tika-arm
You still have to fix the changed gotenberg endpoint from #1594:
PAPERLESS_TIKA_GOTENBERG_ENDPOINT: http://gotenberg:3000/forms/libreoffice/convert#
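One way to read the trailing # in that endpoint: if the client appends its own path to the configured value, everything after the # becomes a URL fragment, and fragments are not sent to the server. The sketch below illustrates this with the standard library. The suffix "/convert/office" is an assumption about what gets appended, and effective_request_target is a hypothetical helper, not paperless-ng code.

```python
from urllib.parse import urlsplit

# Assumption: the client appends a fixed path like this to the endpoint.
ASSUMED_SUFFIX = "/convert/office"


def effective_request_target(endpoint, suffix=ASSUMED_SUFFIX):
    """Return the path (plus query) that would actually be requested
    after `suffix` is appended to `endpoint`.

    The fragment (anything after '#') is dropped, because HTTP clients
    do not transmit it to the server.
    """
    parts = urlsplit(endpoint + suffix)
    target = parts.path or "/"
    if parts.query:
        target += "?" + parts.query
    return target


# Endpoint from the original compose file: the appended path is requested.
old_target = effective_request_target("http://gotenberg:3000")

# Endpoint from #1594: the appended path lands in the fragment and is dropped.
new_target = effective_request_target(
    "http://gotenberg:3000/forms/libreoffice/convert#"
)
```

Under that assumption, old_target is the appended path itself, while new_target is only /forms/libreoffice/convert, which is why the # has to stay at the end of the value.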
btw. nice nickname, greetings from bavaria :)
Try changing the tika image in your docker-compose.yml to image: abhilesh7/apache-tika-arm
@Lucifer1590 this should also fix your problem on RPi3.
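To check which case applies on a given host, the machine architecture can be inspected before choosing the image. The sketch below only encodes the rule of thumb from this thread (apache/tika on amd64, abhilesh7/apache-tika-arm otherwise). The helper pick_tika_image is hypothetical, and treating every non-amd64 machine as ARM is a simplifying assumption.

```python
import platform

# Image names taken from this thread.
AMD64_IMAGE = "apache/tika"
ARM_IMAGE = "abhilesh7/apache-tika-arm"


def pick_tika_image(machine=None):
    """Suggest a tika image for the given machine string.

    `machine` defaults to platform.machine(), which typically reports
    values such as 'x86_64', 'aarch64', or 'armv7l' on Linux.
    Assumption: anything that is not amd64 is an ARM board.
    """
    if machine is None:
        machine = platform.machine()
    if machine.lower() in ("x86_64", "amd64"):
        return AMD64_IMAGE
    return ARM_IMAGE
```

On a Raspberry Pi running 32-bit Raspbian the machine string is typically armv7l, so pick_tika_image() would suggest the ARM image there.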
Worked! thx <3