paperless-ng
[BUG] Error while converting document to PDF: 404 Client Error: Not Found for url: http://gotenberg:3000/convert/office
Describe the bug
When processing a .docx file, I got this error message:
Error while converting document to PDF: 404 Client Error: Not Found for url: http://gotenberg:3000/convert/office
To Reproduce
Steps to reproduce the behavior:
- Use this docker-compose file
- Upload a file that is processed by gotenberg
Expected behavior
I expected no error message.
The issue is that gotenberg recently got a major update, version 7, which changed the API.
The docker-compose files always use the latest version of gotenberg (now 7), and paperless-ng is not yet compatible with this version.
Using version 6 of gotenberg resolved the issue for me.
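The incompatibility comes down to the path paperless-ng appends to the Gotenberg base URL. The sketch below maps a Gotenberg major version to the office-conversion path; the two paths are the ones quoted later in this thread (`/convert/office` for version 6, `/forms/libreoffice/convert` for version 7), while the helper name `office_convert_url` and its version parameter are hypothetical and not part of paperless-ng.

```python
# Hypothetical helper, not part of paperless-ng: picks the office-document
# conversion path for a given Gotenberg major version.
OFFICE_CONVERT_PATHS = {
    6: "/convert/office",             # Gotenberg 6 (thecodingmachine/gotenberg)
    7: "/forms/libreoffice/convert",  # Gotenberg 7 (gotenberg/gotenberg)
}


def office_convert_url(gotenberg_server: str, major_version: int) -> str:
    """Return the full conversion URL for the given Gotenberg major version."""
    try:
        path = OFFICE_CONVERT_PATHS[major_version]
    except KeyError:
        raise ValueError(f"unsupported Gotenberg version: {major_version}") from None
    # Strip a trailing slash so the result never contains "//".
    return gotenberg_server.rstrip("/") + path


print(office_convert_url("http://gotenberg:3000", 6))  # http://gotenberg:3000/convert/office
print(office_convert_url("http://gotenberg:3000", 7))  # http://gotenberg:3000/forms/libreoffice/convert
```

In a docker-compose setup the version simply follows the image tag you pin, for example `thecodingmachine/gotenberg:6` versus `gotenberg/gotenberg:7`.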
@philippguertler I've created a pull request to fix your issue here. If you find the time, it would be nice to check whether this resolves your issue or not. Please feel free to test with as many documents as you like. I only tested the new Gotenberg version with a few.
@Tooa - I want to test this, but am running docker on Windows... is there a nightly build off of master that I would be able to install? Would running one of these, or compiling my own, break anything, make upgrading to the next release more difficult, lose data, etc.?
@Tooa - I think I was able to build the container from master and have it running (honestly not 100% sure how to confirm) and still got the error today for a new docx. Below is the error message and a screenshot of my docker desktop.
[2021-09-14 12:10:20,568] [INFO] [paperless.parsing.tika] Sending /tmp/paperless/paperless-mail-su0xofsc to Tika server
[2021-09-14 12:10:22,836] [INFO] [paperless.parsing.tika] Converting /tmp/paperless/paperless-mail-su0xofsc to PDF as /tmp/paperless/paperless-km8zjrvk/convert.pdf
[2021-09-14 12:10:22,891] [DEBUG] [paperless.parsing.tika] Deleting directory /tmp/paperless/paperless-km8zjrvk
[2021-09-14 12:10:22,900] [ERROR] [paperless.consumer] Error while consuming document Pd invoice 9-1-2021.docx: Error while converting document to PDF: 404 Client Error: Not Found for url: http://gotenberg:3000/convert/office
Traceback (most recent call last):
  File "/usr/src/paperless/src/paperless_tika/parsers.py", line 79, in convert_to_pdf
    response.raise_for_status() # ensure we notice bad responses
  File "/usr/local/lib/python3.9/site-packages/requests/models.py", line 953, in raise_for_status
    raise HTTPError(http_error_msg, response=self)
requests.exceptions.HTTPError: 404 Client Error: Not Found for url: http://gotenberg:3000/convert/office

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/usr/src/paperless/src/documents/consumer.py", line 248, in try_consume_file
    document_parser.parse(self.path, mime_type, self.filename)
  File "/usr/src/paperless/src/paperless_tika/parsers.py", line 65, in parse
    self.archive_path = self.convert_to_pdf(document_path, file_name)
  File "/usr/src/paperless/src/paperless_tika/parsers.py", line 81, in convert_to_pdf
    raise ParseError(
documents.parsers.ParseError: Error while converting document to PDF: 404 Client Error: Not Found for url: http://gotenberg:3000/convert/office
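The traceback contains two chained exceptions: the HTTP 404 raised by `response.raise_for_status()` and the `ParseError` that `convert_to_pdf` raises while handling it. "During handling of the above exception, another exception occurred" is how Python prints implicit chaining, so the 404 from Gotenberg is the root cause. The stdlib-only sketch below reproduces that chain; `FakeHTTPError`, `check_status`, and this `convert_to_pdf` are hypothetical stand-ins for the requests and paperless-ng code, not their real implementations.

```python
from http import HTTPStatus


class ParseError(Exception):
    """Hypothetical stand-in for documents.parsers.ParseError."""


class FakeHTTPError(Exception):
    """Hypothetical stand-in for requests.exceptions.HTTPError."""


def check_status(status_code: int, url: str) -> None:
    # Mimics the message format seen in the log for 4xx responses.
    if 400 <= status_code < 500:
        reason = HTTPStatus(status_code).phrase  # 404 -> "Not Found"
        raise FakeHTTPError(f"{status_code} Client Error: {reason} for url: {url}")


def convert_to_pdf(status_code: int, url: str) -> None:
    try:
        check_status(status_code, url)
    except FakeHTTPError as err:
        # Raising inside "except" without "from" chains implicitly, which is
        # what produces "During handling of the above exception ...".
        raise ParseError(f"Error while converting document to PDF: {err}")


try:
    convert_to_pdf(404, "http://gotenberg:3000/convert/office")
except ParseError as exc:
    print(exc)
    print(type(exc.__context__).__name__)  # FakeHTTPError
```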
My mistake. I had checked out master instead of dev, and hadn't updated my docker-compose.yaml to switch from the thecodingmachine image to the gotenberg one. My first test now works.
@smseidl did you have to add any special command to your docker-compose file for gotenberg?
I added the following to the gotenberg service, and it's still not working with image: gotenberg/gotenberg:7 and image: jonaswinkler/paperless-ng:dev
command:
- "gotenberg"
- "--chromium-disable-routes=true"
I had to switch to image thecodingmachine/gotenberg:6 in order to get it to work. It looks like version 7 changed the conversion endpoint.
The gotenberg API has changed (see https://gotenberg.dev/docs/modules/libreoffice); editing parsers.py to use the correct endpoint fixes the problem. It works for me: I converted 20 odt documents without problems.
--- parsers.py 2021-11-04 13:06:34.210460056 +0100
+++ parsers_7.py 2021-11-04 13:01:18.841545745 +0100
@@ -67,7 +67,7 @@
     def convert_to_pdf(self, document_path, file_name):
         pdf_path = os.path.join(self.tempdir, "convert.pdf")
         gotenberg_server = settings.PAPERLESS_TIKA_GOTENBERG_ENDPOINT
-        url = gotenberg_server + "/convert/office"
+        url = gotenberg_server + "/forms/libreoffice/convert"
         self.log("info", f"Converting {document_path} to PDF as {pdf_path}")
         files = {"files": (file_name or os.path.basename(document_path),
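For reference, the Gotenberg 7 LibreOffice route takes a multipart/form-data POST, and paperless-ng sends the document in a form field named `files`, as the diff context shows. The sketch below builds such a request with only the standard library and does not send it; the function name `build_convert_request`, the fixed boundary string, and the generic part content type are illustrative assumptions, and paperless-ng itself uses `requests` rather than this code.

```python
import os
import urllib.request
from typing import Optional


def build_convert_request(gotenberg_server: str, document_path: str,
                          file_name: Optional[str] = None) -> urllib.request.Request:
    """Build (but do not send) a Gotenberg 7 LibreOffice conversion request."""
    url = gotenberg_server.rstrip("/") + "/forms/libreoffice/convert"
    name = file_name or os.path.basename(document_path)
    # Assumed boundary; it only has to not occur inside the uploaded file.
    boundary = "paperless-sketch-boundary"
    with open(document_path, "rb") as fh:
        payload = fh.read()
    # A single multipart part, using the same "files" field name as paperless-ng.
    head = (
        f"--{boundary}\r\n"
        f'Content-Disposition: form-data; name="files"; filename="{name}"\r\n'
        "Content-Type: application/octet-stream\r\n"
        "\r\n"
    ).encode("utf-8")
    tail = f"\r\n--{boundary}--\r\n".encode("utf-8")
    return urllib.request.Request(
        url,
        data=head + payload + tail,
        method="POST",
        headers={"Content-Type": f"multipart/form-data; boundary={boundary}"},
    )
```

Passing the request to `urllib.request.urlopen()` would actually send it; against Gotenberg 7, it is the old `/convert/office` path that answers 404, as in the log earlier in this thread.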
I too was having this issue; changing the image back to "6" resolved it!
Thanks @dafinga - just wanted to confirm this change worked fine for me! 👍
(NB. Just seen that dev is already up-to-date on this - lovely!)
@dcgsteve apologies if this is a dumb question, but if I log in to the docker container with bash, where is this file (parsers.py / parsers_7.py) located so I can modify it? Thanks
It's here :)
https://github.com/dcgsteve/paperless-ng/blob/7bc8325df910ab57ed07849a3ce49a3011ba55b6/src/paperless_tika/parsers.py#L67
Ah, thank you! Figured it out and it's working. If anyone uses the linuxserver image:
docker exec -it paperless-ng /bin/bash
then edit /app/paperless/src/paperless_tika/parsers.py
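If you prefer to script that edit instead of opening an editor inside the container, a minimal Python sketch is a plain string replacement. The function name `patch_gotenberg_endpoint` is hypothetical, and the sketch assumes the file still contains the exact string `"/convert/office"` shown in the diff. Keep in mind that changes made inside a running container are lost when the container is recreated, for example after pulling a new image.

```python
from pathlib import Path

OLD = '"/convert/office"'             # Gotenberg 6 route used by paperless-ng
NEW = '"/forms/libreoffice/convert"'  # Gotenberg 7 route


def patch_gotenberg_endpoint(parsers_py: str) -> bool:
    """Rewrite the conversion route in parsers.py; return True if the file changed."""
    path = Path(parsers_py)
    source = path.read_text(encoding="utf-8")
    if OLD not in source:
        # Already patched, or the file differs from the version in this thread.
        return False
    path.write_text(source.replace(OLD, NEW), encoding="utf-8")
    return True
```

For the linuxserver image you would call `patch_gotenberg_endpoint("/app/paperless/src/paperless_tika/parsers.py")` from inside the container, then most likely restart the container so the running processes load the modified module.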
If anybody else wants to try the change but doesn't want to build the image manually, I've uploaded it to siancu/paperless-ng. It has the change to the parsers.py file as well as the change in docker-compose.yml.
I solved it by setting:
PAPERLESS_TIKA_GOTENBERG_ENDPOINT: http://gotenberg:3000/forms/libreoffice/convert#
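This works without editing parsers.py because paperless-ng builds the URL by plain string concatenation (`gotenberg_server + "/convert/office"` in the diff), so the trailing `#` turns the appended old path into a URL fragment, and HTTP clients do not send the fragment to the server. The stdlib sketch below shows the concatenated URL and what remains once the fragment is split off; it only illustrates URL parsing and assumes the concatenation in parsers.py is as shown in the diff.

```python
from urllib.parse import urldefrag, urlsplit

# Value from the comment above; note the trailing "#".
endpoint = "http://gotenberg:3000/forms/libreoffice/convert#"

# paperless-ng (before the fix) appends the Gotenberg 6 route to the setting.
url = endpoint + "/convert/office"
print(url)  # http://gotenberg:3000/forms/libreoffice/convert#/convert/office

# Everything after "#" is the fragment, which is not part of the HTTP request.
request_url, fragment = urldefrag(url)
print(request_url)         # http://gotenberg:3000/forms/libreoffice/convert
print(fragment)            # /convert/office
print(urlsplit(url).path)  # /forms/libreoffice/convert
```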