[BUG] Error while converting document to PDF: 404 Client Error: Not Found for url: http://gotenberg:3000/convert/office

Open philippguertler opened this issue 3 years ago • 14 comments

Describe the bug

When processing a .docx file, I got this error message:

Error while converting document to PDF: 404 Client Error: Not Found for url: http://gotenberg:3000/convert/office

To Reproduce

Steps to reproduce the behavior:

  1. Use this docker-compose file
  2. Upload a file that is processed by gotenberg

Expected behavior

I expected no error message.

The issue is that gotenberg got a major update recently - version 7. This update included a change in the API.

The docker-compose files always use the latest version of gotenberg, now 7. Paperless-ng is not yet compatible with this version.

Using version 6 of gotenberg resolved the issue for me.

philippguertler avatar Aug 23 '21 14:08 philippguertler

@philippguertler I've created a pull request to fix your issue here. If you find the time, it would be nice to check whether it resolves your issue. Please feel free to test with as many documents as you like. I only tested the new Gotenberg version with a few.

Tooa avatar Aug 27 '21 06:08 Tooa

@Tooa - I want to test this, but am running docker on windows... is there a nightly build off of master that I would be able to install? Would running one of these, or compiling my own, break anything, make upgrading to the next release more difficult, lose data, etc.?

smseidl avatar Sep 14 '21 12:09 smseidl

@Tooa - I think I was able to build the container from master and have it running (honestly not 100% sure how to confirm) and still got the error today for a new docx. Below is the error message and a screenshot of my docker desktop.

[2021-09-14 12:10:20,568] [INFO] [paperless.parsing.tika] Sending /tmp/paperless/paperless-mail-su0xofsc to Tika server
[2021-09-14 12:10:22,836] [INFO] [paperless.parsing.tika] Converting /tmp/paperless/paperless-mail-su0xofsc to PDF as /tmp/paperless/paperless-km8zjrvk/convert.pdf
[2021-09-14 12:10:22,891] [DEBUG] [paperless.parsing.tika] Deleting directory /tmp/paperless/paperless-km8zjrvk
[2021-09-14 12:10:22,900] [ERROR] [paperless.consumer] Error while consuming document Pd invoice 9-1-2021.docx: Error while converting document to PDF: 404 Client Error: Not Found for url: http://gotenberg:3000/convert/office
Traceback (most recent call last):
  File "/usr/src/paperless/src/paperless_tika/parsers.py", line 79, in convert_to_pdf
    response.raise_for_status()  # ensure we notice bad responses
  File "/usr/local/lib/python3.9/site-packages/requests/models.py", line 953, in raise_for_status
    raise HTTPError(http_error_msg, response=self)
requests.exceptions.HTTPError: 404 Client Error: Not Found for url: http://gotenberg:3000/convert/office

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/usr/src/paperless/src/documents/consumer.py", line 248, in try_consume_file
    document_parser.parse(self.path, mime_type, self.filename)
  File "/usr/src/paperless/src/paperless_tika/parsers.py", line 65, in parse
    self.archive_path = self.convert_to_pdf(document_path, file_name)
  File "/usr/src/paperless/src/paperless_tika/parsers.py", line 81, in convert_to_pdf
    raise ParseError(
documents.parsers.ParseError: Error while converting document to PDF: 404 Client Error: Not Found for url: http://gotenberg:3000/convert/office

image

smseidl avatar Sep 14 '21 17:09 smseidl

My mistakes. I checked out master instead of dev, and didn't update my docker-compose.yaml to move from thecodingmachine to gotenberg. My first check now worked.

smseidl avatar Sep 16 '21 12:09 smseidl

@smseidl did you have to add any special command on your docker-compose for gotenberg?

I added the following to the gotenberg service and it is still not working, using image: gotenberg/gotenberg:7 and image: jonaswinkler/paperless-ng:dev

command:
  - "gotenberg"
  - "--chromium-disable-routes=true"

almanzarj avatar Oct 22 '21 14:10 almanzarj

I had to switch to the image thecodingmachine/gotenberg:6 in order to get it to work. It looks like version 7 changed the conversion endpoint.

kabili207 avatar Oct 28 '21 16:10 kabili207

The Gotenberg API has changed (see https://gotenberg.dev/docs/modules/libreoffice). Editing parsers.py to use the correct endpoint fixes the problem. It works for me: I converted 20 odt documents without problems.

--- parsers.py	2021-11-04 13:06:34.210460056 +0100
+++ parsers_7.py	2021-11-04 13:01:18.841545745 +0100
@@ -67,7 +67,7 @@
     def convert_to_pdf(self, document_path, file_name):
         pdf_path = os.path.join(self.tempdir, "convert.pdf")
         gotenberg_server = settings.PAPERLESS_TIKA_GOTENBERG_ENDPOINT
-        url = gotenberg_server + "/convert/office"
+        url = gotenberg_server + "/forms/libreoffice/convert"
 
         self.log("info", f"Converting {document_path} to PDF as {pdf_path}")
         files = {"files": (file_name or os.path.basename(document_path),

erm67 avatar Nov 04 '21 12:11 erm67
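
For readers comparing the two routes named in the diff above, here is a minimal sketch of how the conversion URL could be built for either Gotenberg major version. The helper name `office_convert_url`, the `OFFICE_ROUTES` table, and the `major_version` parameter are hypothetical and do not exist in paperless-ng; the patch above simply replaces one hard-coded string. Only the two route strings come from this thread.

```python
# Office-conversion routes as reported in this thread (not an exhaustive list).
OFFICE_ROUTES = {
    6: "/convert/office",             # thecodingmachine/gotenberg:6
    7: "/forms/libreoffice/convert",  # gotenberg/gotenberg:7
}


def office_convert_url(gotenberg_server: str, major_version: int) -> str:
    """Hypothetical helper: join the server base URL and the version's route."""
    try:
        route = OFFICE_ROUTES[major_version]
    except KeyError:
        raise ValueError(
            f"no known office route for Gotenberg {major_version}"
        ) from None
    # Tolerate a trailing slash on the configured base URL.
    return gotenberg_server.rstrip("/") + route


if __name__ == "__main__":
    print(office_convert_url("http://gotenberg:3000", 6))
    print(office_convert_url("http://gotenberg:3000", 7))
```

With the old route, a Gotenberg 7 server answers 404, which matches the error reported at the top of this issue.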

I too was having this issue, changing image back to "6" resolved it!

pablosed avatar Nov 17 '21 11:11 pablosed

Thanks @dafinga - just wanted to confirm this change worked fine for me! 👍

(NB. Just seen that dev is already up-to-date on this - lovely!)

dcgsteve avatar Nov 20 '21 20:11 dcgsteve

@dcgsteve apologies if this is a dumb question, but if I log in to the docker container with bash, where is this file (parsers.py / parsers_7.py) located so I can modify it? Thanks

yieldhog avatar Nov 25 '21 01:11 yieldhog

It's here :)

https://github.com/dcgsteve/paperless-ng/blob/7bc8325df910ab57ed07849a3ce49a3011ba55b6/src/paperless_tika/parsers.py#L67

dcgsteve avatar Nov 25 '21 08:11 dcgsteve

Ah, thank you, I figured it out and it's working! If anyone uses the linuxserver image:

docker exec -it paperless-ng /bin/bash
edit: /app/paperless/src/paperless_tika/parsers.py

yieldhog avatar Nov 25 '21 18:11 yieldhog

If anybody else wants to try the change but doesn't want to build the image manually, I've uploaded it to siancu/paperless-ng. It has the change to the parsers.py file as well as the change in docker-compose.yml.

siancu avatar Nov 29 '21 20:11 siancu

I solved it with

PAPERLESS_TIKA_GOTENBERG_ENDPOINT: http://gotenberg:3000/forms/libreoffice/convert#

tompsg-git avatar Feb 03 '22 18:02 tompsg-git
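
A likely explanation for why the trailing `#` in the setting above works without patching parsers.py: the unpatched code appends `/convert/office` to the configured endpoint, so everything after the `#` becomes a URL fragment, and HTTP clients do not send the fragment to the server. The sketch below only illustrates the URL parsing with the standard library; it assumes the concatenation shown in the diff earlier in this thread and does not contact Gotenberg.

```python
from urllib.parse import urlsplit

# The setting from the comment above, including the trailing "#".
endpoint = "http://gotenberg:3000/forms/libreoffice/convert#"

# Unpatched parsers.py appends the old Gotenberg 6 route to the setting.
url = endpoint + "/convert/office"

parts = urlsplit(url)
print(parts.path)      # /forms/libreoffice/convert
print(parts.fragment)  # /convert/office
```

The path that reaches the server is the Gotenberg 7 route, while the old route ends up in the fragment. This is a workaround for images that still contain the old code; builds that already include the parsers.py change should use the plain base URL instead.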