headers['Content-Length'] = pdict['CONTENT-LENGTH']

Open marlemion opened this issue 6 years ago • 1 comments

Hello, I am running a slightly modified version of ocrmypdf-web:

#!/usr/bin/env python
import hug

import subprocess
from tempfile import NamedTemporaryFile

api = hug.API(__name__)
api.http.add_middleware(hug.middleware.CORSMiddleware(api))


@hug.get('/', output=hug.output_format.file)
def index():
    return "index.html"


@hug.get('/static/{fn}', output=hug.output_format.file)
def static(fn):
    return 'static/{}'.format(fn)


@hug.post('/ocr', output=hug.output_format.file)
def ocr(body, response, language: "The language(s) to use for OCR"="eng+deu+fra"):
    if not len(body) == 1:
        raise Exception("Need exactly one file!")

    fn, content = list(body.items()).pop()

    f_out = NamedTemporaryFile(suffix='.pdf')

    with NamedTemporaryFile(suffix='.pdf', mode="wb") as f_in:
        f_in.write(content)
        f_in.flush()

        #proc = subprocess.Popen(['ocrmypdf', '--force-ocr', '-l', language, f_in.name, f_out.name])
        proc = subprocess.Popen(['ocrmypdf', '-l', language, '--pdf-renderer', 'tesseract', '--output-type', 'pdf', f_in.name, f_out.name])

        code = proc.wait()

        response.set_header('X-OCR-Exit-Code', str(code))

        print(f_out.name)

        return f_out

hug is at version 2.4.1 and Python at version 3.7.

This code ran for a long time as a web interface for ocrmypdf (text recognition using tesseract). I think that since Python 3.7, hug crashes whenever a file is uploaded to the server:

Unhandled exception in thread started by <function reload_checker at 0x7fec61f69ea0>
Traceback (most recent call last):
  File "/usr/lib/python3.7/site-packages/hug/development_runner.py", line 108, in reload_checker
    if path[-4:] in ('.pyo', '.pyc'):
TypeError: 'NoneType' object is not subscriptable

/#######################################################################\
          `.----``..-------..``.----.
         :/:::::--:---------:--::::://.
        .+::::----##/-/oo+:-##----:::://
        `//::-------/oosoo-------::://.       ##    ##  ##    ##    #####
          .-:------./++o/o-.------::-`   ```  ##    ##  ##    ##  ##
             `----.-./+o+:..----.     `.:///. ########  ##    ## ##
   ```        `----.-::::::------  `.-:::://. ##    ##  ##    ## ##   ####
  ://::--.``` -:``...-----...` `:--::::::-.`  ##    ##  ##   ##   ##    ##
  :/:::::::::-:-     `````      .:::::-.`     ##    ##    ####     ######
   ``.--:::::::.                .:::.`
         ``..::.                .::         EMBRACE THE APIs OF THE FUTURE
             ::-                .:-
             -::`               ::-                   VERSION 2.4.1
             `::-              -::`
              -::-`           -::-
\########################################################################/

 Copyright (C) 2016 Timothy Edmund Crosley
 Under the MIT License


Serving on :8000...
127.0.0.1 - - [06/Nov/2018 15:25:14] "GET / HTTP/1.1" 200 4027
127.0.0.1 - - [06/Nov/2018 15:25:14] "GET /static/template.css HTTP/1.1" 200 13294
127.0.0.1 - - [06/Nov/2018 15:25:14] "GET /static/color.css HTTP/1.1" 200 2778
127.0.0.1 - - [06/Nov/2018 15:25:14] "GET /static/index.css HTTP/1.1" 200 855
127.0.0.1 - - [06/Nov/2018 15:25:14] "GET /static/generic.css HTTP/1.1" 200 33260
127.0.0.1 - - [06/Nov/2018 15:25:14] "GET /static/font.css HTTP/1.1" 200 670
127.0.0.1 - - [06/Nov/2018 15:25:14] "GET /static/animate.min.css HTTP/1.1" 200 55789
127.0.0.1 - - [06/Nov/2018 15:25:14] "GET /static/dropzone.js HTTP/1.1" 200 64976
127.0.0.1 - - [06/Nov/2018 15:25:14] "GET /static/dropzone.css HTTP/1.1" 200 11498
127.0.0.1 - - [06/Nov/2018 15:25:14] "GET /static/jquery-1.12.4.js HTTP/1.1" 200 293430
127.0.0.1 - - [06/Nov/2018 15:25:15] "GET /static/logo.png HTTP/1.1" 200 11487
127.0.0.1 - - [06/Nov/2018 15:25:15] "GET /static/bullet.png HTTP/1.1" 200 1406
Traceback (most recent call last):
  File "/usr/lib/python3.7/wsgiref/handlers.py", line 137, in run
    self.result = application(self.environ, self.start_response)
  File "falcon/api.py", line 248, in falcon.api.API.__call__
  File "falcon/api.py", line 244, in falcon.api.API.__call__
  File "/usr/lib/python3.7/site-packages/hug/interface.py", line 793, in __call__
    raise exception
  File "/usr/lib/python3.7/site-packages/hug/interface.py", line 760, in __call__
    input_parameters = self.gather_parameters(request, response, context, api_version, **kwargs)
  File "/usr/lib/python3.7/site-packages/hug/interface.py", line 610, in gather_parameters
    body = body_formatter(body, **content_params)
  File "/usr/lib/python3.7/site-packages/hug/input_format.py", line 76, in multipart
    form = parse_multipart((body.stream if hasattr(body, 'stream') else body), header_params)
  File "/usr/lib/python3.7/cgi.py", line 220, in parse_multipart
    headers['Content-Length'] = pdict['CONTENT-LENGTH']
KeyError: 'CONTENT-LENGTH'
127.0.0.1 - - [06/Nov/2018 15:25:20] "POST /ocr HTTP/1.1" 500 59
127.0.0.1 - - [06/Nov/2018 15:25:20] "GET /error/HTTP_BAD_GATEWAY.html.var HTTP/1.1" 404 1485

I wonder what could be the reason? Google led me to this page: https://bugs.python.org/issue34226

But I still don't have any clue how to resolve this. The OS is Arch, by the way.

marlemion, Nov 13 '18 11:11

Hi, I had the same problem. It is solved in hug 2.4.2.
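In case it helps anyone who cannot upgrade: the traceback above shows Python 3.7's cgi.parse_multipart reading pdict['CONTENT-LENGTH'], so the caller has to supply that key in the parameter dict. Below is a minimal sketch of building such a dict from the request headers. The helper name build_pdict is hypothetical (it is not part of hug or the standard library), and I am not claiming this is how hug 2.4.2 fixes it.

```python
from email.message import Message


def build_pdict(content_type, content_length):
    """Build a parameter dict of the shape cgi.parse_multipart() reads.

    Hypothetical helper for illustration; not part of hug or the stdlib.
    """
    msg = Message()
    msg["Content-Type"] = content_type
    boundary = msg.get_boundary()  # None if the header has no boundary
    if boundary is None:
        raise ValueError("no multipart boundary in Content-Type header")
    return {
        # Python 3.7's parse_multipart decodes the boundary, so pass bytes.
        "boundary": boundary.encode("ascii"),
        # The key whose absence raises the KeyError in the traceback.
        "CONTENT-LENGTH": str(content_length),
    }
```

The resulting dict would be passed as the second argument to cgi.parse_multipart, together with the request body stream.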

On Debian Buster, these steps got it working:

apt-get install python3-pip
pip3 install appdirs falcon hug-middleware-cors packaging pyparsing python-mimeparse requests six hug==2.4.2
git clone https://github.com/sseemayer/OCRmyPDF-web.git
cd OCRmyPDF-web
hug -f server.py

If you have more than one version of Python installed, it may be better to edit server.py and change #!/usr/bin/env python to #!/usr/bin/env python3.

The systemd unit looks like this (/etc/systemd/system/ocrmypdf.service):

[Unit]
Description = OCRMyPdf Simple Web Interface
After = syslog.target network.target

[Service]
User=ocrmypdf
WorkingDirectory = /var/www/ocrmypdf/
ExecStart = /usr/local/bin/hug -f /var/www/ocrmypdf/server.py
ExecStop = /usr/bin/killall hug
Type = simple
Restart = always

[Install]
WantedBy = multi-user.target

Max

maxdevaine, Aug 11 '20 14:08