OCRmyPDF-web
headers['Content-Length'] = pdict['CONTENT-LENGTH']
Hello, I am running a slightly modified version of ocrmypdf-web:
#!/usr/bin/env python
import hug
import subprocess
from tempfile import NamedTemporaryFile

api = hug.API(__name__)
api.http.add_middleware(hug.middleware.CORSMiddleware(api))

@hug.get('/', output=hug.output_format.file)
def index():
    return "index.html"

@hug.get('/static/{fn}', output=hug.output_format.file)
def static(fn):
    return 'static/{}'.format(fn)

@hug.post('/ocr', output=hug.output_format.file)
def ocr(body, response, language: "The language(s) to use for OCR"="eng+deu+fra"):
    if not len(body) == 1:
        raise Exception("Need exactly one file!")
    fn, content = list(body.items()).pop()
    f_out = NamedTemporaryFile(suffix='.pdf')
    with NamedTemporaryFile(suffix='.pdf', mode="wb") as f_in:
        f_in.write(content)
        f_in.flush()
        #proc = subprocess.Popen(['ocrmypdf', '--force-ocr', '-l', language, f_in.name, f_out.name])
        proc = subprocess.Popen(['ocrmypdf', '-l', language, '--pdf-renderer', 'tesseract', '--output-type', 'pdf', f_in.name, f_out.name])
        code = proc.wait()
        response.set_header('X-OCR-Exit-Code', str(code))
    print(f_out.name)
    return f_out
hug is at version 2.4.1 and Python at version 3.7.
This code ran for a long time as a web interface for ocrmypdf (text recognition using tesseract). I think that since Python 3.7, hug crashes whenever a file is uploaded to the server:
Unhandled exception in thread started by <function reload_checker at 0x7fec61f69ea0>
Traceback (most recent call last):
File "/usr/lib/python3.7/site-packages/hug/development_runner.py", line 108, in reload_checker
if path[-4:] in ('.pyo', '.pyc'):
TypeError: 'NoneType' object is not subscriptable
/#######################################################################\
`.----``..-------..``.----.
:/:::::--:---------:--::::://.
.+::::----##/-/oo+:-##----:::://
`//::-------/oosoo-------::://. ## ## ## ## #####
.-:------./++o/o-.------::-` ``` ## ## ## ## ##
`----.-./+o+:..----. `.:///. ######## ## ## ##
``` `----.-::::::------ `.-:::://. ## ## ## ## ## ####
://::--.``` -:``...-----...` `:--::::::-.` ## ## ## ## ## ##
:/:::::::::-:- ````` .:::::-.` ## ## #### ######
``.--:::::::. .:::.`
``..::. .:: EMBRACE THE APIs OF THE FUTURE
::- .:-
-::` ::- VERSION 2.4.1
`::- -::`
-::-` -::-
\########################################################################/
Copyright (C) 2016 Timothy Edmund Crosley
Under the MIT License
Serving on :8000...
127.0.0.1 - - [06/Nov/2018 15:25:14] "GET / HTTP/1.1" 200 4027
127.0.0.1 - - [06/Nov/2018 15:25:14] "GET /static/template.css HTTP/1.1" 200 13294
127.0.0.1 - - [06/Nov/2018 15:25:14] "GET /static/color.css HTTP/1.1" 200 2778
127.0.0.1 - - [06/Nov/2018 15:25:14] "GET /static/index.css HTTP/1.1" 200 855
127.0.0.1 - - [06/Nov/2018 15:25:14] "GET /static/generic.css HTTP/1.1" 200 33260
127.0.0.1 - - [06/Nov/2018 15:25:14] "GET /static/font.css HTTP/1.1" 200 670
127.0.0.1 - - [06/Nov/2018 15:25:14] "GET /static/animate.min.css HTTP/1.1" 200 55789
127.0.0.1 - - [06/Nov/2018 15:25:14] "GET /static/dropzone.js HTTP/1.1" 200 64976
127.0.0.1 - - [06/Nov/2018 15:25:14] "GET /static/dropzone.css HTTP/1.1" 200 11498
127.0.0.1 - - [06/Nov/2018 15:25:14] "GET /static/jquery-1.12.4.js HTTP/1.1" 200 293430
127.0.0.1 - - [06/Nov/2018 15:25:15] "GET /static/logo.png HTTP/1.1" 200 11487
127.0.0.1 - - [06/Nov/2018 15:25:15] "GET /static/bullet.png HTTP/1.1" 200 1406
Traceback (most recent call last):
File "/usr/lib/python3.7/wsgiref/handlers.py", line 137, in run
self.result = application(self.environ, self.start_response)
File "falcon/api.py", line 248, in falcon.api.API.__call__
File "falcon/api.py", line 244, in falcon.api.API.__call__
File "/usr/lib/python3.7/site-packages/hug/interface.py", line 793, in __call__
raise exception
File "/usr/lib/python3.7/site-packages/hug/interface.py", line 760, in __call__
input_parameters = self.gather_parameters(request, response, context, api_version, **kwargs)
File "/usr/lib/python3.7/site-packages/hug/interface.py", line 610, in gather_parameters
body = body_formatter(body, **content_params)
File "/usr/lib/python3.7/site-packages/hug/input_format.py", line 76, in multipart
form = parse_multipart((body.stream if hasattr(body, 'stream') else body), header_params)
File "/usr/lib/python3.7/cgi.py", line 220, in parse_multipart
headers['Content-Length'] = pdict['CONTENT-LENGTH']
KeyError: 'CONTENT-LENGTH'
127.0.0.1 - - [06/Nov/2018 15:25:20] "POST /ocr HTTP/1.1" 500 59
127.0.0.1 - - [06/Nov/2018 15:25:20] "GET /error/HTTP_BAD_GATEWAY.html.var HTTP/1.1" 404 1485
I wonder what could be the reason? Google led me to this page: https://bugs.python.org/issue34226
But still I don't have any clue how to resolve this. OS is Arch btw.
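Reading the second traceback, the failure is a plain dictionary lookup: cgi.py line 220 reads pdict['CONTENT-LENGTH'], and the parameter dict that hug 2.4.1 passes to parse_multipart has no such key. Below is a minimal sketch of that mechanism and of one possible guard. It assumes the dict only carries the multipart boundary; with_content_length is a hypothetical helper name, the boundary and length values are made up, and this is not presented as the change hug itself made.

```python
def with_content_length(pdict, content_length):
    """Return a copy of pdict that always has a 'CONTENT-LENGTH' entry.

    Hypothetical helper: an existing value is kept, otherwise the given
    content_length is filled in.
    """
    params = dict(pdict)
    params.setdefault('CONTENT-LENGTH', content_length)
    return params


# Assumed shape of the parameters parsed from a multipart Content-Type
# header: only the boundary is present (illustrative value).
header_params = {'boundary': b'----example-boundary'}

try:
    # Same kind of lookup as the failing line in the traceback.
    header_params['CONTENT-LENGTH']
except KeyError as exc:
    print('missing key:', exc)

# With the key filled in, the lookup succeeds.
patched = with_content_length(header_params, 1234)
print(patched['CONTENT-LENGTH'])
```

Filling in the key before the parser sees the dict avoids the KeyError without touching the standard library; whether the value needs to match the real request length depends on how the parser uses it.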
Hi, I had the same problem. It is solved in hug 2.4.2.
On Debian Buster, these steps get it working:
apt-get install python3-pip
pip3 install appdirs falcon hug-middleware-cors packaging pyparsing python-mimeparse requests six hug==2.4.2
git clone https://github.com/sseemayer/OCRmyPDF-web.git
cd OCRmyPDF-web
hug -f server.py
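To confirm that an installed hug is new enough, the version string can be compared against 2.4.2. This is only a sketch: version_tuple and has_multipart_fix are hypothetical helper names, and the parsing assumes a plain dotted numeric version such as the one printed in the hug startup banner.

```python
def version_tuple(version):
    """Turn '2.4.2' into (2, 4, 2); only plain dotted numeric versions are handled."""
    return tuple(int(part) for part in version.split('.'))


def has_multipart_fix(version, fixed_in='2.4.2'):
    """True if `version` is at least the release reported in this thread as fixed."""
    return version_tuple(version) >= version_tuple(fixed_in)


print(has_multipart_fix('2.4.1'))  # False
print(has_multipart_fix('2.4.2'))  # True
```

The version string to pass in can be taken from the output of pip3 show hug or from the VERSION line in the startup banner.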
If more than one version of Python is installed, it may be better to edit server.py and change the first line from
#!/usr/bin/env python
to
#!/usr/bin/env python3
The systemd unit looks like this (/etc/systemd/system/ocrmypdf.service):

[Unit]
Description = OCRMyPdf Simple Web Interface
After = syslog.target network.target

[Service]
User=ocrmypdf
WorkingDirectory = /var/www/ocrmypdf/
ExecStart = /usr/local/bin/hug -f /var/www/ocrmypdf/server.py
ExecStop = /usr/bin/killall hug
Type = simple
Restart = always

[Install]
WantedBy = multi-user.target
Max