UnicodeEncodeError when using webob in middleware
I have a WSGI-based middleware for my Bottle app:
import os
import sys
from bottle import Bottle, run, redirect
import webob  # WebOb==1.8.7

app = Bottle(__name__)

@app.route("/")
def index():
    redirect("/home")

class VeryBasicMiddleware:
    def __init__(self, wsgi_app, app_name="WSGI Application"):
        self.app_name = app_name
        self.wsgi_app = wsgi_app

    def __call__(self, environ, start_response):
        # I do some middleware work here
        my_request = webob.BaseRequest(environ)
        # call request.path here is fine
        my_request.path
        response = webob.Request(environ).get_response(self.wsgi_app)
        # call request.path here gives UnicodeEncodeError
        my_request.path
        # do more middleware work here
        return self.wsgi_app(environ, start_response)

app = VeryBasicMiddleware(app)

if __name__ == "__main__":
    host = sys.argv[1] if len(sys.argv) > 1 else "localhost"
    port = int(sys.argv[2]) if len(sys.argv) > 2 else 8000
    run(app=app, host=host, port=port)
As you can see, this middleware relies on webob.
If a request is made whose path contains percent-encoded non-ASCII characters, for example
curl http://localhost:8000/full/assets/pdf/1394002_161121%20%E2%91%A0IFA%20ver%20%2012.2_SH0202_PDF%20version.pdf
the middleware raises the following error:
Traceback (most recent call last):
  File "/usr/local/opt/python@3.8/Frameworks/Python.framework/Versions/3.8/lib/python3.8/wsgiref/handlers.py", line 137, in run
    self.result = application(self.environ, self.start_response)
  File "apps/bottle_app.py", line 22, in __call__
    return self.app(environ, start_response)
  File "apps/bottle_app.py", line 42, in __call__
    my_request.path
  File "/Users/dani/vulnenv/lib/python3.8/site-packages/webob/request.py", line 476, in path
    bpath = bytes_(self.path_info, self.url_encoding)
  File "/Users/dani/vulnenv/lib/python3.8/site-packages/webob/descriptors.py", line 70, in fget
    return req.encget(key, encattr=encattr)
  File "/Users/dani/vulnenv/lib/python3.8/site-packages/webob/request.py", line 165, in encget
    return bytes_(val, 'latin-1').decode(encoding)
  File "/Users/dani/vulnenv/lib/python3.8/site-packages/webob/compat.py", line 33, in bytes_
    return s.encode(encoding, errors)
UnicodeEncodeError: 'latin-1' codec can't encode character '\u2460' in position 32: ordinal not in range(256)
The error is raised inside webob, but I dug into each of the calls and found the cause: when my middleware calls webob.Request(environ).get_response(self.wsgi_app) and execution reaches bottle.py, Bottle changes the value of environ['PATH_INFO'], here:
def _handle(self, environ):
    path = environ['bottle.raw_path'] = environ['PATH_INFO']
    if py3k:
        try:
            environ['PATH_INFO'] = path.encode('latin1').decode('utf8')
        except UnicodeError:
            return HTTPError(400, 'Invalid path string. Expected UTF-8')
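To make the interaction concrete, here is a minimal sketch that replays the two decoding steps with the standard library only. The shortened path and the helper name encget_like are assumptions for illustration; the helper only mimics the bytes_(val, 'latin-1').decode(encoding) line shown in the traceback and is not webob's actual code.

```python
from urllib.parse import unquote_to_bytes

# Shortened version of the request path above (assumption: the rest of the
# path does not matter for reproducing the error).
raw = "/full/assets/pdf/1394002_161121%20%E2%91%A0IFA"

path_bytes = unquote_to_bytes(raw)        # the bytes the server receives
wsgi_path = path_bytes.decode("latin-1")  # what the WSGI server stores in PATH_INFO
# The same re-decoding step that Bottle's _handle applies to PATH_INFO:
bottle_path = wsgi_path.encode("latin1").decode("utf8")


def encget_like(value, encoding="utf-8"):
    """Hypothetical stand-in for the failing line in the traceback."""
    return value.encode("latin-1").decode(encoding)


# Works on the untouched WSGI value and yields the same text Bottle produces.
print(encget_like(wsgi_path) == bottle_path)

# Fails on the value Bottle wrote back, because U+2460 is not in latin-1.
error = None
try:
    encget_like(bottle_path)
except UnicodeEncodeError as exc:
    error = exc
print(type(error).__name__, error.start)
```

The second call fails in the same way as the traceback: once PATH_INFO already holds real Unicode text, encoding it to latin-1 a second time is no longer possible.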
I have temporarily solved my problem by storing the original path in my middleware, like so:
def __call__(self, environ, start_response):
    # remember the original PATH_INFO before Bottle rewrites it
    self.orig_path_info = environ.get("PATH_INFO")
    # I do some work here
    my_request = webob.BaseRequest(environ)
    # call request.path here is fine
    my_request.path
    response = webob.Request(environ).get_response(self.wsgi_app)
    # restore path info so call to `my_request.path` works
    environ["PATH_INFO"] = self.orig_path_info
    # call request.path now works
    my_request.path
    # do more work here
    return self.wsgi_app(environ, start_response)
But this seems like a subpar solution. Is this a bug? It could very well be a webob issue, I'm aware, but I have an equivalent Flask application with an identical middleware, and it does not have this issue.
Some values (e.g. headers or the path string) are bytes on the HTTP layer with no encoding information attached, but the application needs them as str, which is Unicode in Python 3. The WSGI spec uses latin-1 to decode these byte strings, which is wrong in most cases, but at least latin-1 can be reversed without losing information. Bottle then re-interprets them as utf-8 because that is what all modern browsers do.
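A small sketch of why latin-1 is a safe intermediate encoding, using only built-in str and bytes methods. The variable names are illustrative.

```python
# latin-1 maps every byte value 0-255 to exactly one code point, so any
# byte string survives a decode/encode round trip unchanged.
all_bytes = bytes(range(256))
round_trip = all_bytes.decode("latin-1").encode("latin-1")
print(round_trip == all_bytes)

# The same arbitrary bytes are not valid UTF-8, so a server could not
# decode unknown input as UTF-8 without risking an error.
utf8_failed = False
try:
    all_bytes.decode("utf-8")
except UnicodeDecodeError:
    utf8_failed = True
print(utf8_failed)

# A UTF-8 encoded character looks like garbage after the latin-1 decode,
# but the original text can still be recovered by reversing that step.
mojibake = "\u2460".encode("utf-8").decode("latin-1")  # three code points
restored = mojibake.encode("latin-1").decode("utf-8")
print(len(mojibake), restored == "\u2460")
```

The last step is the re-interpretation described above: it only works while the string still contains nothing but code points below 256.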
Bottle is not a middleware, so it assumes it has authority over the environ dictionary and stores the re-encoded value back into the environ dict. This could be done better in Bottle (e.g. storing the re-encoded value in a special key instead of overwriting PATH_INFO) but it is more undefined behavior than a bug, really. You should make a copy of the environ dictionary before passing it to bottle if you want to still use it afterwards.
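The copy suggested above could look roughly like the following sketch. It uses only the standard library; path_rewriting_app is a hypothetical stand-in that rewrites PATH_INFO the way Bottle's _handle does, and EnvironCopyingMiddleware is an illustrative name, not part of Bottle or webob.

```python
from wsgiref.util import setup_testing_defaults


def path_rewriting_app(environ, start_response):
    """Hypothetical stand-in for Bottle: re-decodes PATH_INFO in place."""
    environ["PATH_INFO"] = environ["PATH_INFO"].encode("latin1").decode("utf8")
    start_response("200 OK", [("Content-Type", "text/plain")])
    return [b"ok"]


class EnvironCopyingMiddleware:
    def __init__(self, wsgi_app):
        self.wsgi_app = wsgi_app

    def __call__(self, environ, start_response):
        # Hand the inner app a shallow copy, so any keys it overwrites
        # do not leak back into the dict the middleware keeps using.
        return self.wsgi_app(environ.copy(), start_response)


# Build a test environ whose path holds UTF-8 bytes decoded as latin-1,
# which is how a WSGI server would present U+2460.
environ = {}
setup_testing_defaults(environ)
environ["PATH_INFO"] = "/\u2460".encode("utf8").decode("latin1")
original_path = environ["PATH_INFO"]

statuses = []


def start_response(status, headers, exc_info=None):
    statuses.append(status)


body = EnvironCopyingMiddleware(path_rewriting_app)(environ, start_response)
print(environ["PATH_INFO"] == original_path)
```

In the middleware from the question, the equivalent change would presumably be to pass environ.copy() to webob.Request before calling get_response. Note that the copy is shallow: objects such as wsgi.input are still shared, so a request body consumed by the inner app cannot be read again from the original environ.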