falcon
falcon copied to clipboard
Provide a way to gracefully and securely handle non-percent-encoded query string values
When the web server allows a non-RFC-compliant query string (i.e., one that includes non-ASCII byte values that were not percent-encoded) to be passed to the app, we should provide a way for the app to at least detect such values. We might also provide an option for automatically rejecting them or perhaps attempting to decode them as UTF8.
Should we also add support for automatically percent-encoding query params when using the testing.simulate_*
methods?
@vytas7:
I noticed that Nginx blocks such requests outright with a 400, does not propagate them further to the app server.
Re Nginx, it's at least obvious if there is invalid percent coding: https://github.com/nginx/nginx/blob/37984f0be1a5d608015517a64927d498888fb7ca/src/http/ngx_http_parse.c#L1532 But I've tested non-ASCII quickly and it seems to yield a 400 as well.
Talking this issue in general, I'm admittedly not completely convinced it's something a Python 3 framework should/is able to handle at all as per PEP-3333 mechanics. Could we craft a simple example demonstrating where this is a practical problem under Python 3, be it using the Falcon test client, or a real server?
On Python 2, this hearkens back to https://github.com/falconry/falcon/issues/1031 which was IIRC quite hairy to properly address.
But regardless of Python version, I OTOH agree it might be worth reviewing how our test client handles path encoding edge cases.
Not sure if this is the issue this addresses but here's an example:
/search?offset=0&query=películas%20de%20terror&limit=10&version=
request.params
returns
{'offset': ['0'], 'query': ['pelÃ-culas de terror'], 'limit': ['10']}
urllib.parse.parse_qs
handles it correctly:
from urllib.parse import parse_qs
parse_qs("offset=0&query=películas%20de%20terror&limit=10&version=")
returns
{'offset': ['0'], 'query': ['películas de terror'], 'limit': ['10']}
Hi @sirfz, yes this is potentially an example.
Just checking, which WSGI (or ASGI?) app server are you using? And which HTTP client?
it's WSGI hosted with the latest uWSGI (testing with curl).
update: some extra context about this issue. The encoding of the query_string
itself seems to be messed up so can't even work around it by parsing the raw string (same applies for get_param
as well as uri
and relative_uri
parameters). Not sure if it's a falcon or uwsgi issue?
I don't think it is a Falcon or uWSGI issue per se, but we filed this to track scenarios like yours, and discuss whether we as a framework can do anything sensible about it.
If, as mentioned, you run it behind Nginx, non-ASCII URLs are rejected outright with an HTTP 400, so probably it is less of an issue in a typical production environment.
In PEP-3333 (WSGI), strings are tunneled as Latin-1, and if curl sends a UTF-8 string, things can get messed up completely, as you observed.