falcon icon indicating copy to clipboard operation
falcon copied to clipboard

Provide a way to gracefully and securely handle non-percent-encoded query string values

Open kgriffs opened this issue 4 years ago • 15 comments

When the web server allows a non-RFC-compliant query string (i.e., one that includes non-ASCII byte values that were not percent-encoded) to be passed to the app, we should provide a way for the app to at least detect such values. We might also provide an option for automatically rejecting them or perhaps attempting to decode them as UTF8.

Should we also add support for automatically percent-encoding query params when using the testing.simulate_* methods?

kgriffs avatar Feb 25 '20 01:02 kgriffs

@vytas7:

I noticed that Nginx blocks such requests outright with a 400, does not propagate them further to the app server.

kgriffs avatar Feb 25 '20 15:02 kgriffs

Re Nginx, it's at least obvious if there is invalid percent coding: https://github.com/nginx/nginx/blob/37984f0be1a5d608015517a64927d498888fb7ca/src/http/ngx_http_parse.c#L1532 But I've tested non-ASCII quickly and it seems to yield a 400 as well.

vytas7 avatar Feb 25 '20 16:02 vytas7

Talking this issue in general, I'm admittedly not completely convinced it's something a Python 3 framework should/is able to handle at all as per PEP-3333 mechanics. Could we craft a simple example demonstrating where this is a practical problem under Python 3, be it using the Falcon test client, or a real server?

On Python 2, this hearkens back to https://github.com/falconry/falcon/issues/1031 which was IIRC quite hairy to properly address.

But regardless of Python version, I OTOH agree it might be worth reviewing how our test client handles path encoding edge cases.

vytas7 avatar Feb 25 '20 16:02 vytas7

Not sure if this is the issue this addresses but here's an example:

/search?offset=0&query=películas%20de%20terror&limit=10&version=

request.params returns

{'offset': ['0'], 'query': ['pelÃ-culas de terror'], 'limit': ['10']}

urllib.parse.parse_qs handles it correctly:

from urllib.parse import parse_qs

parse_qs("offset=0&query=películas%20de%20terror&limit=10&version=")

returns

{'offset': ['0'], 'query': ['películas de terror'], 'limit': ['10']}

sirfz avatar Jul 18 '23 12:07 sirfz

Hi @sirfz, yes this is potentially an example.

Just checking, which WSGI (or ASGI?) app server are you using? And which HTTP client?

vytas7 avatar Jul 18 '23 13:07 vytas7

it's WSGI hosted with the latest uWSGI (testing with curl).

update: some extra context about this issue. The encoding of the query_string itself seems to be messed up so can't even work around it by parsing the raw string (same applies for get_param as well as uri and relative_uri parameters). Not sure if it's a falcon or uwsgi issue?

sirfz avatar Jul 18 '23 13:07 sirfz

I don't think it is a Falcon or uWSGI issue per se, but we filed this to track scenarios like yours, and discuss whether we as a framework can do anything sensible about it.

If, as mentioned, you run it behind Nginx, non-ASCII URLs are rejected outright with an HTTP 400, so probably it is less of an issue in a typical production environment.

In PEP-3333 (WSGI), strings are tunneled as Latin-1, and if curl sends a UTF-8 string, things can get messed up completely, as you observed.

vytas7 avatar Jul 18 '23 13:07 vytas7