
PEP 3333: URL-decoding and routing

Open kefir- opened this issue 1 year ago • 3 comments

There have been several discussions over the years about how WSGI frameworks apply routing in the case of URL-encoded path components. Here's a comment that links to a few of the discussions:

https://github.com/encode/starlette/pull/1828#issuecomment-1434043248

The issue is that, once routing is applied, the URL /user/foo/edit appears to be indistinguishable from /user/foo%2Fedit. Given routing rules for /user/{username} and /user/{username}/edit, the URL /user/foo%2Fedit matches the second rule rather than the first.
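To make the collision concrete, here is a minimal sketch. It assumes a hypothetical regex-based route table (the names `ROUTES` and `match_decoded` are made up for illustration and are not taken from Flask, FastAPI, or any other framework) and decodes the whole path before matching:

```python
import re
from urllib.parse import unquote

# Hypothetical route table mirroring /user/{username} and /user/{username}/edit.
ROUTES = [
    ("user_detail", re.compile(r"^/user/(?P<username>[^/]+)$")),
    ("user_edit", re.compile(r"^/user/(?P<username>[^/]+)/edit$")),
]

def match_decoded(raw_path):
    """Percent-decode the entire path first, then try each rule in order."""
    path = unquote(raw_path)
    for name, pattern in ROUTES:
        m = pattern.match(path)
        if m:
            return name, m.groupdict()
    return None

# Both requests end up on the same rule with the same parameters.
print(match_decoded("/user/foo/edit"))
print(match_decoded("/user/foo%2Fedit"))
```

In this sketch a user literally named "foo/edit" can never reach the /user/{username} rule, because the encoded slash has already become a path separator by the time the rules are tried.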

This seems to me to conflict with sections 2.4 and 2.2 of RFC 3986. The latter states:

URIs that differ in the replacement of a reserved character with its corresponding percent-encoded octet are not equivalent.

But the routing mechanisms in popular frameworks such as Flask (WSGI) and FastAPI (ASGI) are unable to differentiate these URLs, because the URI's percent-encoding is decoded before routing happens.
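The order of operations is what matters here. The following sketch contrasts the two orders using only `urllib.parse.unquote`; the function names are hypothetical, and it assumes the raw, still-encoded request path is available, which is not necessarily the case for the PATH_INFO value a WSGI application receives:

```python
from urllib.parse import unquote

def segments_decode_first(raw_path):
    """Decode the whole path, then split: an encoded slash becomes a separator."""
    return unquote(raw_path).split("/")

def segments_split_first(raw_path):
    """Split on literal slashes, then decode each segment: %2F stays inside its segment."""
    return [unquote(segment) for segment in raw_path.split("/")]

raw = "/user/foo%2Fedit"
print(segments_decode_first(raw))  # three non-empty segments: user, foo, edit
print(segments_split_first(raw))   # two non-empty segments: user, foo/edit
```

Splitting before decoding keeps /user/foo%2Fedit and /user/foo/edit distinct, which is in line with the RFC 3986 wording quoted above. Whether a framework can do this depends on whether the server hands it the undecoded path at all.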

I believe PEP 3333 needs to clarify the correct behaviour.

kefir- avatar Aug 08 '23 06:08 kefir-

ددد

mahamad1234 avatar Sep 30 '23 13:09 mahamad1234

https://github.com/python/peps/issues/3280#issue-1840658032

Karliz24 avatar Jan 29 '24 10:01 Karliz24

This issue was discussed in detail in 2008:

[Web-SIG] WSGI Amendments thoughts: the horror of charsets https://www.mail-archive.com/web-sig@python.org/msg02483.html

It's a complex situation, and I don't think there is a straightforward answer.

I think the WSGI spec is likely to change on this point only if someone proposes a specific change.

amak avatar Jan 29 '24 17:01 amak