yarl
yarl copied to clipboard
Query parameters didn't get fully encoded
When I build an URL with yarl.URL.build
I get the URL with partly encoded query:
from yarl import URL
scheme = 'http'
host = 'profile-srv.kama.gs'
path = '/oauth/authorize'
query = {'response_type': 'code', 'client_id': 'bEjgr4rUka8Oswy', 'redirect_uri': 'http://172.25.210.184/authorized?state=e8be712d-3858-4549-86b2-60cea46c7396'}
URL.build(scheme=scheme, host=host, path=path, query=query)
# URL('http://profile-srv.kama.gs/oauth/authorize?response_type=code&client_id=bEjgr4rUka8Oswy&redirect_uri=http://172.25.210.184/authorized?state%3De8be712d-3858-4549-86b2-60cea46c7396')
:
, /
, ?
are not encoded, as you see.
Hi, I'm GitMate.io!
It seems you've just enabled the issue triaging. I'm just scraping all issues from your repository and will give you some more information about this in a few minutes or so.
Because of the rate limit we can't scrape all information (including all comments and authors) right now - our system is already set up to scrape this in the next days over which the predictions will become more precise every day.
If you want me to use a different account for triaging your issues, simply create one and log in with it.
Sit tight!
GitMate.io thinks possibly related issues are https://github.com/aio-libs/yarl/issues/210 (Comma not being encoded in GET request params), https://github.com/aio-libs/yarl/issues/158 (Support encoded parameter in URL.build), https://github.com/aio-libs/yarl/issues/102 (Document encoded parameter), https://github.com/aio-libs/yarl/issues/45 (params do not encode right), and https://github.com/aio-libs/yarl/issues/91 (Encode semicolon in query string).
@rominf I found your issue here after filing a very similar one at https://github.com/aio-libs/yarl/issues/301.
My hunch about this behavior is that it is actually yarl.URL
that is more closely following RFC 3986. If you look at the ABNF for RFC 3986:
pchar = unreserved / pct-encoded / sub-delims / ":" / "@"
query = *( pchar / "/" / "?" )
This means that /
and ?
are okay to be put in unencoded (literal) form in the URL query string. The reason for that, I would assume, is that the ?
is unambiguous: you scan the URL for the first ?
, and everything after that is the query string; once you hit any other occurrences, you already know those are literal parts of the query string parameters. Similarly, literal /
characters should be okay in the query string because, since the parser has already seen the first ?
, it should know it's no longer in the path
component of the URL.
My own confusion from this arose from the fact that urllib.parse.urlencode()
(which is used by requests
, among other libraries) does not treat ?
as safe; it encodes it:
>>> import urllib.parse
>>> query = {'response_type': 'code', 'client_id': 'bEjgr4rUka8Oswy', 'redirect_uri': 'http://172.25.210.184/authorized?state=e8be712d-3858-4549-86b2-60cea46c7396'}
>>> urllib.parse.urlencode(query)
'response_type=code&client_id=bEjgr4rUka8Oswy&redirect_uri=http%3A%2F%2F172.25.210.184%2Fauthorized%3Fstate%3De8be712d-3858-4549-86b2-60cea46c7396'
What it all comes down to is that urllib.parse.urlencode
eventually calls quote()
from that same module. And here is the description of quote()
:
Each part of a URL, e.g. the path info, the query, etc., has a different set of reserved characters that must be quoted. The quote function offers a cautious (not minimal) way to quote a string for most of these parts.
For a different discussion see: https://github.com/psf/requests/issues/794
Not escaping the ? in query arguments is causing many web servers to reject the connections (I crawl >100.000 pages on more than >10.000 web servers through aiohttp which uses yarl intensively).
The standard does not force "?" from being encoded but does not forbid it either. Consequently, there are 2 alternatives which are standard-compliant and not one. Of these 2, the only alternative that seems webservers-compatible is to always encode it.
Another concrete case of this documented here: https://github.com/aio-libs/aiohttp/issues/5319#issuecomment-1178507468