yarl
yarl copied to clipboard
No distinction made between empty query string and undefined query string
The problem is illustrated by the following example:
>>> from yarl import URL
>>> URL('/stuff?')
URL('/stuff')
>>> URL('/stuff')
URL('/stuff')
RFC 3986 defines a query string using the following BNF:
query = *( pchar / "/" / "?" )
The distinction between an undefined query and an empty query is based solely on the presence of the '?' delimiter. yarl doesn't make that distinction, which means that libraries that make use of yarl (e.g. aiohttp) may also not provide a way to make that distinction.
The lack of distinction can be problematic in cases where there's a dependency on the exact representation of the URL. For example, if an HTTP client sends a request and includes a cryptographic hash that takes the representation of the request target (path and query) into account, and the HTTP server sees a request target that does not include a '?' and attempts to authenticate the hash using the request target, then authentication will fail.
GitMate.io thinks possibly related issues are https://github.com/aio-libs/yarl/issues/22 (Support non-UTF8 query strings), https://github.com/aio-libs/yarl/issues/91 (Encode semicolon in query string), and https://github.com/aio-libs/yarl/issues/181 (Encoding already encoded strings).
I understand your pain but the problem is not easy.
yarl uses urllib.parse for parsing URLs.
The standard library doesn't distinguish empty query without query absence.
Theoretically, there is a possibility to replace urllib.parse.urlsplit with a custom implementation but it takes a long time.
Could you provide more info? Is it a client or server code?
I'm writing an aiohttp client. Given that any URL that's sent to a aiohttp.ClientSession object makes use of a yarl.URL, it's impossible to handle this situation properly without a cheap, super hackish workaround like the following:
class _HackedUpQuery(str):
def __bool__(self):
return True
query_string = '' if query_string is None else _HackedUpQuery(query_string)
# pass to yarl.URL.build(), and pass the URL object to aiohttp.ClientSession
The hack makes me really nauseous, but it appears to work.
Note that I can work around the problem in aiohttp.Server using aiohttp.BaseRequest.raw_path.