yarl
yarl.URL doesn't recognize BACKSLASH as host-path separator
Given "https://google%2Ecom\.yahoo.com/" as the URL, both Chrome and Firefox resolve the domain to "google.com", which is (most probably) what the URL spec defines.
Right now, yarl doesn't recognize that:
In [8]: yarl.URL(r"https://google%2Ecom%2F.yahoo.com/").host
Out[8]: 'google%2ecom%2f.yahoo.com'
I think it's important to fix this, especially from a security perspective. What do you think?
GitMate.io thinks possibly related issues are https://github.com/aio-libs/yarl/issues/242 (Handle path argument of URL.build, which doesn't start from /), https://github.com/aio-libs/yarl/issues/84 (Incorrect handling of '..' in url path), https://github.com/aio-libs/yarl/issues/185 (Allow joining URL and pathlib.Path), https://github.com/aio-libs/yarl/issues/156 (URL.build doesn't url encode credentials), and https://github.com/aio-libs/yarl/issues/143 (YARL does not support link-local ipv6 addresses).
- WhatWg is not a spec but a set of recommendations. The recommendations are sometimes controversial and sometimes conflict with RFC specs.
- IIRC percent-encoding is not allowed in the domain part; it should use IDNA encoding.
- The backslash is never considered a separator.
> WhatWg is not a spec but a set of recommendations. The recommendations are sometimes controversial and sometimes conflict with RFC specs.
Well, whatever we call them, it's a good specification of the behavior we get from the most common client implementations. Not sure how the numbers are on the server side, but there are many other libraries making these behaviors consistent across the board.
> IIRC percent-encoding is not allowed in the domain part; it should use IDNA encoding.
Sure, and I don't disagree. But yarl.URL doesn't throw an exception, either. The RFC says it's not allowed, you also agree it's not allowed, but it just gets parsed and assigned to the host field, and no errors are reported.
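To make the "throw an exception" option concrete, here is a minimal sketch using the standard library's urllib.parse rather than yarl. The helper name strict_host and the choice of ValueError are assumptions for illustration only, not yarl's API. IP literals (anything containing ":") are skipped, since an IPv6 zone ID can legitimately contain "%".

```python
from urllib.parse import urlsplit


def strict_host(url: str) -> str:
    """Return the host of `url`, rejecting hosts that look suspicious.

    Hypothetical helper: the policy (reject '%' and '\\' in a
    registered name) follows the discussion above, not any library.
    """
    host = urlsplit(url).hostname
    if host is None:
        raise ValueError(f"URL has no host: {url!r}")
    # Skip IP literals such as IPv6 addresses, which may carry a zone ID.
    if ":" not in host:
        for bad in ("%", "\\"):
            if bad in host:
                raise ValueError(f"host {host!r} contains {bad!r}")
    return host
```

With this policy, strict_host("https://example.com/") returns the host, while the URL from the report raises instead of silently yielding a percent-encoded host.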
> The backslash is never considered a separator.
Well, it depends on who you ask, right? It looks like every major web browser converts backslashes to SLASH, for some backwards-compatibility reason.
From a server-side perspective, I understand that it's not a favorable thing to support. But it would actually allow using the library in areas that have browser-side effects.
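For reference, the browser-side behavior described above can be approximated roughly as follows. This is a sketch loosely modeled on the WHATWG URL rules, built on urllib.parse rather than yarl; browser_like_host and SPECIAL_SCHEMES are hypothetical names, and the real algorithm does more (IDNA processing, forbidden code point checks, special handling of file URLs) than this does.

```python
from typing import Optional
from urllib.parse import unquote, urlsplit

# Assumption: a subset of the WHATWG "special" schemes; file is left out
# because its host handling differs.
SPECIAL_SCHEMES = {"http", "https", "ftp", "ws", "wss"}


def browser_like_host(url: str) -> Optional[str]:
    """Approximate the host a browser would see for `url`."""
    scheme, sep, rest = url.partition(":")
    if not sep or scheme.lower() not in SPECIAL_SCHEMES:
        # Non-special schemes: leave backslashes alone.
        return urlsplit(url).hostname
    # Only the authority and path treat "\" as "/"; stop at the
    # first "?" or "#" so query and fragment stay untouched.
    cut = len(rest)
    for delimiter in "?#":
        index = rest.find(delimiter)
        if index != -1:
            cut = min(cut, index)
    normalized = scheme + ":" + rest[:cut].replace("\\", "/") + rest[cut:]
    host = urlsplit(normalized).hostname
    # Percent-decode the host, so "google%2ecom" becomes "google.com".
    return unquote(host) if host is not None else None


print(browser_like_host(r"https://google%2Ecom\.yahoo.com/"))  # google.com
```

Doing the normalization as a separate step before parsing keeps the strict, RFC-oriented parser unchanged and makes the browser-compatible behavior opt-in.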