
yarl.URL doesn't recognize BACKSLASH as host-path separator

Open behnam opened this issue 5 years ago • 3 comments

Given "https://google%2Ecom\.yahoo.com/" as the URL, both Chrome and Firefox resolve the domain as "google.com", which is (most probably) what the URL spec defines.

Right now, yarl doesn't recognize that:

In [8]: yarl.URL(r"https://google%2Ecom%2F.yahoo.com/").host
Out[8]: 'google%2ecom%2f.yahoo.com'

I think it's important to fix this, especially from a security perspective. What do you think?

behnam, Oct 15 '18 23:10

GitMate.io thinks possibly related issues are https://github.com/aio-libs/yarl/issues/242 (Handle path argument of URL.build, which doesn't start from /), https://github.com/aio-libs/yarl/issues/84 (Incorrect handling of '..' in url path), https://github.com/aio-libs/yarl/issues/185 (Allow joining URL and pathlib.Path), https://github.com/aio-libs/yarl/issues/156 (URL.build doesn't url encode credentials), and https://github.com/aio-libs/yarl/issues/143 (YARL does not support link-local ipv6 addresses).

aio-libs-bot, Oct 15 '18 23:10

  1. WHATWG is not a spec but a set of recommendations. The recommendations are sometimes controversial and sometimes conflict with RFC specs.
  2. IIRC, percent-encoding is not allowed in the domain part; it should use IDNA encoding.
  3. The backslash is never considered a separator.
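To make point 2 concrete, here is a minimal sketch of what IDNA encoding of a host looks like using only the Python standard library. The helper name `to_ascii_host` and the sample domain are made up for illustration; the built-in `idna` codec implements the older IDNA 2003 rules, so treat this as a sketch rather than a statement of what yarl does internally.

```python
def to_ascii_host(host: str) -> str:
    """Convert a Unicode host name to its ASCII (Punycode) form.

    Hypothetical helper: relies on Python's built-in "idna" codec,
    which follows IDNA 2003.
    """
    return host.encode("idna").decode("ascii")


# A non-ASCII label becomes an "xn--" label; ASCII labels are left as they are.
print(to_ascii_host("bücher.example"))  # xn--bcher-kva.example
print(to_ascii_host("example.com"))     # example.com
```

The point is that non-ASCII host names travel as `xn--` labels, not as percent-encoded bytes.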

asvetlov, Oct 16 '18 08:10

> WHATWG is not a spec but a set of recommendations. The recommendations are sometimes controversial and sometimes conflict with RFC specs.

Well, whatever we call them, it's a good specification of the behavior we get from the most common client implementations. Not sure how the numbers are on the server side, but there are many other libraries making these behaviors consistent across the board.

> IIRC, percent-encoding is not allowed in the domain part; it should use IDNA encoding.

Sure, and I don't disagree. But yarl.URL doesn't throw an exception, either. The RFC says it's not allowed, and you also agree it's not allowed, but it just gets parsed and assigned to the host field, with no errors reported.
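For illustration, a caller who wants an error instead of a silently accepted host could add a check along these lines. This is only a sketch built on the standard library's `urllib.parse.urlsplit`, not yarl's API; the name `strict_host` and the allowed-character policy are assumptions, and the policy is deliberately narrow (it would also reject IPv6 literals).

```python
import re
from urllib.parse import urlsplit

# Assumed policy: lowercase letters, digits, hyphens and dots only.
# Percent signs and backslashes are therefore rejected.
_HOST_RE = re.compile(r"[a-z0-9.-]+")


def strict_host(url: str) -> str:
    """Return the host of `url`, or raise ValueError if it looks suspicious.

    Hypothetical helper: `hostname` is already lowercased by urlsplit
    and is None when the URL has no host at all.
    """
    host = urlsplit(url).hostname
    if host is None or not _HOST_RE.fullmatch(host):
        raise ValueError(f"suspicious host in URL: {host!r}")
    return host


print(strict_host("https://example.com/"))  # example.com

try:
    strict_host("https://google%2Ecom%2F.yahoo.com/")
except ValueError as exc:
    print("rejected:", exc)
```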

> The backslash is never considered a separator.

Well, it depends on who you ask, right? It looks like every major web browser converts backslashes to SLASH, for some backwards-compatibility reason.

From a server-side perspective, I understand that it's not a favorable thing to support. But it would actually allow using the library in areas that have browser-side effects.
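To show what "browser-side" handling would mean in practice, here is a rough sketch of how a browser-like parser arrives at the host for the URL from the first comment. It is not yarl code and not a full WHATWG implementation: `browser_like_host` and `SPECIAL_SCHEMES` are hypothetical names, the scheme list is an assumed subset, IPv6 literals are ignored, and the decoded host is not validated.

```python
from urllib.parse import unquote

# Assumed subset of the schemes for which browsers treat "\" like "/".
SPECIAL_SCHEMES = {"http", "https", "ws", "wss", "ftp"}


def browser_like_host(url: str) -> str:
    """Approximate the host a browser would see (hypothetical helper)."""
    scheme, sep, rest = url.partition("://")
    if not sep:
        raise ValueError(f"no authority component in {url!r}")

    # The authority ends at the first "/", "?" or "#"; for special
    # schemes a backslash ends it as well.
    delimiters = "/?#"
    if scheme.lower() in SPECIAL_SCHEMES:
        delimiters += "\\"
    end = min(
        (i for i in map(rest.find, delimiters) if i != -1),
        default=len(rest),
    )
    authority = rest[:end]

    # Drop userinfo and port (IPv6 literals are not handled here).
    hostport = authority.rpartition("@")[2]
    host = hostport.partition(":")[0]

    # Browsers percent-decode the host before any further processing.
    return unquote(host).lower()


print(browser_like_host(r"https://google%2Ecom\.yahoo.com/"))  # google.com
```

A library that keeps the RFC reading while a browser applies rules like these will disagree about which site the URL points to, which is the security concern raised above.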

behnam, Oct 16 '18 21:10