yarl icon indicating copy to clipboard operation
yarl copied to clipboard

Handling of relative path URL

Open chaoflow opened this issue 2 years ago • 7 comments

Describe the bug

According to rfc 8089 paths for file URIs are absolute. yarl.URL does not complain about them but displays a wrong repr.

To Reproduce

  1. yarl 1.7.2
  2. Instantiate file URL with "relative" path (see below)

Expected behavior

yarl.URL('file:foo/bar') should raise exception or produce instance with correct repr: assert repr(URL('file:foo/bar')) == 'file:foo/bar'

Logs/tracebacks

In [1]: from yarl import URL

In [2]: URL('file:foo/bar') == URL('file:///foo/bar')
Out[2]: False

In [3]: URL('file:///foo/bar')
Out[3]: URL('file:///foo/bar')

In [4]: URL('file:foo/bar')
Out[4]: URL('file:///foo/bar')

In [5]: URL('file:foo/bar').path
Out[5]: 'foo/bar'

In [6]: URL('file:///foo/bar').path
Out[6]: '/foo/bar'


### Python Version

```console
$ python --version
3.8.12

multidict Version

$ python -m pip show multidict
5.1.0

yarl Version

$ python -m pip show yarl
1.7.2

OS

NixOS

Additional context

No response

Code of Conduct

  • [X] I agree to follow the aio-libs Code of Conduct

chaoflow avatar Dec 10 '21 20:12 chaoflow

Agree with the second suggestion: assert str(URL('file:foo/bar')) == 'file:foo/bar'. Exception raising is too restrictive and not backward compatible.

asvetlov avatar Dec 13 '21 15:12 asvetlov

URL('file:///foo/bar') is absolute, isn't it?

>>> URL("file:///foo/bar").is_absolute()
False

python discussion about parsing relative path URL: https://bugs.python.org/issue40938

q0w avatar Jan 24 '22 07:01 q0w

URL('file:///foo/bar') is absolute, isn't it?

It is, but URL('file:foo/bar') is not:

>>> URL("file:foo/bar")
URL('file:///foo/bar')
>>> URL("file:foo/bar").is_absolute()
False

The point is that even though it is not absolute, the representation of the object makes it look like it is.

mjpieters avatar Mar 28 '22 16:03 mjpieters

@mjpieters I've tried to fix, check this: https://github.com/q0w/yarl/tree/relative-file, whats your thoughts?

q0w avatar Mar 29 '22 10:03 q0w

This is really a urllib.parse() bug:

>>> from urllib.parse import urlsplit
>>> urlsplit('file:foo/bar')
SplitResult(scheme='file', netloc='', path='foo/bar', query='', fragment='')
>>> urlsplit('file:foo/bar').geturl()
'file:///foo/bar'
>>> urlsplit('file:///foo/bar')
SplitResult(scheme='file', netloc='', path='/foo/bar', query='', fragment='')
>>> urlsplit('file:///foo/bar').geturl()
'file:///foo/bar'

This applies to all schemes in the urllib.parse.uses_netloc list, not just file URLs:

>>> urlsplit('http:foo/bar').geturl()
'http:///foo/bar'
>>> urlsplit('custom:foo/bar').geturl()
'custom:foo/bar'

I'm not 100% certain this is fixable; having a netloc requires that the path is absolute.

The better solution is to not accept such URLs, basically. @asvetlov Can't we deprecate the 'feature'?

mjpieters avatar Mar 29 '22 14:03 mjpieters

urllib.parse.urlunsplit() has been adding back in the netloc delimiters for 20 years now.

mjpieters avatar Mar 29 '22 14:03 mjpieters

If this is something the yarl does want to address, then you should call urlunsplit() with scheme='':

>>> urlunsplit(('', '', 'foo/bar', '', ''))
'foo/bar'

Do this only if the path doesn't start with / and then manually prefix it with the scheme and :.

Alternatively, start deprecation now, and detect problematic URLs by checking for schemes in urllib.parse.uses_netloc when the path doesn't start with /.

mjpieters avatar Mar 29 '22 15:03 mjpieters