yarl
yarl copied to clipboard
Handling of relative path URL
Describe the bug
According to rfc 8089 paths for file URIs are absolute. yarl.URL
does not complain about them but displays a wrong repr.
To Reproduce
- yarl 1.7.2
- Instantiate file URL with "relative" path (see below)
Expected behavior
yarl.URL('file:foo/bar')
should raise exception or produce instance with correct repr: assert repr(URL('file:foo/bar')) == 'file:foo/bar'
Logs/tracebacks
In [1]: from yarl import URL
In [2]: URL('file:foo/bar') == URL('file:///foo/bar')
Out[2]: False
In [3]: URL('file:///foo/bar')
Out[3]: URL('file:///foo/bar')
In [4]: URL('file:foo/bar')
Out[4]: URL('file:///foo/bar')
In [5]: URL('file:foo/bar').path
Out[5]: 'foo/bar'
In [6]: URL('file:///foo/bar').path
Out[6]: '/foo/bar'
### Python Version
```console
$ python --version
3.8.12
multidict Version
$ python -m pip show multidict
5.1.0
yarl Version
$ python -m pip show yarl
1.7.2
OS
NixOS
Additional context
No response
Code of Conduct
- [X] I agree to follow the aio-libs Code of Conduct
Agree with the second suggestion: assert str(URL('file:foo/bar')) == 'file:foo/bar'
.
Exception raising is too restrictive and not backward compatible.
URL('file:///foo/bar')
is absolute, isn't it?
>>> URL("file:///foo/bar").is_absolute()
False
python discussion about parsing relative path URL: https://bugs.python.org/issue40938
URL('file:///foo/bar')
is absolute, isn't it?
It is, but URL('file:foo/bar')
is not:
>>> URL("file:foo/bar")
URL('file:///foo/bar')
>>> URL("file:foo/bar").is_absolute()
False
The point is that even though it is not absolute, the representation of the object makes it look like it is.
@mjpieters I've tried to fix, check this: https://github.com/q0w/yarl/tree/relative-file, whats your thoughts?
This is really a urllib.parse()
bug:
>>> from urllib.parse import urlsplit
>>> urlsplit('file:foo/bar')
SplitResult(scheme='file', netloc='', path='foo/bar', query='', fragment='')
>>> urlsplit('file:foo/bar').geturl()
'file:///foo/bar'
>>> urlsplit('file:///foo/bar')
SplitResult(scheme='file', netloc='', path='/foo/bar', query='', fragment='')
>>> urlsplit('file:///foo/bar').geturl()
'file:///foo/bar'
This applies to all schemes in the urllib.parse.uses_netloc
list, not just file URLs:
>>> urlsplit('http:foo/bar').geturl()
'http:///foo/bar'
>>> urlsplit('custom:foo/bar').geturl()
'custom:foo/bar'
I'm not 100% certain this is fixable; having a netloc requires that the path is absolute.
The better solution is to not accept such URLs, basically. @asvetlov Can't we deprecate the 'feature'?
urllib.parse.urlunsplit()
has been adding back in the netloc
delimiters for 20 years now.
If this is something the yarl does want to address, then you should call urlunsplit()
with scheme=''
:
>>> urlunsplit(('', '', 'foo/bar', '', ''))
'foo/bar'
Do this only if the path doesn't start with /
and then manually prefix it with the scheme and :
.
Alternatively, start deprecation now, and detect problematic URLs by checking for schemes in urllib.parse.uses_netloc
when the path doesn't start with /
.