rfc3986

The first path segment is unexpectedly interpreted as an authority after normalization

Open lo48576 opened this issue 3 years ago • 0 comments

  • scheme:/..///bar has scheme="scheme", authority=None, path="/..///bar". However, after normalization it serializes to scheme://bar, which parses as scheme="scheme", authority="bar".
  • Consider t1, the IRI ..///bar resolved against scheme:. t1 should have scheme="scheme" and authority=None (since ..///bar does not contain an authority). However, the resulting string is scheme://bar, which has authority="bar".

And some more examples:

$ python
Python 3.9.9 (main, Jan 10 2022, 18:52:39)
[GCC 11.2.1 20211127] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> from rfc3986 import uri_reference
>>> b = uri_reference('scheme:')
>>> r1 = uri_reference('..///bar')
>>> t1 = r1.resolve_with(b)
>>> t1
URIReference(scheme='scheme', authority=None, path='//bar', query=None, fragment=None)
>>> t1.unsplit()
'scheme://bar'
>>> r2 = uri_reference('/..///bar')
>>> r2.resolve_with(b)
URIReference(scheme='scheme', authority=None, path='//bar', query=None, fragment=None)
>>> uri_reference('scheme:/..///bar').normalize()
URIReference(scheme='scheme', authority=None, path='//bar', query=None, fragment=None)
>>> uri_reference('scheme:/..///bar').normalize().unsplit()
'scheme://bar'
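
The same round trip can be reproduced without the library. The sketch below is not rfc3986's code: it assumes the recomposition steps from RFC 3986 section 5.3 and the splitting regex from Appendix B, and the names `recompose` and `split_uri` are made up for illustration. It only shows that the components (authority=None, path='//bar') do not survive being written out and parsed again.

```python
import re

# Regex from RFC 3986 Appendix B for splitting a URI reference into components.
URI_RE = re.compile(r"^(([^:/?#]+):)?(//([^/?#]*))?([^?#]*)(\?([^#]*))?(#(.*))?")


def split_uri(uri):
    """Split a URI reference into its five components (hypothetical helper)."""
    m = URI_RE.match(uri)
    return {
        "scheme": m.group(2),
        "authority": m.group(4),
        "path": m.group(5),
        "query": m.group(7),
        "fragment": m.group(9),
    }


def recompose(scheme, authority, path, query=None, fragment=None):
    """Recompose components following RFC 3986 section 5.3 (hypothetical helper)."""
    result = ""
    if scheme is not None:
        result += scheme + ":"
    if authority is not None:
        result += "//" + authority
    result += path
    if query is not None:
        result += "?" + query
    if fragment is not None:
        result += "#" + fragment
    return result


# Components as reported for the normalized reference: no authority, path '//bar'.
text = recompose("scheme", None, "//bar")
print(text)             # scheme://bar
print(split_uri(text))  # authority is now 'bar' and the path is empty
```

So the component view and the string view disagree: the leading `//` of the path is indistinguishable from the authority prefix once the string is written out.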

I'm not sure how this should be handled. Collapsing the // at the beginning of the path is not explicitly allowed by RFC 3986, so I think normalization and resolution cannot produce valid output here and should fail in this case. (But RFC 3986 does not seem to state that they can fail!)
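
To make the "should fail" option concrete, here is a minimal sketch of a guard at serialization time. It is only an assumption about one possible behavior, not something rfc3986 provides today: the names `UnrepresentableURIError` and `unsplit_strict` are hypothetical. The check relies on the rule in RFC 3986 section 3.3 that a path cannot begin with "//" when the URI has no authority.

```python
class UnrepresentableURIError(ValueError):
    """Hypothetical error: the components cannot be written as a valid URI."""


def unsplit_strict(scheme, authority, path):
    """Join scheme, authority and path, refusing the ambiguous case.

    Hypothetical helper; query and fragment are omitted to keep the sketch short.
    """
    # RFC 3986 section 3.3: without an authority, the path must not start with "//",
    # otherwise its first segment would be read back as the authority.
    if authority is None and path.startswith("//"):
        raise UnrepresentableURIError(
            f"path {path!r} starts with '//' but there is no authority"
        )
    result = ""
    if scheme is not None:
        result += scheme + ":"
    if authority is not None:
        result += "//" + authority
    return result + path


print(unsplit_strict("scheme", None, "/bar"))  # scheme:/bar
try:
    unsplit_strict("scheme", None, "//bar")
except UnrepresentableURIError as exc:
    print("refused:", exc)
```

Whether raising is acceptable for `normalize()` and `resolve_with()` is exactly the open question, since RFC 3986 does not describe a failure mode for either operation.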

This can be caused by normalization during resolution, so #84 may also be affected by this issue.

lo48576, Jan 10 '22 12:01