urlpath URL-escaping semantics of path components in urlpath are unclear

Open egnor opened this issue 4 years ago • 0 comments

It sort of seems like urlpath will percent-encode characters in a URL that aren't URL-safe:

>>> urlpath.URL('/with space')
URL('/with%20space')
>>> urlpath.URL('/with%percent')
URL('/with%25percent')

But an existing percent-escape is preserved, so I guess that was not a true escaping process? (A true one would have turned %20 into %2520.)

>>> urlpath.URL('/with%20code')
URL('/with%20code')

Retrieving the path parts afterwards decodes the escapes:

>>> urlpath.URL('with%20code').parts
('with code',)

That means taking a part and then re-appending it with '/' doesn't round-trip:

>>> (urlpath.URL('') / urlpath.URL('with%2520code').parts[0]).parts[1]
'with code'
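To make the double decode easier to see, here is a minimal sketch using only urllib.parse. It does not call urlpath at all; it just assumes, based on the output above, that reading .parts decodes one level and that '/' keeps an existing escape as-is. The variable names are made up for illustration.

```python
from urllib.parse import quote, unquote

stored = "with%2520code"   # percent-encoded form held in the URL

# Reading a part decodes one level (assumed to mirror .parts).
part = unquote(stored)

# Re-appending the part unchanged and reading it back decodes a second time.
naive = unquote(part)

# Quoting the part before re-appending restores the original value on read.
careful = unquote(quote(part, safe=""))

print(part)
print(naive)
print(careful)
```

In this model the part only survives the round trip if it is quoted again before being appended.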

Using .with_name() does seem to escape fully:

>>> urlpath.URL('foo').with_name('with%20code')
URL('with%2520code')

I see no direct way to add/edit/replace path components in a way that would ensure escaping, which seems unfortunate. If nothing else, quoting/escaping behavior needs to be very well documented, since it can be important to security (and certainly to functionality!).

It seems like the idea is that paths passed to the constructor or to the '/' operator are expected to be pre-urlencoded (but if anything unsafe does show up, it will be escaped -- IMHO this should raise an exception instead?). Path parts as exposed in .parts and modified by .with_name() and .with_suffix(), on the other hand, are the "underlying" unescaped values. So there's no way to set an unescaped path, except by calling urllib.parse.quote() yourself?
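For reference, the quote-it-yourself workaround could look roughly like the sketch below. quote_segment and join_raw are hypothetical helpers, not part of urlpath, and http://example.com is a placeholder; only urllib.parse.quote is a real API here.

```python
from urllib.parse import quote


def quote_segment(segment: str) -> str:
    """Percent-encode one raw (unescaped) path segment. Hypothetical helper."""
    # safe="" escapes "/" as well, so the value stays a single segment,
    # and "%" is escaped to "%25" instead of being read as an existing escape.
    return quote(segment, safe="")


def join_raw(base: str, *segments: str) -> str:
    """Append raw segments to an already-encoded base URL string."""
    return "/".join([base.rstrip("/"), *map(quote_segment, segments)])


print(quote_segment("with space"))
print(quote_segment("with%20code"))
print(join_raw("http://example.com/", "a/b", "with space"))
```

The resulting string is fully escaped, so it could presumably be handed to urlpath.URL() or the '/' operator, which (per the examples above) appear to leave existing escapes alone. That last part is an assumption from observed behavior, not documented API.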

Jul 20 '20 19:07 egnor
