uri icon indicating copy to clipboard operation
uri copied to clipboard

URI with empty path

Open Wouter1 opened this issue 4 years ago • 9 comments

From the wiki ​https://en.wikipedia.org/wiki/Uniform_Resource_Identifier

If an authority component is present, then the path component must either be empty or begin with a slash (/).

So this should be correct uri https://localhost

(notice no slash at the end)

However

>>> URI("http://local")
URI('http://local/')

Wouter1 avatar Jun 16 '21 09:06 Wouter1

Indeed, path-empty is a BNF form not currently accounted for. 🤔

Looking at practical implications, the encoding (REPR, in this case) containing the slash appears to be a result of the attempt to form the hierarchical part by naively combining authority with path, and "resolving" that path. The path of a URI having no path is, in fact, distinct:

>>> URI('http://example.com')
<<< URI('http://example.com/')

>>> URI('http://example.com').path
<<< PurePosixPath('.')

Resolution of . results in / given no other base path. I agree that this is bug-worthy; a special case should be added for this "pathless" capability. (I.e. if the path is exactly ., there is no path.)

Conversely, the paths you have given as examples are the same URI after absolutizing. The first bullet point case below this section covers this situation, I believe.

amcgregor avatar Jun 16 '21 14:06 amcgregor

Yes after absolutizing it might be the same. I can not exactly predict if and when this difference would play up or how relevant this is for me. It played up in a bit artificial unit test here so I could work around it. But also I prefer to not wait for bug reports from my users.

Wouter1 avatar Jun 16 '21 14:06 Wouter1

Additional note: the relative reference resolution examples appear to strongly prefer that if a path has ever existed, that a root must be preserved:

"../.."         =  "http://a/"
"../../"        =  "http://a/"

amcgregor avatar Jun 16 '21 14:06 amcgregor

I would not draw conclusions about preferences from examples. Better stick with the specification

Wouter1 avatar Jun 16 '21 14:06 Wouter1

Examining the code, you can "hotfix" patch this behavior yourself if required:

from uri.part.path import PathPart

PathPart.empty = ""

That will swap out the default representation for an "empty" path globally:

>>> URI('http://example.com')
<<< URI('http://example.com')

>>> URI('http://example.com/foo')
<<< URI('http://example.com/foo')

I may have already thought about this, then forgotten. 😜 Edit to note: my desire has been to follow the specification as closely as is reasonable, while making sensible accomodations for user-expected behavior, i.e. as experienced in a browser user agent. In a majority of the browsers I utilize, if you omit the trailing / representing the path, it's added automatically since you actually are requesting the root of the domain. (path-empty is equivalent to / for practical reasons.)

Better stick with the specification.

Double check where these examples are literally coming from.

amcgregor avatar Jun 16 '21 14:06 amcgregor

Am I correct that with your hotfix, the issue would reappear the other way round? so that http://local/ would become http://local ?

Double check where these examples are literally coming from

These are examples from the spec. So ?

Wouter1 avatar Jun 16 '21 14:06 Wouter1

Am I correct that with your hotfix, the issue would reappear the other way round? so that http://local/ would become http://local ?

Nope, but that's a trivial one to try out:

>>> URI('http://example.com/')
<<< URI('http://example.com/')

The adaption is explicit, as mentioned, targeting the PurePosixPath('.') case of a URI omitting any path. Edit to be clear: / is not a path-empty.

amcgregor avatar Jun 16 '21 14:06 amcgregor

Ok good! Sounds like the way to go. Is this fix going to be released or put into the develop version as well? Because selling my users a "hot-fixed" build will raise more questions than selling them the develop version.

Wouter1 avatar Jun 16 '21 14:06 Wouter1

This will require discussion/debate; changing default behaviours is not often an easy decision. Which to target: pure specification, or practical application? The Zen of Python does have something to say about this, and alteration of behavior to match specific desires is demonstrably quite simple. Explicitly why I structured the representation into "parts" the way I have.

Consider it less of a "hotfix" than configuration, if you need to. This configuration also only alters behavior for your own application process; it merely isn't passed in as part of the constructor itself, it's process-global. (It's not like monkeypatching code paths—this is just the alteration of a global constant dedicated for this explicit purpose!)

amcgregor avatar Jun 16 '21 15:06 amcgregor