yarl

Implement RFC 2396 path segments

Status: open. Diaoul opened this issue 3 years ago (8 comments).

Currently yarl has no notion of path segments. Segments would be useful when you build a path from a list and don't want to end up creating a chain of URL objects through some hacky approach (see #138).

It should also be possible to do:

from yarl import URL

segments = ["foo", "bar/"]  # with proper quoting here
url = URL("https://example.com") / segments  # proposed syntax
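A minimal sketch of what "proper quoting" could mean here, assuming each list item is exactly one segment, so a "/" inside an item must be percent-encoded rather than start a new segment. It uses only the standard library; `join_segments` is a hypothetical helper, not part of yarl.

```python
from typing import Sequence
from urllib.parse import quote


def join_segments(base: str, segments: Sequence[str]) -> str:
    """Append path segments to base in one step (hypothetical helper, not yarl API).

    Each item is treated as exactly one segment, so a "/" inside an item is
    percent-encoded instead of starting a new segment.
    """
    # safe="" tells quote() to encode "/" as well as the other reserved characters
    encoded = [quote(segment, safe="") for segment in segments]
    return base.rstrip("/") + "/" + "/".join(encoded)


print(join_segments("https://example.com", ["foo", "bar/"]))
# https://example.com/foo/bar%2F
```

Building the whole path in a single call avoids creating one intermediate object per segment, which is the kind of chaining the issue wants to avoid.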

Diaoul, Jul 16 '20 21:07

Would you prepare a pull request?

asvetlov, Jul 19 '20 12:07

I can give it a shot. I assume you want immutability here too, so I guess segments will be a tuple and a cached property. Any recommendations before I jump in?
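A minimal sketch of the "tuple plus cached property" idea, assuming the standard library's `functools.cached_property`; `PathInfo` is a hypothetical class, not yarl's code, and it splits naively on "/" just to have something to cache.

```python
from functools import cached_property
from typing import Tuple


class PathInfo:
    """Hypothetical holder for an already-parsed URL path (not yarl's code)."""

    def __init__(self, path: str) -> None:
        self._path = path
        self.computations = 0  # how many times segments was actually computed

    @cached_property
    def segments(self) -> Tuple[str, ...]:
        self.computations += 1
        # A tuple rather than a list, so callers cannot mutate the cached value
        return tuple(self._path.split("/"))


info = PathInfo("/foo/bar")
print(info.segments)      # ('', 'foo', 'bar')
print(info.segments)      # served from the cache, not recomputed
print(info.computations)  # 1
```

One design caveat: `functools.cached_property` stores its result in the instance `__dict__`, so a class that uses `__slots__` without a `__dict__` slot would need a different caching mechanism.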

Diaoul, Jul 19 '20 12:07

Are you talking about a url.segments property? url.parts already exists for that. I'm OK with supporting .segments as a documented alias for .parts if you really want it.

My initial thought was that you were talking about supporting URL("https://example.com") / ["foo", "bar/"]. To me, that means the right operand of the __truediv__ method should accept typing.Union[str, typing.Sequence[str]] (this requires a runtime check as well as an updated type annotation).
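A minimal sketch of that runtime check on a toy class rather than yarl.URL; `MiniURL` is hypothetical and models only path joining. Two assumptions are baked in: a plain str keeps its "/" separators, and each item of a sequence is exactly one segment whose "/" gets percent-encoded.

```python
from typing import Sequence, Union
from urllib.parse import quote


class MiniURL:
    """Toy stand-in for yarl.URL (hypothetical); only path joining is modelled."""

    def __init__(self, url: str) -> None:
        self._url = url

    def __truediv__(self, other: Union[str, Sequence[str]]) -> "MiniURL":
        # Check str first: a str is itself a Sequence of one-character strs
        if isinstance(other, str):
            # Assumption: a plain str may contain "/" separators, so keep them
            path = quote(other, safe="/")
        elif isinstance(other, Sequence) and all(isinstance(s, str) for s in other):
            # Assumption: each item is one segment, so "/" inside it is encoded
            path = "/".join(quote(s, safe="") for s in other)
        else:
            return NotImplemented  # Python then raises TypeError
        return MiniURL(self._url.rstrip("/") + "/" + path)

    def __str__(self) -> str:
        return self._url


print(MiniURL("https://example.com") / ["foo", "bar/"])
# https://example.com/foo/bar%2F
print(MiniURL("https://example.com") / "foo/bar")
# https://example.com/foo/bar
```

Returning `NotImplemented` instead of raising directly lets Python try the right operand's `__rtruediv__` first and raise the usual `TypeError` if that also fails.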

asvetlov, Jul 19 '20 12:07

OK, I didn't know there was a parts attribute. However, RFC segments don't seem to include / as a valid segment. To be honest, I didn't even look for that name. Or maybe I saw it and assumed it was a list of all URL components, from the scheme to the very end. I also see there is a name attribute, which is the last of the parts. I couldn't find that in the RFC. Maybe it's a convention? I couldn't find evidence of that either.

>>> from yarl import URL
>>> u = URL("https://example.com/foo/bar/baz.html")
>>> u.parts
('/', 'foo', 'bar', 'baz.html')

For reference, here is a "correct" implementation, minus immutability:

>>> from furl import furl
>>> fu = furl("https://example.com/foo/bar/baz.html")
>>> fu.path.segments
['foo', 'bar', 'baz.html']
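In RFC 2396's grammar, abs_path = "/" path_segments and path_segments = segment *( "/" segment ), so the leading "/" is a delimiter rather than a segment. A minimal stdlib sketch of that reading reproduces the furl output above; `rfc_segments` is a hypothetical name, not an API of yarl or furl.

```python
from typing import List
from urllib.parse import unquote, urlsplit


def rfc_segments(url: str) -> List[str]:
    """Return the path segments of url, RFC 2396 style (hypothetical helper)."""
    path = urlsplit(url).path
    # abs_path = "/" path_segments: the leading "/" is a delimiter, not a segment
    if path.startswith("/"):
        path = path[1:]
    if not path:
        return []
    # Split before decoding so an encoded "%2F" stays inside its segment
    return [unquote(segment) for segment in path.split("/")]


print(rfc_segments("https://example.com/foo/bar/baz.html"))
# ['foo', 'bar', 'baz.html']
```

Note what this view drops: `rfc_segments("blob:/path/to")` and `rfc_segments("blob:path/to")` both return `['path', 'to']`, so the segments alone cannot tell an absolute path from a relative one.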

Diaoul, Jul 19 '20 13:07

parts and name are modeled after pathlib; I had no better idea at the time. That ship sailed many years ago, and the .parts property is set in stone.
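For comparison, this is how the standard library's pathlib, named above as the model for parts and name, behaves on a POSIX-style path. Note that pathlib normalizes away doubled and trailing slashes, which is one reason a pathlib-style parts view cannot represent every URL path shape.

```python
from pathlib import PurePosixPath

path = PurePosixPath("/foo/bar/baz.html")
print(path.parts)  # ('/', 'foo', 'bar', 'baz.html')
print(path.name)   # baz.html

# pathlib collapses "//" and drops a trailing "/"
print(PurePosixPath("/foo//bar/").parts)  # ('/', 'foo', 'bar')
```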

I hear you: segments can behave a little differently from parts. I agree that / is not an allowed segment name. Regarding furl's design: yes, I'm aware of the library. yarl has no public objects other than URL, and let's keep that principle. So instead of fu.path.segments we can just use url.segments.

Another question is the root segment. Should we explicitly distinguish it? I think yes. Instead of

>>> from furl import furl
>>> fu = furl("https://example.com/foo/bar/baz.html")
>>> fu.path.segments
['foo', 'bar', 'baz.html']

I suggest the empty string for that (as pathlib does):

>>> import yarl
>>> url = yarl.URL("https://example.com/foo/bar/baz.html")
>>> url.segments
('', 'foo', 'bar', 'baz.html')

This way, we can also handle /foo//bar and /foo/bar/.
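A minimal sketch of the proposed behaviour, assuming segments is simply the URL path split on "/" and then percent-decoded; `path_segments` is a hypothetical name and this is not yarl's implementation. The leading "" marks the root, so doubled and trailing slashes survive as empty segments.

```python
from typing import Tuple
from urllib.parse import unquote


def path_segments(path: str) -> Tuple[str, ...]:
    """Split a URL path into segments, keeping a "" root (hypothetical helper)."""
    if not path:
        return ()
    # Split before decoding so an encoded "%2F" stays inside its segment
    return tuple(unquote(segment) for segment in path.split("/"))


for p in ["/foo/bar/baz.html", "/foo//bar", "/foo/bar/"]:
    print(repr(p), "->", path_segments(p))
# '/foo/bar/baz.html' -> ('', 'foo', 'bar', 'baz.html')
# '/foo//bar' -> ('', 'foo', '', 'bar')
# '/foo/bar/' -> ('', 'foo', 'bar', '')
```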

What do you think?

asvetlov, Jul 19 '20 13:07

If we go the single-object route, I would suggest the name path_segments to make it more explicit. It's also the name used in the RFC (not that it matters much), and since there is no intermediate object, I think it is clearer this way.

As for the root segment, I don't see what it would be used for. Is there a case where we have path segments but no root segment? Could you elaborate on the use cases?

Diaoul, Jul 19 '20 13:07

A relative URL has no root segment, e.g. blob:path/to. We use them for our custom schemes.
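With an empty root segment, absolute and relative paths stay distinguishable, and joining the segments with "/" gives back the original path. A small stdlib sketch that keeps the segments percent-encoded so the round trip is exact; `raw_segments` is a hypothetical name, and blob: is the custom scheme mentioned in the comment above.

```python
from typing import Tuple
from urllib.parse import urlsplit


def raw_segments(url: str) -> Tuple[str, ...]:
    """Still-encoded path segments of url, with "" as the root (hypothetical)."""
    path = urlsplit(url).path
    return tuple(path.split("/")) if path else ()


for url in ["blob:path/to", "blob:/path/to", "https://example.com/foo//bar/"]:
    segments = raw_segments(url)
    # "/".join undoes str.split("/"), so the path round-trips exactly
    assert "/".join(segments) == urlsplit(url).path
    print(url, "->", segments)
# blob:path/to -> ('path', 'to')
# blob:/path/to -> ('', 'path', 'to')
# https://example.com/foo//bar/ -> ('', 'foo', '', 'bar', '')
```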

asvetlov, Jul 19 '20 14:07

path_segments is quite a long name. Please just use segments. A URL has no segments other than the path parts.

asvetlov, Jul 19 '20 14:07