yarl
Implement RFC 2396 path segments
Currently there is no such thing as segments in yarl. They would be useful when you construct a path from a list and don't want to chain-create lots of URL objects with some hacky approach (see #138).
It should also be possible to do:
segments = ["foo", "bar/"] # with proper quoting here
url = URL("https://example.com") / segments
Would you prepare a pull request?
I can give it a shot. I assume you want immutability there too, so I guess segments will be a tuple exposed as a cached property. Any recommendations before I jump into it?
Are you talking about a url.segments property? url.parts already exists for that. I'm OK with supporting .segments as a documented alias for .parts if you really want it.
My initial thought about the issue was that you were talking about URL("https://example.com") / ["foo", "bar/"] support. For me, it means that the right operand of the __truediv__ method should accept typing.Union[str, typing.Sequence[str]] (a runtime check is required, as well as an updated type annotation).
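A minimal sketch of the runtime check described above, using only the standard library. MiniURL is a hypothetical stand-in and not yarl's implementation; the percent-encoding of "/" inside a segment is an assumption about what "proper quoting" should mean here.

```python
from collections.abc import Sequence as SequenceABC
from typing import Sequence, Union
from urllib.parse import quote


class MiniURL:
    """Hypothetical stand-in for yarl.URL; only path joining is modelled."""

    def __init__(self, base: str, segments: Sequence[str] = ()) -> None:
        self._base = base.rstrip("/")
        self._segments = tuple(segments)

    def __truediv__(self, other: Union[str, Sequence[str]]) -> "MiniURL":
        # A str is itself a Sequence[str], so it has to be tested first.
        if isinstance(other, str):
            new = (other,)
        elif isinstance(other, SequenceABC):
            new = tuple(other)
            # Reject sequences holding non-str items (e.g. bytes yields ints).
            if not all(isinstance(item, str) for item in new):
                return NotImplemented
        else:
            return NotImplemented
        # Return a new object so the left operand stays unchanged.
        return MiniURL(self._base, self._segments + new)

    def __str__(self) -> str:
        # safe="" also percent-encodes "/", so it cannot split a segment.
        encoded = (quote(segment, safe="") for segment in self._segments)
        return "/".join((self._base, *encoded))


print(MiniURL("https://example.com") / ["foo", "bar/"])
```

Returning NotImplemented for unsupported operands lets Python raise the usual TypeError, which keeps the operator's behaviour for wrong types conventional.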
OK, I didn't know there was a parts attribute. However, RFC segments seem not to include the / as a valid segment. I didn't even look for that name, TBH. Or maybe I saw it and assumed it to be a list of all URL components, from the scheme to the very end.
Also, I see there is a name attribute that is the last of parts. I couldn't find that in the RFC. Maybe it's a convention? I couldn't find evidence of that either.
>>> u = URL("https://example.com/foo/bar/baz.html")
>>> u.parts
('/', 'foo', 'bar', 'baz.html')
For reference, here is a "correct" implementation, minus immutability:
>>> fu = furl("https://example.com/foo/bar/baz.html")
>>> fu.path.segments
['foo', 'bar', 'baz.html']
parts and name are modeled after pathlib; I had no better idea at that moment.
That ship sailed many years ago; the .parts property is set in stone.
I hear you, segments can behave a little differently from parts.
I agree that / is not an allowed segment name.
Regarding the furl design: yes, I'm aware of the library.
yarl has no public objects other than URL; let's keep this principle. So, instead of fu.path.segments we can use just url.segments.
Another question is the root segment. Should we explicitly distinguish it? I think yes. Instead of
>>> fu = furl("https://example.com/foo/bar/baz.html")
>>> fu.path.segments
['foo', 'bar', 'baz.html']
I suggest the empty string for that (as pathlib does):
>>> url = yarl.URL("https://example.com/foo/bar/baz.html")
>>> url.segments
('', 'foo', 'bar', 'baz.html')
This way, we can handle /foo//bar and /foo/bar/ as well.
What do you think?
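To make the proposal concrete, here is a rough standard-library sketch of how such a tuple could be computed. split_segments is a hypothetical helper, not part of yarl, and returning an empty tuple for an empty path is an assumption the thread does not settle.

```python
from typing import Tuple
from urllib.parse import unquote, urlsplit


def split_segments(url: str) -> Tuple[str, ...]:
    """Hypothetical helper: path segments with '' standing for the root."""
    raw_path = urlsplit(url).path
    if not raw_path:
        # Assumption: a URL without any path has no segments at all.
        return ()
    # Split the still-encoded path first, then decode each piece, so an
    # encoded slash (%2F) stays inside its segment.
    return tuple(unquote(part) for part in raw_path.split("/"))


print(split_segments("https://example.com/foo/bar/baz.html"))
# ('', 'foo', 'bar', 'baz.html')
print(split_segments("https://example.com/foo//bar"))
# ('', 'foo', '', 'bar')
print(split_segments("https://example.com/foo/bar/"))
# ('', 'foo', 'bar', '')
```

With this representation, "/".join(segments) reproduces the path for these inputs, which is what lets the empty and trailing segments survive.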
If we go the single-object route, I would suggest the name path_segments to make it more explicit. It's also the name used in the RFC (not that it matters much), but since we have no intermediate object I think the meaning is more obvious this way.
As for the root segment, I don't see the use for it. Is it possible to have path segments but no root segment? Could you elaborate on the use cases?
A relative URL has no root segment, e.g. blob:path/to.
We use them for our custom schemes.
path_segments is quite a long name. Please use just segments. There are no segments in a URL other than path parts.