yarl icon indicating copy to clipboard operation
yarl copied to clipboard

Port normalization

Open Julian opened this issue 2 years ago • 3 comments

Is your feature request related to a problem?

I'm planning on including yarl as a dependency in the referencing project, which essentially needs to maintain an object of type Mapping[URL, Foo].

As part of doing so, I want/need URLs to be normalized according to RFC3986 so that users looking up "equivalent" URLs get the same Foo.

This works perfectly for everything except port normalization. In other words, if a URL contains a default port, the following returns False: URL('http://example.com:80/default/port') == URL('http://example.com/default/port') even though these URLs are equivalent, and even though it seems yarl knows about some default ports.

This means that when doing lookups or insertions, I have to first manually do:

        if url.explicit_port and url.is_default_port():
            url = url.with_port(None)

Describe the solution you'd like

Strip default ports on creation, similar to what hyperlink.URL does, where:

>>> hyperlink.URL.from_text('http://example.com:80/default/port') == hyperlink.URL.from_text('http://example.com/default/port')

returns True.

Describe alternatives you've considered

Provide a .normalize method, though this seems odd given that all other normalization is done already for me (e.g. percentage escapes, case sensitivity or insensitivity, etc.)

Additional context

No response

Code of Conduct

  • [X] I agree to follow the aio-libs Code of Conduct

Julian avatar Aug 28 '23 07:08 Julian

Related issue that should also be fixed with this one: https://github.com/aio-libs/aiohttp/issues/7381 i.e. This should pass: URL("https://foo.com").origin() == URL("https://foo.com:443").origin()

Dreamsorcerer avatar Oct 02 '23 12:10 Dreamsorcerer

Had a look, I'd consider 2 scenarios.

Dropping the port in __eq__ allows keeping the url "as is" but does not really play nice with __lt__/__gt__. Having normalize_default_port=False as additional parameter to __new__()/build() allows dropping the port early on, __eq__ just works and it does not create confusion with __lt__/__gt__

What would you prefer?

commonism avatar Nov 24 '23 13:11 commonism

Dropping the port in __eq__ allows keeping the url "as is" but does not really play nice with __lt__/__gt__.

Well, I don't think you want to drop it (URL("https://foo.com:8000") != URL("https://foo.com:2020"), but do something more like replacing it with URL.port, which will include the default value, in order to compare them. Or something like that. Maybe the other comparison operators need to use that too.

Dreamsorcerer avatar Nov 24 '23 14:11 Dreamsorcerer