url-normalize icon indicating copy to clipboard operation
url-normalize copied to clipboard

URL normalization for Python

Results 14 url-normalize issues
Sort by recently updated
recently updated
newest added

The correct answer for `http://www.foo.com/foo/bar.html/../bar.html` is `http://www.foo.com/bar.html`, but it returns `http://www.foo.com/foo/bar.html` instead, regarding `bar.html` as a folder.

Add idna package to dependencies. Remove python 2.7 from supported python versions.

Here's a sample Twitter search with a hashtag: `https://twitter.com/search?q=%23cncmachining&src=typed_query` When I run it through `url_normalization`, the encoded hash character (`%23`) is decoded into a hash (`#`), but it should stay...

Hi, I am showing some tests are falling with python-url-normalize-1.4.3. Investigating working in progress. Some hint will be appreciated. ```python :: Checking for conflicts... :: Checking for inner conflicts... [Repo...

Hi, nice normalizer, but in main example we see: ``` from url_normalize import url_normalize print(url_normalize('www.foo.com:80/foo')) https://www.foo.com/foo ``` But, wait a minute... HTTP port is 80 HTTPS port is 443 Normalize...

The behaviour of `url_normalize` is wrong when dealing with URLs that have encoded characters like question marks `?` or hash symbols `#` in the path. `%23` and `%3F` should not...

It's stripping url parameters `url = 'https://www.example.com/xx/path/slug-whatever?atag=1234de&utm_medium=affiliates&utm_source=whatever_5443de' print(url_normalize(url,sort_query_params=True))` https://www.example.com/xx/path/slug-whatever?atag=1234de&utm_medium=affiliates `print(url_normalize(url,sort_query_params=False))` https://www.example.com/xx/path/slug-whatever?atag=1234de&utm_medium=affiliates

This adds type hints to the entire tree. I'm not familiar with poetry, so not quite sure I got the dependencies right: with 3.6+, typing is in stdlib so no...

It should be possible to specify a `default_scheme` similar to the already possible `default_scheme`. E.g. ```python >>> url_normalize('/foo.png', default_scheme='https', default_domain='example.com') 'https://example.com/foo.png' ``` Currently that is not possible: ```python >>> url_normalize('/foo.png',...

As mentioned in the title: for some reason URLs starting with http: without slashes (which is a valid format if I am informed correctly) get normalized with a tripple slash...