canonicalize_url mishandles the port when the hostname requires IDNA encoding
Hello,
We recently ran into the following problem:
canonicalize_url('https://тест.тест:33')
which returns
https://xn--e1aybc.xn--:33-qdd4dec/
while the expected value is
https://xn--e1aybc.xn--e1aybc:33/
This happens for every hostname whose TLD requires IDNA encoding.
Could you please fix this behavior?
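The reported output looks consistent with the port being passed through the IDNA codec together with the last label, though I have not confirmed that in the library's source. As a minimal sketch of the expected behavior, the hostname can be encoded on its own and the port reattached afterwards. `idna_netloc` is a hypothetical helper name, not part of the library, and the sketch ignores userinfo and IPv6 literals.

```python
from urllib.parse import urlsplit


def idna_netloc(url):
    """Hypothetical helper: IDNA-encode only the hostname, keep the port as-is."""
    parts = urlsplit(url)
    # parts.hostname excludes the port, so ':33' never reaches the IDNA codec.
    host = parts.hostname.encode("idna").decode("ascii")
    if parts.port is not None:
        host = f"{host}:{parts.port}"
    return host


print(idna_netloc("https://тест.тест:33"))  # xn--e1aybc.xn--e1aybc:33
```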
I also found a related issue with multiple dots at the end of the domain:
>>> canonicalize_url('http://example.com.../тест')
'http://example.com.../%D1%82%D0%B5%D1%81%D1%82'
>>> canonicalize_url('http://тест.тест./тест')
'http://xn--e1aybc.xn--e1aybc./%D1%82%D0%B5%D1%81%D1%82'
>>> canonicalize_url('http://тест.тест.../тест')
'http://тест.тест.../%D1%82%D0%B5%D1%81%D1%82'
As you can see, a single trailing dot is handled properly, but with two or more trailing dots the domain is not encoded at all.
Update: a URL with multiple trailing dots seems to be invalid according to the standard, so the behavior may be correct, although URL validators in some other languages accept it and handle it normally. I'm not sure whether this addendum needs fixing, so I'll revert the title.
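For the multiple-dot case, one plausible explanation (an assumption on my side, not verified against the library's code) is that Python's built-in `idna` codec rejects empty labels and the caller falls back to the original hostname. The sketch below reproduces that pattern; `idna_or_unchanged` is a hypothetical name used only for illustration.

```python
def idna_or_unchanged(host):
    """Hypothetical helper: IDNA-encode a hostname, or return it untouched on failure."""
    try:
        return host.encode("idna").decode("ascii")
    except UnicodeError:
        # Empty labels such as the ones in 'тест.тест...' are rejected by the codec.
        return host


print(idna_or_unchanged("тест.тест."))    # xn--e1aybc.xn--e1aybc.
print(idna_or_unchanged("тест.тест..."))  # тест.тест...
```

If that is what happens, the unencoded output is a silent fallback rather than a deliberate rejection of the URL.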