hyperlink
Non-numeric port parsing issue
The port number in the following URL is clearly malformed, but Hyperlink does this:
```python
>>> hyperlink.URL.from_text("http://example.com: -໑_1\v").port
-11
```
This happens because ports are parsed with Python's built-in `int()`, which leads to the following unintuitive consequences:
- Whitespace, including all of `' '`, `'\t'`, `'\v'`, `'\r'` and `'\n'` (plus various Unicode whitespace characters), is stripped from either side of the port number.
- `'-'` or `'+'` can appear just before the first digit of the port number.
- `'_'` can appear between digits of the port number.
- Non-ASCII Unicode decimal digits, such as `'໑'` (U+0ED1 LAO DIGIT ONE), can appear in the port number.

All of this violates both RFC 3986, whose grammar defines the port as `*DIGIT` (ASCII digits only), and the WHATWG URL Standard.
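The sketch below illustrates the behaviour listed above and one possible stricter alternative. It assumes Python 3.6+ (where `int()` accepts underscores between digits). `parse_port_strict` is a hypothetical helper, not part of hyperlink's API: it accepts only ASCII digits as RFC 3986 requires, treats an empty port as absent (RFC 3986 allows `*DIGIT` to match nothing), and adds the WHATWG 0 to 65535 range check. It is a sketch under those assumptions, not the fix hyperlink adopted.

```python
import re

# Explicit [0-9] class: unlike \d, it does not match non-ASCII Unicode digits.
_ASCII_DIGITS = re.compile(r"[0-9]+")


def parse_port_strict(text):
    """Hypothetical strict port parser (not hyperlink's API).

    Returns None for an empty port, an int for a run of ASCII digits,
    and raises ValueError for anything else.
    """
    if text == "":
        # RFC 3986: port = *DIGIT, so an empty port is syntactically allowed.
        return None
    # fullmatch rejects surrounding whitespace, signs, underscores and
    # non-ASCII digits, all of which int() would silently accept.
    if not _ASCII_DIGITS.fullmatch(text):
        raise ValueError("invalid port: %r" % (text,))
    port = int(text)
    if port > 65535:
        # Assumption: follow the WHATWG URL Standard's 16-bit port range.
        raise ValueError("port out of range: %d" % port)
    return port


if __name__ == "__main__":
    # Each of these is accepted by int() even though none is a valid port.
    for text in [" 80 ", "\t80\n", "+80", "-80", "8_0", "\u0ed180", " -\u0ed1_1\v"]:
        print("int(%r) = %d" % (text, int(text)))
        try:
            parse_port_strict(text)
        except ValueError as exc:
            print("  strict parser:", exc)
```

Using a regular expression with an explicit `[0-9]` class keeps the validation independent of `int()`'s lenient rules; `int()` is only called once the text is known to be plain ASCII digits.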