
Non-numeric port parsing issue

Open kenballus opened this issue 1 year ago • 0 comments

The port number in the following URL is clearly malformed, but Hyperlink accepts it and returns a negative port:

>>> import hyperlink
>>> hyperlink.URL.from_text("http://example.com: -໑_1\v").port
-11

This happens because ports are parsed with int, which leads to the following unintuitive consequences:

  • Whitespace, including all of (' ', '\t', '\v', '\r', '\n') (plus a bunch of Unicode whitespace), is stripped from either side of the port number.
  • '-' or '+' can appear just before the first digit in the port number.
  • '_' can appear between digits in the port number.
  • Some Unicode digits, such as '໑', can appear in port numbers.

All of this violates both RFC 3986 and the WHATWG URL standard.
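To make the behavior concrete, here is a minimal sketch of what int accepts, followed by one possible stricter check. The helper name parse_port_strict is hypothetical and not part of Hyperlink's API. The 0 to 65535 upper bound is an assumption taken from the WHATWG rules; RFC 3986 itself only requires the port to be a (possibly empty) run of ASCII digits.

```python
import re

# Each of these is accepted by int(), which is why they slip through as ports.
assert int(" 80\v") == 80      # surrounding whitespace is stripped
assert int("+80") == 80        # a leading sign is allowed
assert int("8_0") == 80        # underscores between digits are allowed
assert int("໑") == 1           # non-ASCII Unicode digits are allowed
assert int(" -໑_1\v") == -11   # the port string from the URL above

# RFC 3986 grammar: port = *DIGIT, i.e. ASCII digits only, possibly empty.
# [0-9] is used instead of \d because \d also matches Unicode digits.
_PORT_RE = re.compile(r"[0-9]*")


def parse_port_strict(text):
    """Hypothetical strict parser: return an int, None for an empty port,
    or raise ValueError for anything that is not a run of ASCII digits."""
    # fullmatch avoids the "$ matches before a trailing newline" pitfall.
    if _PORT_RE.fullmatch(text) is None:
        raise ValueError("invalid port: %r" % (text,))
    if text == "":
        return None
    port = int(text)
    # Assumed upper bound (WHATWG); RFC 3986 does not state one.
    if port > 65535:
        raise ValueError("port out of range: %r" % (text,))
    return port
```

With a check like this, parse_port_strict("8080") returns 8080, while " -໑_1\v" raises ValueError instead of turning into -11.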

kenballus, Feb 07 '23 21:02