
boltons.urlutils.URL cannot handle some characters in credentials

Open arossert opened this issue 2 years ago • 0 comments

Parsing a URL with special characters in the credentials part does not work as expected. I found problems with ? and /.

If the special character is in the password, an exception is raised:

In [52]: URL("http://username:[email protected]:443")
---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
~/tests/venv/lib/python3.7/site-packages/boltons/urlutils.py in parse_url(url_text)
    938             try:
--> 939                 port = int(port_str)
    940             except ValueError:

ValueError: invalid literal for int() with base 10: 'password'

During handling of the above exception, another exception occurred:

URLParseError                             Traceback (most recent call last)
<ipython-input-52-4078cba2692c> in <module>
----> 1 URL("http://username:[email protected]:443")

~/tests/venv/lib/python3.7/site-packages/boltons/urlutils.py in __init__(self, url)
    496                                         ' passing the result. (got: %s)'
    497                                         % (DEFAULT_ENCODING, ude))
--> 498             ud = parse_url(url)
    499
    500         _e = u''

~/tests/venv/lib/python3.7/site-packages/boltons/urlutils.py in parse_url(url_text)
    941                 if port_str:  # empty ports ok according to RFC 3986 6.2.3
    942                     raise URLParseError('expected integer for port, not %r'
--> 943                                         % port_str)
    944                 port = None
    945

URLParseError: expected integer for port, not 'password'
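A minimal sketch of what appears to happen, using the standard library's urllib.parse.urlsplit (not boltons) and a hypothetical URL, http://username:pass?word@hostname:443. The authority part ends at the first /, ? or #, so everything from the ? onward becomes the query, and the text after the colon in "username:pass" is read as a port. The 'password' in the boltons error above fits the same pattern, assuming boltons splits the authority the same way.

```python
from urllib.parse import urlsplit

# Hypothetical URL: the password "pass?word" contains a raw "?".
parts = urlsplit("http://username:pass?word@hostname:443")

print(parts.netloc)    # the authority stops at the first "?": username:pass
print(parts.query)     # the rest of the password and the host land in the query
print(parts.hostname)  # the username is mistaken for the host

port_error = None
try:
    parts.port         # "pass" is not an integer, so this raises ValueError
except ValueError as exc:
    port_error = exc
    print("ValueError:", exc)
```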

If the special character is in the username, no exception is raised, but the URL is split incorrectly (the port, for example, comes back as None):

In [56]: print(URL("http://username?:[email protected]:443").port)
None
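The same split explains the username case. A short sketch with urllib.parse.urlsplit and a hypothetical URL, http://username?:secret@hostname:443: the authority ends at the ?, so only "username" is kept as the host, and the real host and port end up in the query string, leaving no port at all.

```python
from urllib.parse import urlsplit

# Hypothetical URL: the username "username?" contains a raw "?".
parts = urlsplit("http://username?:secret@hostname:443")

print(parts.netloc)    # only "username" is kept as the authority
print(parts.query)     # ":secret@hostname:443" is treated as the query string
print(parts.port)      # the authority has no port, so this is None
```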

This seems to be an issue with urlparse, so I'm not sure boltons is to blame.
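For context, RFC 3986 does not allow a raw / or ? (nor # or @) inside the userinfo part; they have to be percent-encoded. One possible workaround, assuming you control how the URL string is built, is to encode the credentials first with urllib.parse.quote and decode them after parsing. This sketch uses only the standard library; whether boltons.urlutils.URL decodes the userinfo for you is not shown here.

```python
from urllib.parse import quote, unquote, urlsplit

username = "user?name"
password = "pa/ss?word"

# safe="" makes quote() encode "/" too, which it leaves alone by default.
url = f"http://{quote(username, safe='')}:{quote(password, safe='')}@hostname:443"
print(url)  # http://user%3Fname:pa%2Fss%3Fword@hostname:443

parts = urlsplit(url)
print(parts.hostname, parts.port)                        # hostname 443
print(unquote(parts.username), unquote(parts.password))  # user?name pa/ss?word
```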

Using a regex pattern instead works for me:

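import re

# Optional credentials: the username stops at the first ":" or "/",
# while the greedy password group runs up to the last "@" in the string.
pattern = re.compile(
    r"""
    (?P<schema>[\w\+]+)://
    (?:
        (?P<username>[^:/]*)
        (?::(?P<password>.*))?
    @)?
    (?:
        (?:
            (?P<host>[^/:]+)
        )?
        (?::(?P<port>[^/]*))?
    )?
    """,
    re.X,
)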
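A quick check of that pattern against hypothetical URLs, with the pattern repeated so the snippet runs on its own. Because the password group uses a greedy .*, the credentials end at the last @ in the whole string, which is what lets a raw ? through; the same greediness means an @ later in the path gets swallowed into the password, as the third URL shows.

```python
import re

# The pattern from the issue, unchanged apart from layout.
pattern = re.compile(
    r"""
    (?P<schema>[\w\+]+)://
    (?:(?P<username>[^:/]*)(?::(?P<password>.*))?@)?
    (?:(?:(?P<host>[^/:]+))?(?::(?P<port>[^/]*))?)?
    """,
    re.X,
)

urls = [
    "http://username:pass?word@hostname:443",  # raw "?" in the password
    "http://user?name:secret@hostname:443",    # raw "?" in the username
    "http://u:p@hostname/a@b",                 # "@" later in the path
]
for url in urls:
    print(pattern.match(url).groupdict())
```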

(This still does not work if the username contains a /.)
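One way around the / limitation, sketched under the same assumption the regex makes (the credentials end at the last @ in the string), is to skip the regex and split on that @ directly, then end the host at the first /, ? or # after it. split_credentials and RawURLParts are hypothetical names, not part of boltons, and the sketch inherits the same caveat: an @ in the path, query or fragment breaks it.

```python
from typing import NamedTuple, Optional


class RawURLParts(NamedTuple):
    scheme: str
    username: Optional[str]
    password: Optional[str]
    host: str
    port: Optional[int]
    rest: str  # path, query and fragment, left unparsed


def split_credentials(url: str) -> RawURLParts:
    """Hypothetical helper: split raw (unencoded) credentials at the last "@"."""
    scheme, sep, remainder = url.partition("://")
    if not sep:
        raise ValueError("expected '://' in %r" % url)

    # Everything before the last "@" is treated as credentials.
    userinfo, at, hostpart = remainder.rpartition("@")
    username = password = None
    if at:
        # The username ends at the first ":"; the password may contain anything.
        username, colon, pw = userinfo.partition(":")
        password = pw if colon else None

    # The host and port end at the first "/", "?" or "#" after the "@".
    positions = [i for i in (hostpart.find(c) for c in "/?#") if i != -1]
    end = min(positions) if positions else len(hostpart)
    host, _, port_str = hostpart[:end].partition(":")
    port = int(port_str) if port_str else None
    return RawURLParts(scheme, username, password, host, port, hostpart[end:])


print(split_credentials("http://us/er:pa?ss@hostname:443/path"))
```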

arossert, May 16 '22 13:05