boltons.urlutils.URL cannot handle some characters in credentials
Parsing a URL with special characters in the credentials part does not work as expected. I found issues with ? and /.
If the character is in the password, an exception is raised:
In [52]: URL("http://username:[email protected]:443")
---------------------------------------------------------------------------
ValueError Traceback (most recent call last)
~/tests/venv/lib/python3.7/site-packages/boltons/urlutils.py in parse_url(url_text)
938 try:
--> 939 port = int(port_str)
940 except ValueError:
ValueError: invalid literal for int() with base 10: 'password'
During handling of the above exception, another exception occurred:
URLParseError Traceback (most recent call last)
<ipython-input-52-4078cba2692c> in <module>
----> 1 URL("http://username:[email protected]:443")
~/tests/venv/lib/python3.7/site-packages/boltons/urlutils.py in __init__(self, url)
496 ' passing the result. (got: %s)'
497 % (DEFAULT_ENCODING, ude))
--> 498 ud = parse_url(url)
499
500 _e = u''
~/tests/venv/lib/python3.7/site-packages/boltons/urlutils.py in parse_url(url_text)
941 if port_str: # empty ports ok according to RFC 3986 6.2.3
942 raise URLParseError('expected integer for port, not %r'
--> 943 % port_str)
944 port = None
945
URLParseError: expected integer for port, not 'password'
If it is in the username, no exception is raised, but the parsing is incorrect:
In [56]: print(URL("http://username?:[email protected]:443").port)
None
This seems to be an issue with urlparse, so I'm not sure boltons is to blame.
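For reference, here is a minimal sketch of how the standard library's urlsplit treats an unescaped ? in the password. The URL, host, and credentials are made-up placeholders, not the ones from the report above. The network location ends at the first /, ? or #, so everything after the ? lands in the query.

```python
from urllib.parse import urlsplit

# Hypothetical URL with an unescaped "?" inside the password.
parts = urlsplit("http://username:pass?word@example.com:443")

# The netloc stops at the "?", so the "@host:port" part ends up in the query.
print(parts.netloc)    # username:pass
print(parts.query)     # word@example.com:443
print(parts.hostname)  # username
```

With no @ left in the netloc, the text after the colon is treated as the port, which would explain the "expected integer for port" error.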
Using a regex pattern instead works for me:
import re

# Verbose-mode pattern: scheme, optional credentials, optional host and port.
pattern = re.compile(
    r"""
    (?P<schema>[\w\+]+)://
    (?:
        (?P<username>[^:/]*)
        (?::(?P<password>.*))?
    @)?
    (?:
        (?:
            (?P<host>[^/:]+)
        )?
        (?::(?P<port>[^/]*))?
    )?
    """,
    re.X,
)
(This still does not work if the username contains /.)
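A possible workaround, sketched here with the standard library only, is to percent-encode the credentials before building the URL, since RFC 3986 does not allow a raw / or ? in the userinfo part. The build_url helper and the example values are hypothetical, and I have not verified how boltons handles the encoded form.

```python
from urllib.parse import quote, unquote, urlsplit


def build_url(scheme, username, password, host, port):
    """Hypothetical helper: percent-encode credentials, then assemble the URL."""
    # safe="" forces "/" to be encoded too (quote leaves it alone by default).
    user = quote(username, safe="")
    pwd = quote(password, safe="")
    return f"{scheme}://{user}:{pwd}@{host}:{port}"


url = build_url("http", "user/name", "pass?word", "example.com", 443)
parts = urlsplit(url)

print(url)                      # http://user%2Fname:pass%3Fword@example.com:443
print(parts.hostname)           # example.com
print(parts.port)               # 443
print(unquote(parts.username))  # user/name
print(unquote(parts.password))  # pass?word
```

urlsplit returns the credentials still encoded, so they need an unquote call before use.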