h
h copied to clipboard
URI normalization crashes when port not an int
https://sentry.io/organizations/hypothesis/issues/978146793/?project=37293&referrer=github_plugin
ValueError: invalid literal for int() with base 10: 'chrome-extension'
(17 additional frame(s) were not displayed)
...
File "h/storage.py", line 232, in expand_uri
doc = models.Document.find_by_uris(session, [uri]).one_or_none()
File "h/models/document.py", line 88, in find_by_uris
query_uris = [uri_normalize(u) for u in uris]
File "h/util/uri.py", line 179, in normalize
netloc = _normalize_netloc(uri)
File "h/util/uri.py", line 205, in _normalize_netloc
port = uri.port
File "urlparse.py", line 113, in port
port = int(port, 10)
@seanh I was looking into this issue. I am not sure of the exact uri that is being passed, but since the value error is thrown from the python library when there is a non-numeric string, is it safe to assume that if there is a ValueError for port, then it's not a correct port and hence assign port as None
.
In that case, the possible solution I see is putting a simple try-except
block. I am not sure if there are more intricacies involved where this approach would fail? But seems like this can only fail if there is a scenario where port should be a non-numeric string.
If the try-except
block suffices, I am happy to send a PR for the same.
Another possible scenario of a non-numeric port I can think of is something similar to this in which case I guess it might be useful to quote()
all url strings before doing the urlsplit
.