h icon indicating copy to clipboard operation
h copied to clipboard

URI normalization crashes when port not an int

Open seanh opened this issue 5 years ago • 2 comments

https://sentry.io/organizations/hypothesis/issues/978146793/?project=37293&referrer=github_plugin

ValueError: invalid literal for int() with base 10: 'chrome-extension'
(17 additional frame(s) were not displayed)
...
  File "h/storage.py", line 232, in expand_uri
    doc = models.Document.find_by_uris(session, [uri]).one_or_none()
  File "h/models/document.py", line 88, in find_by_uris
    query_uris = [uri_normalize(u) for u in uris]
  File "h/util/uri.py", line 179, in normalize
    netloc = _normalize_netloc(uri)
  File "h/util/uri.py", line 205, in _normalize_netloc
    port = uri.port
  File "urlparse.py", line 113, in port
    port = int(port, 10)

seanh avatar Apr 15 '19 15:04 seanh

@seanh I was looking into this issue. I am not sure of the exact uri that is being passed, but since the value error is thrown from the python library when there is a non-numeric string, is it safe to assume that if there is a ValueError for port, then it's not a correct port and hence assign port as None.

In that case, the possible solution I see is putting a simple try-except block. I am not sure if there are more intricacies involved where this approach would fail? But seems like this can only fail if there is a scenario where port should be a non-numeric string.

If the try-except block suffices, I am happy to send a PR for the same.

SaptakS avatar May 04 '19 03:05 SaptakS

Another possible scenario of a non-numeric port I can think of is something similar to this in which case I guess it might be useful to quote() all url strings before doing the urlsplit.

SaptakS avatar May 04 '19 03:05 SaptakS