httpx
httpx copied to clipboard
Inconsistent Handling of Invalid Urls
If I call: httpx.get('https://😇')
, an InvalidURL
exception is raised. However, if I call: httpx.get('https:/google.com')
, instead I get an UnsupportedProtocol
exception, which seems inconsistent. I would expect the latter to raise an InvalidURL
as well.
For reference, requests raises a InvalidURL
in both cases.
This issue also exists for requests.get('https:///google.com')
vs httpx.get('https:///google.com')
.
Thanks @cancan101 - Appreciate the neatly summarised issue.
So...
Good place to start with something like this is to unpick it into the smallest possible components, in order to figure out exactly what behaviour we want, and that is consistent with the API throughout.
Here's where I got to after some first steps...
>>> u = httpx.URL('https://😇') # Raises `InvalidURL`.
And in contrast...
>>> u = httpx.URL('https:/google.com')
>>> u.scheme
'https'
>>> u.host
''
>>> u.path
'/google.com'
Now. That's not necessarily undesired behaviour at this point. We might consider the first to be a strictly invalid URL, and the second to be a valid URL, that just happens to be missing a hostname.
We can also confirm that URLs instantiated with explicit portions end up the same way here...
>>> u = httpx.URL(scheme='https', path='/google.com')
URL('https:/google.com')
So at this level of the API we are at least consistent.
I'm somewhat surprised tho, at why that results in an UnsupportedProtocol
when used to send a request. (Since it has a protocol, but is missing a host.)
Let's take a closer look...
>>> c = httpx.Client()
>>> r = c.build_request('GET', 'https:/google.com')
>>> r.url
URL('/google.com')
Hrm. Somethings changed here once we've started build_request
, what's that about exactly?
Haven't dug into it yet, not sure.
This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.
Still valid thx, @stale bot.
This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.
@tomchristie
I think the issue may stem from checking whether the URL is absolute:
https://github.com/encode/httpx/blob/5b06aea1d64f0815af6fe71da3ac725bed3ec09f/httpx/_urls.py#L341-L350
In this case, since u.host
is ''
, it causes this check to return false and so it is treated as a relative URL.
This means that _merge_url
tries to prepend self.base_url
to the 'relative' URL, but since it's intended to be an absolute URL, there is no base_url
provided and so the scheme has now been removed. Since there is no longer a scheme, we receive UnsupportedProtocol
:
https://github.com/encode/httpx/blob/5b06aea1d64f0815af6fe71da3ac725bed3ec09f/httpx/_client.py#L369-L389
Is it sufficient to just check in is_absolute_url
if there is a scheme since in a relative URL it would be provided in base_url
?