httpx icon indicating copy to clipboard operation
httpx copied to clipboard

Inconsistent Handling of Invalid Urls

Open cancan101 opened this issue 3 years ago • 6 comments

If I call: httpx.get('https://😇'), an InvalidURL exception is raised. However, if I call: httpx.get('https:/google.com'), instead I get an UnsupportedProtocol exception, which seems inconsistent. I would expect the latter to raise an InvalidURL as well.

For reference, requests raises a InvalidURL in both cases.

This issue also exists for requests.get('https:///google.com') vs httpx.get('https:///google.com').

cancan101 avatar Sep 01 '21 14:09 cancan101

Thanks @cancan101 - Appreciate the neatly summarised issue.

tomchristie avatar Sep 01 '21 14:09 tomchristie

So...

Good place to start with something like this is to unpick it into the smallest possible components, in order to figure out exactly what behaviour we want, and that is consistent with the API throughout.

Here's where I got to after some first steps...

>>> u = httpx.URL('https://😇')  # Raises `InvalidURL`.

And in contrast...

>>> u = httpx.URL('https:/google.com')
>>> u.scheme
'https'
>>> u.host
''
>>> u.path
'/google.com'

Now. That's not necessarily undesired behaviour at this point. We might consider the first to be a strictly invalid URL, and the second to be a valid URL, that just happens to be missing a hostname.

We can also confirm that URLs instantiated with explicit portions end up the same way here...

>>> u = httpx.URL(scheme='https', path='/google.com')
URL('https:/google.com')

So at this level of the API we are at least consistent.

I'm somewhat surprised tho, at why that results in an UnsupportedProtocol when used to send a request. (Since it has a protocol, but is missing a host.)

Let's take a closer look...

>>> c = httpx.Client()
>>> r = c.build_request('GET', 'https:/google.com')
>>> r.url
URL('/google.com')

Hrm. Somethings changed here once we've started build_request, what's that about exactly? Haven't dug into it yet, not sure.

tomchristie avatar Oct 20 '21 09:10 tomchristie

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.

stale[bot] avatar Feb 20 '22 15:02 stale[bot]

Still valid thx, @stale bot.

tomchristie avatar Feb 21 '22 13:02 tomchristie

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.

stale[bot] avatar Mar 25 '22 07:03 stale[bot]

@tomchristie I think the issue may stem from checking whether the URL is absolute: https://github.com/encode/httpx/blob/5b06aea1d64f0815af6fe71da3ac725bed3ec09f/httpx/_urls.py#L341-L350 In this case, since u.host is '', it causes this check to return false and so it is treated as a relative URL. This means that _merge_url tries to prepend self.base_url to the 'relative' URL, but since it's intended to be an absolute URL, there is no base_url provided and so the scheme has now been removed. Since there is no longer a scheme, we receive UnsupportedProtocol: https://github.com/encode/httpx/blob/5b06aea1d64f0815af6fe71da3ac725bed3ec09f/httpx/_client.py#L369-L389

Is it sufficient to just check in is_absolute_url if there is a scheme since in a relative URL it would be provided in base_url?

shelbylsmith avatar Jun 24 '22 17:06 shelbylsmith