requests
requests copied to clipboard
Incorrect behavior with schemeless-dotless host:port URLs
URLs of the form hostname:8080 (with no scheme, with "hostname" not containing any dots) can be used to refer to the netloc "hostname:8080"
requests.utils.prepend_scheme_if_needed should correctly prepend the new_scheme when provided with such a URL.
Expected Result
the prepended-scheme URL should be "http://hostname:8080"
Actual Result
the prepended-scheme URL is "hostname:///8080" (e.g. treating the "hostname" part as the scheme, no host, no port, and "8080" as the path)
I extended the test_prepend_scheme_if_needed to demonstrate this behavior (see https://github.com/psf/requests/compare/main...itamaro:requests:schemeless-hostname-anad-port-bug)
Reproduction Steps
from requests.utils import prepend_scheme_if_needed
print(prepend_scheme_if_needed("hostname:8080", "http"))
System Information
$ python -m requests.help
{
"chardet": {
"version": null
},
"charset_normalizer": {
"version": "3.1.0"
},
"cryptography": {
"version": ""
},
"idna": {
"version": "3.4"
},
"implementation": {
"name": "CPython",
"version": "3.10.4"
},
"platform": {
"release": "22.4.0",
"system": "Darwin"
},
"pyOpenSSL": {
"openssl_version": "",
"version": null
},
"requests": {
"version": "2.30.0"
},
"system_ssl": {
"version": "101010ef"
},
"urllib3": {
"version": "2.0.2"
},
"using_charset_normalizer": true,
"using_pyopenssl": false
}
How does this realistically affect users who aren't using utils which aren't part of the public API?
How does this realistically affect users who aren't using utils which aren't part of the public API?
ah sorry, I went down a rabbit hole tracking down the issue to this that I forgot to mention the user-facing scenario!
one scenario is when using such a URL as a proxy:
response = requests.get(
"http://www.example.com",
proxies={"http": "myproxy:8080"},
....
)
this used to work (with requests 2.25.1 and python 3.8), but with requests 2.27.1 it fails with
...
requests.exceptions.InvalidProxyURL: Please check proxy URL. It is malformed and could be missing the host.
another scenario (less critical) is difference in exceptions (MissingSchema vs InvalidSchema):
response = requests.get("www.example.com")
...
MissingSchema: Invalid URL 'www.example.com': No scheme supplied. Perhaps you meant http://www.example.com?
vs
response = requests.get("hostname:8080")
...
requests.exceptions.InvalidSchema: No connection adapters were found for 'hostname:8080'
So the last two aren't ever supposed to work. Would it be nice if they raised the same exception? Sure. But never should we be guessing scheme. We never document that schemeless URLs are supported in that way.
The proxy case may take investigation. But I suspect we documented a breaking change there. I vaguely remember other people complaining
The proxy case may take investigation. But I suspect we documented a breaking change there. I vaguely remember other people complaining
Yes, it's definitely the proxy case that brought me here and is causing me issues. The other one is just a nit.
Hi @itamaro, this was an intentional change in 2.27.0 due to this bug in CPython. The behavior of urlparse fundamentally changed from Python 3.9 onwards which makes supporting schemeless URIs in this case fairly difficult. The decision was made to move to urllib3's url_parse function, which is the behavior you're seeing now to limit blast radius with the the standard library updates.
This is why you're seeing an error now that wasn't occurring previously.
Python 3.10
Python 3.10.5 (main, Jul 1 2022, 17:28:53) [Clang 13.0.0 (clang-1300.0.27.3)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> from urllib.parse import urlparse
>>> urlparse('hostname:8080')
ParseResult(scheme='hostname', netloc='', path='8080', params='', query='', fragment='')
Python 3.7
Python 3.7.9 (default, Aug 11 2022, 16:47:29)
[Clang 13.0.0 (clang-1300.0.27.3)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> from urllib.parse import urlparse
>>> urlparse('hostname:8080')
ParseResult(scheme='', netloc='', path='hostname:8080', params='', query='', fragment='')
>>>
The only reason this happened to work previously is we did this (arguably bad) shuffle of replacing the netloc with the path for this case. That honestly should have never been done but was there to work around the oddities of url parsing in the standard library.
So the short of the story is all version of Requests will be affected by this as soon as you upgrade beyond Python 3.8. We have no reliable way to control this from Requests side anymore. I would recommend updating your usage to ensure proxies are passed with a scheme to avoid any surprises later.
Requests 2.25.1 behavior on Python 3.10
>>> response = requests.get(
... "http://www.example.com",
... proxies={"http": "myproxy:8080"},
...)
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "/Users/nateprewitt/.pyenv/versions/3.10.5/lib/python3.10/site-packages/requests/api.py", line 76, in get
return request('get', url, params=params, **kwargs)
File "/Users/nateprewitt/.pyenv/versions/3.10.5/lib/python3.10/site-packages/requests/api.py", line 61, in request
return session.request(method=method, url=url, **kwargs)
File "/Users/nateprewitt/.pyenv/versions/3.10.5/lib/python3.10/site-packages/requests/sessions.py", line 542, in request
resp = self.send(prep, **send_kwargs)
File "/Users/nateprewitt/.pyenv/versions/3.10.5/lib/python3.10/site-packages/requests/sessions.py", line 655, in send
r = adapter.send(request, **kwargs)
File "/Users/nateprewitt/.pyenv/versions/3.10.5/lib/python3.10/site-packages/requests/adapters.py", line 412, in send
conn = self.get_connection(request.url, proxies)
File "/Users/nateprewitt/.pyenv/versions/3.10.5/lib/python3.10/site-packages/requests/adapters.py", line 309, in get_connection
proxy_manager = self.proxy_manager_for(proxy)
File "/Users/nateprewitt/.pyenv/versions/3.10.5/lib/python3.10/site-packages/requests/adapters.py", line 193, in proxy_manager_for
manager = self.proxy_manager[proxy] = proxy_from_url(
File "/Users/nateprewitt/.pyenv/versions/3.10.5/lib/python3.10/site-packages/urllib3/poolmanager.py", line 492, in proxy_from_url
return ProxyManager(proxy_url=url, **kw)
File "/Users/nateprewitt/.pyenv/versions/3.10.5/lib/python3.10/site-packages/urllib3/poolmanager.py", line 429, in __init__
raise ProxySchemeUnknown(proxy.scheme)
urllib3.exceptions.ProxySchemeUnknown: Not supported proxy scheme myproxy
thanks for the background @nateprewitt !
I would recommend updating your usage to ensure proxies are passed with a scheme to avoid any surprises later.
totally agree this is the preferred solution, but it's not going to be easy doing that in our monorepo with millions of lines of code... (e.g. the entire Meta Python codebase 😬)
Guys should I try solving this or is this redundant?
@turingnixstyx I don't think there's anything to be solved. Requests cannot support schemeless proxies in Python 3.9+. It's an unfortunately tedious change for end-users, but we can't do much about the behavior in the standard library. What we're doing now is providing a consistent behavior across all Python versions.
Hey @nateprewitt actually really wanted to contribute to something (starting out as a young backend developer in python) . If there's any feature/bug I can work on?
@turingnixstyx, there's unfortunately nothing well curated for entry development on Requests at the moment. You may consider taking a look at https://github.com/urllib3/urllib3 or another Python project. Specifically, identifying issues that are labeled "Help Wanted", "Contributor Friendly" or "Good First Issue" is a good place to begin.
@nateprewitt appreciate your response