PreparedRequests can't bypass URL normalization when proxies are used
Related to #5289, where akmalhisyam found a way to bypass URL normalization using PreparedRequests. However, that solution does not work when proxies are provided.
## Expected Result
The code below should set the request URL explicitly, without the path being normalized from `/../something.txt` to `/something.txt`:

```python
import requests

url = "http://example.com/../something.txt"
headers = {}  # placeholder; the real values are not shown in the report
data = {}     # placeholder; the real values are not shown in the report

s = requests.Session()
req = requests.Request(method='POST', url=url, headers=headers, data=data)
prep = req.prepare()
prep.url = url  # restore the raw URL after prepare() normalizes it
r = s.send(prep, proxies={"http": "http://127.0.0.1"}, verify=False)
```
## Actual Result
The code above does not preserve the raw URL when a proxy is set. The same request without the `proxies` argument works:

```python
import requests

url = "http://example.com/../something.txt"
headers = {}  # placeholder; the real values are not shown in the report
data = {}     # placeholder; the real values are not shown in the report

s = requests.Session()
req = requests.Request(method='POST', url=url, headers=headers, data=data)
prep = req.prepare()
prep.url = url  # restore the raw URL after prepare() normalizes it
r = s.send(prep, verify=False)
```
## Reproduction Steps
Run the code from Expected Result and check your proxy's request log; you will see that the raw `/../` path does not come through.
## System Information
```
$ python -m requests.help
{
  "chardet": {
    "version": "5.2.0"
  },
  "charset_normalizer": {
    "version": "2.0.12"
  },
  "cryptography": {
    "version": "38.0.4"
  },
  "idna": {
    "version": "3.4"
  },
  "implementation": {
    "name": "CPython",
    "version": "3.11.4"
  },
  "platform": {
    "release": "4.4.0-19041-Microsoft",
    "system": "Linux"
  },
  "pyOpenSSL": {
    "openssl_version": "30000080",
    "version": "21.0.0"
  },
  "requests": {
    "version": "2.32.3"
  },
  "system_ssl": {
    "version": "30000030"
  },
  "urllib3": {
    "version": "2.0.4"
  },
  "using_charset_normalizer": false,
  "using_pyopenssl": true
}
```
The issue arises because the requests library automatically normalizes URLs when preparing a request. This normalization process resolves paths like /../something.txt into /something.txt, which may not be the desired behavior when using a proxy.
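A minimal sketch of this normalization, assuming it happens in urllib3's `parse_url` (the function named in the comment at the end of this thread):

```python
from urllib3.util.url import parse_url

# parse_url applies RFC 3986 normalization to http/https URLs,
# which includes removing dot segments from the path.
raw = "http://example.com/../something.txt"
parsed = parse_url(raw)
print(parsed.url)  # the "/../" segment is gone
```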
A possible workaround is to use a lower-level library like httpx or manually craft the HTTP request using Python's built-in libraries (http.client or socket). This allows you to completely control the raw HTTP request without normalization.
```python
import http.client

# Proxy and target details
proxy_host = "127.0.0.1"
proxy_port = 8080
url = "http://example.com/../something.txt"

# Manually craft the request; http.client sends the request target verbatim
conn = http.client.HTTPConnection(proxy_host, proxy_port)
headers = {
    "Host": "example.com",
    "User-Agent": "Custom User-Agent",
    "Content-Type": "application/x-www-form-urlencoded",
}
conn.request("POST", url, body="data=example", headers=headers)

# Get the response
response = conn.getresponse()
print(response.status, response.reason)
print(response.read().decode())
```
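The `socket` variant gives complete control over the bytes on the wire. A sketch, with the proxy address and body as placeholders matching the http.client example above:

```python
import socket

def build_raw_request(method: str, target: str, host: str, body: str) -> bytes:
    # Build the request line by hand so the target is sent verbatim,
    # including the "/../" segment that higher-level clients normalize.
    lines = [
        f"{method} {target} HTTP/1.1",
        f"Host: {host}",
        "Content-Type: application/x-www-form-urlencoded",
        f"Content-Length: {len(body)}",
        "Connection: close",
        "",
        body,
    ]
    return "\r\n".join(lines).encode("ascii")

raw = build_raw_request("POST", "http://example.com/../something.txt",
                        "example.com", "data=example")

# Send through the proxy (placeholder address); uncomment to run:
# with socket.create_connection(("127.0.0.1", 8080)) as sock:
#     sock.sendall(raw)
#     print(sock.recv(4096).decode())
```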
@shelld3v Prepared requests call the `urllib3.util.url.parse_url` function twice when a proxy is used. I tried passing a flag through the `requests.Request` class, but `requests.Session().send` calls `parse_url` again while making the request. The best path forward would be to raise this in the urllib3 repository: `parse_url` normalizes all URLs, and I think it should support a flag to disable that.
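The double call described above can be illustrated directly. This is a sketch, not the actual requests internals: since `parse_url` strips dot segments on every pass, a manually restored URL cannot survive being re-parsed on the proxy code path.

```python
from urllib3.util.url import parse_url

override = "http://example.com/../something.txt"

# First parse (request preparation): the dot segment is removed.
once = parse_url(override).url
# Second parse (the proxy code path): already normalized, stays that way.
twice = parse_url(once).url

print(once, twice)
```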