
PreparedRequests can't bypass URL normalization when proxies are used

Open • shelld3v opened this issue 1 year ago • 2 comments

Related to #5289, where akmalhisyam found a way to bypass URL normalization using PreparedRequests. However, that solution doesn't work when proxies are provided.

Expected Result

The code below should explicitly set the request URL without it being normalized (from /../something.txt to /something.txt):

url = "http://example.com/../something.txt"
s = requests.Session()
req = requests.Request(method='POST' ,url=url, headers=headers, data=data)
prep = req.prepare()
prep.url = url
r = s.send(prep, proxies={"http": "http://127.0.0.1"}, verify=False)

Actual Result

The code above doesn't work; this one does, though:

url = "http://example.com/../something.txt"
s = requests.Session()
req = requests.Request(method='POST' ,url=url, headers=headers, data=data)
prep = req.prepare()
prep.url = url
r = s.send(prep, verify=False)

Reproduction Steps

Run the code in Expected Result and check your proxy's request log; you will see that the bypass doesn't work (the path arrives normalized).
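
If you don't have a proxy with request logging handy, here is a minimal sketch of a one-shot listener that can stand in for one (the 127.0.0.1:8080 address is an assumption; point proxies={"http": "http://127.0.0.1:8080"} at it). It prints the raw request line, which for a proxied plain-HTTP request contains the absolute URL, so you can see directly whether the path was normalized:

import socket

# Accept a single connection and print the first request line received.
with socket.socket() as srv:
    srv.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
    srv.bind(("127.0.0.1", 8080))
    srv.listen(1)
    conn, _ = srv.accept()
    with conn:
        raw = conn.recv(65536).decode("latin-1", errors="replace")
        print(raw.splitlines()[0])  # e.g. POST http://example.com/... HTTP/1.1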

System Information

$ python -m requests.help
{
  "chardet": {
    "version": "5.2.0"
  },
  "charset_normalizer": {
    "version": "2.0.12"
  },
  "cryptography": {
    "version": "38.0.4"
  },
  "idna": {
    "version": "3.4"
  },
  "implementation": {
    "name": "CPython",
    "version": "3.11.4"
  },
  "platform": {
    "release": "4.4.0-19041-Microsoft",
    "system": "Linux"
  },
  "pyOpenSSL": {
    "openssl_version": "30000080",
    "version": "21.0.0"
  },
  "requests": {
    "version": "2.32.3"
  },
  "system_ssl": {
    "version": "30000030"
  },
  "urllib3": {
    "version": "2.0.4"
  },
  "using_charset_normalizer": false,
  "using_pyopenssl": true
}

shelld3v • Nov 18 '24 17:11

The issue arises because the requests library automatically normalizes URLs when preparing a request. This normalization resolves paths like /../something.txt into /something.txt, which may not be the desired behavior; and when a proxy is in use, the prep.url re-assignment trick above cannot undo it.
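
A minimal demonstration of that prepare-time normalization (no proxy involved; the URL is the one from the report):

import requests

# Preparing the request resolves the dot segments in the path.
req = requests.Request(method="GET", url="http://example.com/../something.txt")
print(req.prepare().url)  # the /../ segment has been resolved away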

A possible workaround is to use a lower-level library like httpx, or to craft the HTTP request manually with Python's built-in libraries (http.client or socket). That gives you complete control over the raw HTTP request, with no normalization. For example:

import http.client

# Proxy and target details
proxy_host = "127.0.0.1"
proxy_port = 8080
url = "http://example.com/../something.txt"

# Manually craft the request: connect to the proxy and send the
# absolute URL as the request target
conn = http.client.HTTPConnection(proxy_host, proxy_port)
headers = {
    "Host": "example.com",
    "User-Agent": "Custom User-Agent",
    "Content-Type": "application/x-www-form-urlencoded",
}
conn.request("POST", url, body="data=example", headers=headers)

# Get the response
response = conn.getresponse()
print(response.status, response.reason)
print(response.read().decode())
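
As far as I can tell, this works because http.client sends the request target verbatim: it validates the URL for illegal characters but performs no dot-segment normalization, so the /../ survives.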

miral2525 • Jan 27 '25 05:01

@shelld3v Prepared requests call the urllib3.util.url.parse_url function twice when a proxy is used. I tried passing a flag through the requests.Request class, but requests.Session().send calls parse_url again while making the request. The best path forward would be raising this issue in the urllib3 repository: parse_url normalizes all URLs, and I think it should support a flag to skip that.
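
A quick way to see the normalization that comment describes (assuming urllib3 2.x, per the system info above):

from urllib3.util.url import parse_url

# Per the comment above, parse_url normalizes the URL, including the
# dot segments, so the raw /../ path cannot survive a re-parse.
print(parse_url("http://example.com/../something.txt").url)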

safatjamil • Jul 08 '25 13:07