
Still cannot prevent requests library from encoding

Open LudiusMaximus opened this issue 3 years ago • 13 comments

I have the same problem as https://github.com/psf/requests/issues/5964, but the solution does not work for me. I need to send unescaped URL requests to a server. The purpose is to trigger response code 400 in order to test the firewall configuration.

Expected Result

The following code should (according to https://github.com/psf/requests/issues/5964) send the unescaped URL:

import requests
s = requests.Session()

# Using Burp Suite Proxy to examine request being sent.
s.proxies = { 
    "http"  : "http://127.0.0.1:8080",
    "https" : "http://127.0.0.1:8080",
}
# Do not verify certificate.
s.verify = False

# Solution from https://github.com/psf/requests/issues/5964
base_url = 'https://www.example.com/search'
query = '?date_range=2017-01-01|2017-03-01'
req = requests.Request('GET', base_url)
p = req.prepare()
p.url += query
resp = s.send(p)
print(resp.request.url)

Actual Result

The print(resp.request.url) prints https://www.example.com/search?date_range=2017-01-01|2017-03-01 as expected. But what is really transmitted is the escaped URL, https://www.example.com/search?date_range=2017-01-01%7C2017-03-01 as seen in this Burp Suite screenshot: image
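For readers unfamiliar with the transformation being described, here is a minimal sketch of what percent-encoding does to the pipe character. It uses the standard library's `urllib.parse.quote` purely as an illustration; it is not the code path that requests or urllib3 actually runs.

```python
from urllib.parse import quote, unquote

# The query value from the example above.
raw_value = "2017-01-01|2017-03-01"

# quote() leaves letters, digits and "_.-~" alone and percent-encodes the rest,
# so the pipe becomes %7C (0x7C is the ASCII code for "|").
encoded_value = quote(raw_value)
print(encoded_value)

# Decoding restores the original string, which is why a tolerant server sees no
# difference while a strict firewall rule matching the literal "|" might.
print(unquote(encoded_value) == raw_value)
```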

Reproduction Steps

Use the Python code above and a tool of your choice to capture the actual request being sent (e.g. Burp Suite proxy). You can also use curl to send a genuinely unescaped request:

curl "https://www.example.com/search?date_range=2017-01-01|2017-03-01" -x 127.0.0.1:8080 --insecure

image
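If no intercepting proxy is at hand, a throwaway local listener can show what actually goes over the wire. The sketch below is an assumption-laden stand-in for Burp Suite: the helper names `capture_request_line` and `send_and_capture` are made up for illustration, and the client here is the standard library's `http.client`, not requests.

```python
import http.client
import socket
import threading


def capture_request_line(listener, captured):
    """Accept one connection, record the raw request line, reply with 200."""
    conn, _ = listener.accept()
    with conn:
        conn.settimeout(5)
        data = b""
        # Read until the end of the request headers.
        while b"\r\n\r\n" not in data:
            chunk = conn.recv(4096)
            if not chunk:
                break
            data += chunk
        captured.append(data.split(b"\r\n", 1)[0].decode("latin-1"))
        conn.sendall(
            b"HTTP/1.1 200 OK\r\nContent-Length: 0\r\nConnection: close\r\n\r\n"
        )


def send_and_capture(target):
    """Send GET <target> to a local listener and return the request line it saw."""
    listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    listener.settimeout(5)
    listener.bind(("127.0.0.1", 0))  # port 0: let the OS pick a free port
    listener.listen(1)
    port = listener.getsockname()[1]

    captured = []
    worker = threading.Thread(target=capture_request_line, args=(listener, captured))
    worker.start()

    client = http.client.HTTPConnection("127.0.0.1", port, timeout=5)
    client.request("GET", target)
    client.getresponse().read()
    client.close()

    worker.join()
    listener.close()
    return captured[0]


request_line = send_and_capture("/search?date_range=2017-01-01|2017-03-01")
print(request_line)
```

To check requests itself, the same listener idea should apply: start the listener, point the session at `http://127.0.0.1:<port>`, and compare the captured request line with what `resp.request.url` reports.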

System Information

$ python -m requests.help
{
  "chardet": {
    "version": "3.0.4"
  },
  "charset_normalizer": {
    "version": "2.0.12"
  },
  "cryptography": {
    "version": ""
  },
  "idna": {
    "version": "2.8"
  },
  "implementation": {
    "name": "CPython",
    "version": "3.8.10"
  },
  "platform": {
    "release": "5.14.0-1034-oem",
    "system": "Linux"
  },
  "pyOpenSSL": {
    "openssl_version": "",
    "version": null
  },
  "requests": {
    "version": "2.27.1"
  },
  "system_ssl": {
    "version": "1010106f"
  },
  "urllib3": {
    "version": "1.25.8"
  },
  "using_charset_normalizer": false,
  "using_pyopenssl": false
}

LudiusMaximus avatar Apr 29 '22 13:04 LudiusMaximus

Hmm, so this appears to actually be a breakage introduced by this PR in urllib3 1.26.0. Requests is still forwarding the unencoded URL but urllib3 now unilaterally enforces encoding on paths. The immediate fix would be to downgrade to urllib3 1.25.11 but that's not a great idea long term.

I don't know if there's a good way to fix this now given the amount of time it's been in place. We could potentially not override the provided url and only use the parsed version where needed for the scheme. This line would need to be reverted but I don't know if we've enforced the scheme will always be present at this point.

nateprewitt avatar Apr 30 '22 17:04 nateprewitt

@sethmlarson what are your thoughts here? urllib3 has become more strict with what it will emit which is generally a positive but has broken some portions of the PreparedRequests workflow over the last handful of years. Is this a use case urllib3 is willing to support?

nateprewitt avatar Apr 30 '22 17:04 nateprewitt

@nateprewitt Yes, we should be supporting this through HTTPConnectionPool.request(). I think we lost this behavior sometime but it'd be good to restore it in cases where it's safe to do so.

sethmlarson avatar Apr 30 '22 22:04 sethmlarson

This problem may be solved by hooking urllib3 like this:

import requests
import urllib3.util.url as urllib3_url


def hook_invalid_chars(component, allowed_chars):
    # handle url encode here, or do nothing
    return component

urllib3_url._encode_invalid_chars = hook_invalid_chars

s = requests.Session()

s.verify = False

base_url = 'http://127.0.0.1:8080'
query = '?date_range=2017-01-01|2017-03-01'
req = requests.Request('GET', base_url)
p = req.prepare()
p.url += query
resp = s.send(p)
print(resp.request.url)

image
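Since the hook above replaces a private urllib3 function for the whole process, one way to limit the blast radius is to apply it only around the requests that need it. The following is a rough sketch of that idea, not a tested fix: `temporarily_replace` and `passthrough` are hypothetical helpers, and a stand-in namespace is used in place of `urllib3.util.url` so the example runs without urllib3 installed.

```python
import contextlib
import types


@contextlib.contextmanager
def temporarily_replace(owner, name, replacement):
    """Swap owner.<name> for replacement and restore the original on exit."""
    original = getattr(owner, name)
    setattr(owner, name, replacement)
    try:
        yield
    finally:
        setattr(owner, name, original)


def passthrough(component, allowed_chars, *args, **kwargs):
    # Same idea as hook_invalid_chars above: hand the component back untouched.
    return component


# Stand-in for the urllib3.util.url module (an assumption for this sketch).
# Its fake encoder only knows about the pipe character.
fake_url_module = types.SimpleNamespace(
    _encode_invalid_chars=lambda component, allowed_chars: component.replace("|", "%7C")
)

with temporarily_replace(fake_url_module, "_encode_invalid_chars", passthrough):
    inside = fake_url_module._encode_invalid_chars("a|b", set())

outside = fake_url_module._encode_invalid_chars("a|b", set())
print(inside, outside)
```

With the real library, the `owner` argument would presumably be the `urllib3.util.url` module and the `s.send(p)` call would sit inside the `with` block. That depends on the private name `_encode_invalid_chars` still existing in the installed urllib3 version, which is not guaranteed.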

System Information

{
  "chardet": {
    "version": "3.0.4"
  },
  "charset_normalizer": {
    "version": "2.0.12"
  },
  "cryptography": {
    "version": "2.8"
  },
  "idna": {
    "version": "2.8"
  },
  "implementation": {
    "name": "CPython",
    "version": "3.8.10"
  },
  "platform": {
    "release": "4.4.0-19041-Microsoft",
    "system": "Linux"
  },
  "pyOpenSSL": {
    "openssl_version": "1010106f",
    "version": "19.0.0"
  },
  "requests": {
    "version": "2.26.0"
  },
  "system_ssl": {
    "version": "1010106f"
  },
  "urllib3": {
    "version": "1.25.8"
  },
  "using_charset_normalizer": false,
  "using_pyopenssl": true
}

LyleMi avatar Jul 01 '22 10:07 LyleMi

@LudiusMaximus Hope you're doing well. Did you find a solution?

The solution provided by @nateprewitt on #5964 doesn't work for me either (I tested).

@Lukasa provides a prepared requests example as well on #1454 at this point in the discussion

Based on the contribution guidelines I put the details for my situation on Stack Overflow since whether this is a bug/limitation is a gray area.

In my case if the device on the other end was my web server, I'd fix it - it isn't so I'm looking for a way to interface with it.

I would greatly appreciate help toward a code solution to overcome this roadblock. Thank you!

https://stackoverflow.com/questions/77255960/python-requests-prevent-url-encoding

mjbear avatar Oct 09 '23 02:10 mjbear

hi, +1 on this being an issue. I tried LyleMi's solution and it did not work even though the url seems well formed.

abaruah117 avatar Oct 28 '23 05:10 abaruah117

> hi, +1 on this being an issue. I tried LyleMi's solution and it did not work even though the url seems well formed.

@abaruah117 TLDR: You'll either need to override items within urllib3 or use the lower-level urllib directly (credit to Tomi on Stack Overflow). The first one is probably frowned upon and the latter with urllib could be painful (requires manually handling many aspects that requests handles for us).

(I haven't entirely chosen my path to overcome the URL roadblock I ran into here. My task to interact with an older system is on-hold until I have time.)
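For anyone weighing the lower-level route, here is a small sketch of what it might look like with the standard library's `urllib.request`. As far as I can tell, `urllib.request` passes the path and query through as given rather than re-encoding them. The `fetch_status` helper and the `User-Agent` value are illustrative assumptions, and the helper is defined but not called here because it needs a reachable server.

```python
import urllib.error
import urllib.request

url = "https://www.example.com/search?date_range=2017-01-01|2017-03-01"

# Build the request object; headers and method are set by hand, which is part
# of the extra work requests normally does for us.
req = urllib.request.Request(url, headers={"User-Agent": "firewall-test"}, method="GET")

# The selector is the path-plus-query that would go on the request line.
print(req.selector)


def fetch_status(request, timeout=10.0):
    """Send the request and return the HTTP status code, including 4xx/5xx."""
    try:
        with urllib.request.urlopen(request, timeout=timeout) as resp:
            return resp.status
    except urllib.error.HTTPError as err:
        # urlopen raises on error statuses; for a firewall test the code
        # (e.g. 400) is exactly what we want to see.
        return err.code
```

Calling `fetch_status(req)` against the real target would then return the status code. Certificate verification, proxies, retries and sessions all have to be handled manually on this path.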

More Details:

My impression, based on the minimized comments (including Lyle's) and the lack of an official solution, is that the requests project does not consider this something it will fix, since the code causing the behavior lives in urllib3. Ludius' example included a pipe and mine included curly braces; neither appears among the "reserved" characters in the RFCs.

You might consider updating your reply with an example of the URL and format you're having issues with.

I have a question over on Stack Overflow where a really helpful person (Tomi) decided to poke at the issue. The code forcing the URI encoding is within urllib3 and not directly in requests so it isn't something requests will resolve (my opinion).

In RFC 2396, pipe and curly braces (among some other symbols) are considered "unwise" to use in URLs. RFC 3986, section 2, does not list pipe or curly braces as reserved, but then again RFC 2396 has been obsoleted by RFC 3986. I suppose that, more than anything, those RFCs (and any that update them) intend to state the allowed characters explicitly and leave the prohibited ones to inference, the exception being the reserved characters, which are explicitly called out.
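To make the RFC 3986 character classes concrete, here is a small sketch that sorts a character into "reserved", "unreserved", or neither. The sets are transcribed from section 2 of the RFC; the `classify` function and its labels are my own naming.

```python
import string

# RFC 3986 section 2.2: reserved = gen-delims / sub-delims
GEN_DELIMS = set(":/?#[]@")
SUB_DELIMS = set("!$&'()*+,;=")

# RFC 3986 section 2.3: unreserved = ALPHA / DIGIT / "-" / "." / "_" / "~"
UNRESERVED = set(string.ascii_letters + string.digits + "-._~")


def classify(char):
    """Return which RFC 3986 character class a single character falls into."""
    if char in GEN_DELIMS or char in SUB_DELIMS:
        return "reserved"
    if char in UNRESERVED:
        return "unreserved"
    if char == "%":
        return "percent-encoding marker"
    # Everything else has no literal form in the grammar and is expected to be
    # percent-encoded, which is the inference described above.
    return "not allowed literally"


for ch in "|{}~&":
    print(repr(ch), classify(ch))
```

Pipe and curly braces land in the last bucket: they are not reserved, but they are not unreserved either, which would explain why a strict client encodes them.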

Edit: grammar

mjbear avatar Oct 31 '23 00:10 mjbear

+1 I totally understand following standards in MOST cases but there should be some way for a developer to override the default behavior. This is making things hard for us security folks.

superswan avatar Nov 02 '23 02:11 superswan

> +1 I totally understand following standards in MOST cases but there should be some way for a developer to override the default behavior. This is making things hard for us security folks.

@superswan While it can be overridden the changes won't be in requests, but instead urllib3.

There are ugly hack solutions over on Stack Overflow. No warranties and could break unexpectedly. https://stackoverflow.com/questions/77255960/python-requests-prevent-url-encoding

My testing for the Stack Overflow post was done via netcat (nc), so hopefully this works when I get back to running it against that live legacy device.

I hope this helps you as it helped me.

mjbear avatar Nov 02 '23 23:11 mjbear

@mjbear thank you. I recognize that the issue originates from urllib3. I wanted to express my concerns since this appears to be one of the few places engaging in active discussion. I have opted to use urlopen() instead.

However, this issue also impacts the Requests library, resulting in the loss of some of its enhanced features such as error handling, header manipulation, and timeouts due to the workaround. In past versions it was possible to override this behavior from requests directly. This change has made it challenging to work with legacy code and manage use cases that fall outside of standard practices.

superswan avatar Nov 03 '23 01:11 superswan