urllib.request.proxy_bypass makes DNS request without using configured proxies when system has NO_PROXY configured at some level

Open GiorgioComitini opened this issue 3 years ago • 15 comments

When using Requests via a SOCKS5 proxy on one of the latest MacBook Pros (Apple Silicon), the DNS requests are not correctly passed to the proxy server even when using the "socks5h" scheme. Instead, the DNS resolution happens at the level of the client, causing a DNS leak.

This happens with various versions/combinations of Requests/PySocks/Python, the latter being the Apple Silicon versions obtained through pyenv. I have tested this using the local SOCKS5 proxy server provided by Tor.

In more detail, I tested the bug on various pyenv Python distributions, labeled 3.8 to 3.10 plus miniforge3, with Requests from v2.16.0 to v2.27.1, and PySocks from v1.5.7 to v1.7.1 (not every single version in between). In what follows, I'll use an example System Information output.

Also, I used Wireshark to monitor the DNS requests, and I used curl with the --proxy socks5h://127.0.0.1:9050 flag as a control, to make sure that the local Tor proxy was working properly and that the DNS leak is indeed specific to Python/Requests/PySocks.

I am not able to confirm whether the DNS leak also happens on Python versions obtained other than through pyenv. On an Intel MacBook using an Anaconda Python version and the latest Requests/PySocks, the leak does not occur.

Expected Result

When using the "socks5h" scheme, the DNS requests should be forwarded to the SOCKS proxy.

Actual Result

The DNS requests are sent from the client instead, causing a DNS leak.

Reproduction Steps

Install any Apple Silicon Python version from pyenv (see above for the versions I tested) and start Tor.

import requests

proxies = {
    "http": "socks5h://127.0.0.1:9050",
    "https": "socks5h://127.0.0.1:9050",
}

url = "example.com"  # placeholder; substitute any host name
resp = requests.get("https://" + url, proxies=proxies)

Monitor the outgoing DNS requests using Wireshark. If they are sent via the outbound network interface to your pre-configured DNS server (as opposed to the local interface, destination IP 127.0.0.1, port 9050), a DNS leak is occurring.
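As a rough in-process complement to Wireshark, the sketch below temporarily replaces `socket.gethostbyname` and `socket.getaddrinfo` so that any attempt to resolve a host name on the client raises an error and is recorded. The names `forbid_local_dns` and `LocalDNSLookup` are made up for this illustration, and the guard only sees lookups that go through Python's `socket` module, so a packet capture remains the authoritative check.

```python
import contextlib
import ipaddress
import socket


class LocalDNSLookup(RuntimeError):
    """Raised when code tries to resolve a host name on the client."""


def _is_ip_literal(host):
    # IP literals (e.g. the proxy at 127.0.0.1) need no DNS lookup.
    if isinstance(host, bytes):
        host = host.decode("ascii", errors="replace")
    try:
        ipaddress.ip_address(host)
        return True
    except ValueError:
        return False


@contextlib.contextmanager
def forbid_local_dns():
    """Record and block client-side name resolution inside the block."""
    seen = []
    real_gethostbyname = socket.gethostbyname
    real_getaddrinfo = socket.getaddrinfo

    def guarded_gethostbyname(host):
        if _is_ip_literal(host):
            return real_gethostbyname(host)
        seen.append(host)
        raise LocalDNSLookup(host)

    def guarded_getaddrinfo(host, *args, **kwargs):
        if host is None or _is_ip_literal(host):
            return real_getaddrinfo(host, *args, **kwargs)
        seen.append(host)
        raise LocalDNSLookup(host)

    socket.gethostbyname = guarded_gethostbyname
    socket.getaddrinfo = guarded_getaddrinfo
    try:
        yield seen
    finally:
        # Always restore the real functions, even if the body raised.
        socket.gethostbyname = real_gethostbyname
        socket.getaddrinfo = real_getaddrinfo
```

Wrapping the `requests.get(...)` call from the reproduction in `with forbid_local_dns() as seen:` should then either complete with `seen` left empty, or fail with `LocalDNSLookup` naming the host that was about to be resolved locally.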

System Information

$ python -m requests.help
{
  "chardet": {
    "version": null
  },
  "charset_normalizer": {
    "version": "2.0.12"
  },
  "cryptography": {
    "version": ""
  },
  "idna": {
    "version": "3.3"
  },
  "implementation": {
    "name": "CPython",
    "version": "3.10.2"
  },
  "platform": {
    "release": "21.4.0",
    "system": "Darwin"
  },
  "pyOpenSSL": {
    "openssl_version": "",
    "version": null
  },
  "requests": {
    "version": "2.27.1"
  },
  "system_ssl": {
    "version": "101010df"
  },
  "urllib3": {
    "version": "1.26.8"
  },
  "using_charset_normalizer": true,
  "using_pyopenssl": false
}

GiorgioComitini avatar Mar 16 '22 00:03 GiorgioComitini

Hi @GiorgioComitini, thanks for bringing this to our attention. From what you've described above, I believe we can already tell this is an issue at the PySocks layer or potentially even CPython. Requests doesn't actually handle any of the socks5 workflow; it's offloaded to urllib3 in this module, which in turn calls into the socks connection from PySocks.

I think the next step here would be to verify if PySocks actually works for this case, and work backwards from there. Presumably we could do something like the code below to verify:

import socks

s = socks.socksocket()  # Same API as socket.socket in the standard lib
s.set_proxy(socks.PROXY_TYPE_SOCKS5, "127.0.0.1", 9050, rdns=True)  # rdns=True: resolve names on the proxy

s.connect(("www.somesite.com", 80))
s.sendall(b"GET / HTTP/1.1 ...")  # sendall() requires bytes on Python 3
print(s.recv(4096))

I don't have any of the above infrastructure set up currently or immediate access to an m1 mac. We can add this to the backlog, but it may be faster if you can use the above to check in your current setup.

nateprewitt avatar Mar 16 '22 00:03 nateprewitt

@nateprewitt From inspecting the code from urllib3 -> PySocks it appears that things are working as expected? Just like you it'd take a second for me to get a setup to verify the bytes sent over the wire.

@GiorgioComitini could you also try reproducing with only urllib3?

sethmlarson avatar Mar 16 '22 00:03 sethmlarson

Hey there, thank you for the quick reply. I tried both your suggestions, and no DNS leak appears to occur when using either socks or urllib3 directly.

GiorgioComitini avatar Mar 16 '22 00:03 GiorgioComitini

So that's pretty interesting if it's the same versions being used with Requests. The only socks specific code in Requests is here. We create a copy of urllib3's SOCKSProxyManager and then use that to service requests for any URL scheme starting with "socks". We pull the connection from that manager anytime a proxy is chosen.

The only other component that's unique to Requests would be our proxy resolution logic. We do consider environment variables, .netrc files, and system configurations when determining if a proxy specification is actually used. Is it possible something is configured on your Mac that's excluding use of the socks5h proxy or routing it to a different proxy?
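For anyone wanting to see what the standard library reports for these sources, here is a minimal sketch using only `os` and `urllib.request`. The helper name `proxy_settings_seen_by_python` is invented for this example, and `_scproxy` is a private CPython module that exists only on macOS builds, so that part is an assumption that may not hold on every Python version.

```python
import os
import urllib.request


def proxy_settings_seen_by_python():
    """Collect the proxy configuration the standard library reports."""
    report = {
        # Variables such as http_proxy, HTTPS_PROXY or NO_PROXY, any case.
        "environment": {
            name: value
            for name, value in os.environ.items()
            if name.lower().endswith("_proxy")
        },
        # Environment variables if any are set; otherwise the OS-level
        # settings on platforms where CPython supports reading them.
        "getproxies": urllib.request.getproxies(),
    }
    try:
        # Private helper (assumption): macOS only, includes the
        # "ignore proxy settings for these hosts" exception list.
        import _scproxy
        report["macos_system"] = _scproxy._get_proxy_settings()
    except (ImportError, AttributeError):
        report["macos_system"] = None
    return report


if __name__ == "__main__":
    for key, value in proxy_settings_seen_by_python().items():
        print(key, "=", value)
```

An empty `environment` entry together with a non-empty `macos_system` entry would suggest that the configuration comes from the system network settings rather than from the shell.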

nateprewitt avatar Mar 16 '22 01:03 nateprewitt

I can confirm that the versions should be the same (by looking at the urllib3.__file__'s called by the main python script and from Requests' adapters module). My environment (see below) does not seem to contain any variable related to proxies:


$ /usr/bin/env

PWD=/Users/giorgiocomitini/.pyenv/versions/3.10-base/lib/python3.10/site-packages/requests
PYENV_ROOT=/Users/giorgiocomitini/.pyenv
TERM_SESSION_ID=[xxx]
HOME=/Users/giorgiocomitini
CONDA_SHLVL=0
TMPDIR=[xxx]
SHELL=/opt/homebrew/bin/fish
CONDA_PYTHON_EXE=/Users/giorgiocomitini/.pyenv/versions/conda/bin/python
SHLVL=1
TERM_PROGRAM=Apple_Terminal
USER=giorgiocomitini
PYENV_SHELL=fish
TERM=xterm-256color
SSH_AUTH_SOCK=/private/tmp/com.apple.launchd.[xxx]/Listeners
PATH=/Users/giorgiocomitini/.pyenv/versions/conda/condabin:/Users/giorgiocomitini/.pyenv/shims:/Users/giorgiocomitini/.pyenv/bin:/Users/giorgiocomitini/.shims:/opt/homebrew/bin:/opt/homebrew/sbin:/usr/local/bin:/usr/bin:/bin:/usr/sbin:/sbin
XPC_SERVICE_NAME=0
XPC_FLAGS=0x0
TERM_PROGRAM_VERSION=444
CONDA_EXE=/Users/giorgiocomitini/.pyenv/versions/conda/bin/conda
LANG=it_IT.UTF-8
LOGNAME=giorgiocomitini
__CFBundleIdentifier=com.apple.Terminal

(I also ran the script with bash with the same results, so the problem is not related to the fish shell. Moreover, I checked the config files for both fish and bash, and there's nothing there related to proxies).

Are you able to point me to any configuration specifically considered by Requests, to make sure that nothing is interfering with the proxy resolution?

Other tests I did:

  • I used a custom SSH SOCKS proxy instead of the Tor one, and it still leaks (so the problem is not Tor-related)
  • I used a non-pyenv version of Python (namely, the homebrew-installed 3.10 Python version) with the latest Requests and PySocks, and it still leaks (so the problem is not pyenv-related)
  • As a control, I used both the Tor and the SSH proxy on Firefox (with remote DNS on), and it doesn't leak (so the problem is not system-wide, as already confirmed by using curl and bare urllib3)

Some other hints:

>> resp = requests.get(...)

>> resp.connection.proxy_manager
{'socks5h://127.0.0.1:9050': <urllib3.contrib.socks.SOCKSProxyManager at 0x111640730>}

>> resp.connection.proxy_manager['socks5h://127.0.0.1:9050'].proxy_url
'socks5h://127.0.0.1:9050'

so the scheme seems to have gone through to the SOCKSProxyManager object (though I don't really know how it works, so I might be mistaken).

Finally, in the utils module, I added a debug line to print out the resolved proxies:

def resolve_proxies(request, proxies, trust_env=True):
    proxies = proxies if proxies is not None else {}
    url = request.url
    scheme = urlparse(url).scheme
    no_proxy = proxies.get('no_proxy')
    new_proxies = proxies.copy()

    if trust_env and not should_bypass_proxies(url, no_proxy=no_proxy):
        environ_proxies = get_environ_proxies(url, no_proxy=no_proxy)

        proxy = environ_proxies.get(scheme, environ_proxies.get('all'))

        if proxy:
            new_proxies.setdefault(scheme, proxy)
    print(new_proxies) #DEBUG
    return new_proxies

and I get the correct proxies:

OrderedDict([('http', 'socks5h://127.0.0.1:9050'), ('https', 'socks5h://127.0.0.1:9050')])

GiorgioComitini avatar Mar 16 '22 18:03 GiorgioComitini

I suspect https://github.com/psf/requests/blob/79f60274f7e461b8fd2f579e741f748438d7eadb/requests/utils.py#L789 is our problem. All the way in urllib.request it calls gethostbyname https://github.com/python/cpython/blob/ba76f901923d80ad9b24bb1636aa751d55e0c768/Lib/urllib/request.py#L2594

sigmavirus24 avatar Mar 17 '22 18:03 sigmavirus24

The quickest way to confirm that would be to set trust_env to False and see if the behavior goes away.

nateprewitt avatar Mar 17 '22 18:03 nateprewitt

Setting trust_env to False indeed prevents the leak!

So, if it's not an environment variable (since no proxy variable is returned by /usr/bin/env), what could it be? I found neither .netrc nor _netrc files on my system.

I don't understand whether @sigmavirus24's suggestion is related to the env or not.

GiorgioComitini avatar Mar 17 '22 19:03 GiorgioComitini

It's related to how we check if you have the host you're trying to visit configured in the NO_PROXY setting. It's a standard library function that doesn't understand SOCKS.

sigmavirus24 avatar Mar 17 '22 20:03 sigmavirus24

Yes, that's it!

So, in my system settings for the proxies (Network->Advanced->Proxies) I have an "Ignore the proxy settings for the following hosts and domains" field, and the hosts and domains to be ignored are *.local and 169.254/16 (which by the way are precisely those reported in the comments in urllib.request). If I delete these, the DNS doesn't leak anymore. This is why my old Mac didn't leak: it's not a matter of Intel vs. Apple CPU, it's just that I did not have those exceptions set (I guess they are set by default depending on the macOS version). This issue's title needs to be changed, then.

I see where the problem is. The standard library's urllib needs to know the host's IP in order to compare it with the system exceptions. If it does not know it, it issues a DNS request. This is consistent with an observation I had made previously, namely that only a single DNS request leaks, as opposed to e.g. the multiple requests that would be issued if DNS resolution were performed entirely on the client's side (think of redirects). The rest of the conversation goes through the proxy, as it should.

Shall I assume then that this is an actual bug, albeit related to how Requests interacts with urllib and macOS?

GiorgioComitini avatar Mar 17 '22 21:03 GiorgioComitini

> Shall I assume then that this is an actual bug, albeit related to how Requests interacts with urllib and macOS?

There's absolutely nothing we can do to fix this short of completely re-implementing large portions of urllib.request.

This might be a "Known Issue" with PySocks+Requests (because I suspect there are similar settings on Linux and Windows) that we can document a user-based change for, but we will not accept the maintenance burden necessary to "fix" this.

sigmavirus24 avatar Mar 18 '22 14:03 sigmavirus24

I understand this, and I think documenting the issue is a good start.

Nonetheless, I believe that this might be a (quite severe) security issue that should still be addressed in some way. For future versions of Requests, would it be feasible - or even desirable, for general-purpose usage - to have the proxy resolution logic not look at the environment/system as soon as proxies are specified to Requests as arguments? After all, if I'm using proxies through Requests (as opposed to at the system level), it's because I want Requests - and not my system - to manage the proxies. Of course, I do not have a clear picture of the proxy resolution logic, so if this does not make sense to you I apologize.

GiorgioComitini avatar Mar 18 '22 15:03 GiorgioComitini

> After all, if I'm using proxies through Requests (as opposed to at the system level), it's because I want Requests - and not my system - to manage the proxies

That's what trust_env=False is for. Unfortunately, many people expect to be able to override one specific value and merge the rest together which is why we perform the behaviour you've observed.

Some folks want the behaviour of having an environment with http_proxy and https_proxy and no_proxy and then to override in a single request requests.get(url, proxies={"http": ".."}) but to still use https_proxy and no_proxy. The best way to disable that is trust_env=False.

sigmavirus24 avatar Mar 19 '22 13:03 sigmavirus24

Got it, thanks. This is why I asked if it would even be desirable (to me, yes, but to others, it depends).

Can you confirm that the only way to set trust_env to False is either to change the code in https://github.com/psf/requests/blob/main/requests/sessions.py#L398 or to issue requests exclusively through Session objects? E.g.

>> from requests import Session

>> session = Session()

>> session.trust_env=False

>> session.request(...)

I.e. that trust_env cannot be passed as a parameter to one of the API functions?

Anyway, thanks to all of you for the useful answers.

GiorgioComitini avatar Mar 19 '22 13:03 GiorgioComitini

Hi @GiorgioComitini,

Apologies for the delay. Yes, this is intended to be a Session-wide setting rather than per request. Either you trust the host or you don't. In general with Requests, you should always be using your own Session for anything outside of basic testing/prototyping. If you truly need to toggle between trusting and not trusting the environment in the same code base, you might consider doing something like:

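from requests import Session

trusted_session = Session()  # trust_env defaults to True
untrusted_session = Session()
untrusted_session.trust_env = False  # skip environment and system configuration
[...]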

It's probably worth taking a closer look to determine if that's really necessary though.

nateprewitt avatar Mar 28 '22 16:03 nateprewitt