urllib3

Differences between Windows and Linux when handling HTTPS requests through HTTP proxy

Open · iyanmv opened this issue 6 years ago · 7 comments

Hi,

Let me explain the problem I was facing and how I ended up here. I'll try to keep the summary brief.

I work at a company that controls access to the Internet through a proxy that uses NTLMv2 for authentication. This is not a problem for Windows computers, but it is a pain when working with GNU/Linux machines. Fortunately, there is a great solution for that: CNTLM. It runs a local proxy that authenticates with the corporate proxy on your behalf. Then, by setting the http_proxy, https_proxy and no_proxy environment variables and properly configuring the specific tools that do not use these variables (apt, yum, git, docker, etc.), voilà, Internet for everyone! No problems so far, with two exceptions: pip and conda.
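A quick way to check what Python tools will see once those variables are set is urllib.request.getproxies(), which maps each scheme to the proxy URL found in the environment. This is a minimal sketch; the address localhost:3128 is an assumption (CNTLM's usual default listen port), so adjust it to the Listen setting in your cntlm.conf.

```python
import os
import urllib.request

# Assumption: CNTLM listens on localhost:3128 (adjust to your cntlm.conf).
os.environ["http_proxy"] = "http://localhost:3128"
os.environ["https_proxy"] = "http://localhost:3128"
os.environ["no_proxy"] = "localhost,127.0.0.1"

# getproxies() scans the environment for <scheme>_proxy variables;
# the no_proxy list ends up under the key "no".
proxies = urllib.request.getproxies()
print(proxies["http"], proxies["https"], proxies["no"])
```

Requests-based tools such as pip read these variables, but urllib3's ProxyManager does not, so with plain urllib3 the proxy URL has to be passed explicitly.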

For those struggling with the same issue, please have a look at this related open issue: Doesn't work behind proxy in corporate Windows network (NTLM). I was able to use pip on Linux machines by setting up a local Nexus repository and adding PyPI as a proxy repository. Yes, Nexus is able to authenticate with the NTLM proxy just fine.

But why do I think this is also a urllib3 issue? Sure, NTLM authentication support would be an interesting feature to add (see https://github.com/urllib3/urllib3/issues/242), but that is not what I am asking for here (there are surely more important things to implement than an old authentication method). The problem arose when I noticed that pip and conda work just fine on Windows with CNTLM, but not on Linux, with the same CNTLM, Python and urllib3 versions. On Linux, urllib3 fails to make HTTPS requests through the proxy. I will try to have a look at the code, but I am opening this issue in case someone more familiar with urllib3 can help :smiley:

How to replicate:

  1. Install and configure CNTLM
  2. Create a virtualenv or conda env with Python 3
  3. Install urllib3 with pip (I tried version 1.23)
  4. Execute the following on Windows and on Linux:
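import urllib3
# ProxyManager requires the proxy URL; localhost:3128 assumes CNTLM's default Listen port.
proxy = urllib3.ProxyManager('http://localhost:3128')
proxy.request('GET', 'http://example.com')   # any http site
proxy.request('GET', 'https://example.com')  # any https site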

Note: a similar proxy can be simulated on GNU/Linux with Squid + Samba + NTLMv2 authentication. Also, have a look at this comment: it is possible to set one up with Apache as well.

On Windows, both requests work well. I only get the (expected) warning:

InsecureRequestWarning: Unverified HTTPS request is being made. Adding certificate verification is strongly advised. See: https://urllib3.readthedocs.io/en/latest/advanced-usage.html#ssl-warnings

On Linux, on the other hand, the http request works but the https one fails:

HTTPSConnectionPool(host='', port=443): Max retries exceeded with url: / (Caused by ProxyError('Cannot connect to proxy.', OSError('Tunnel connection failed: 407 Proxy Authentication Required',)))

It looks like the packet is not handled properly by CNTLM. This is, of course, the same error I get when using pip, conda or the Python requests module.

Any ideas? Do you know of any OS-dependent behavior that might be causing this?

Thanks!

iyanmv, Sep 05 '18

Thanks for filing this issue. Unfortunately, a majority of the maintainers of urllib3 do not use proxies in our workflow and thus aren't familiar with the ecosystem or issues that are commonly faced by proxy users. It would be of great benefit to many if you could look into solving this issue in urllib3. :)

sethmlarson, Sep 05 '18

I had exactly the same issue and found a hack. It has nothing to do with the OS.

The source of this issue lies in http.client. When an https request is made through a proxy, the _tunnel() method of the HTTPConnection class is called to send a CONNECT request. The original code proceeds only if the status code is 200. With an NTLM proxy, the first CONNECT is answered with 407 (Proxy Authentication Required), so the following code runs:

if code != http.HTTPStatus.OK:
    self.close()
    raise OSError("Tunnel connection failed: %d %s" % (code,message.strip()))

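The key is that, whether it fails or not, the code sends only one CONNECT request and then returns. NTLM, however, needs a three-message handshake over a single connection: the client sends a NEGOTIATE token, the proxy replies 407 with a CHALLENGE token in its Proxy-Authenticate header, and the client answers with an AUTHENTICATE token. Therefore, NTLM would work if you modify the code to perform this "dance" before the block above. This hack works on my work Windows machine. I don't think it's a "solution", but it works in (and only in) this particular case.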
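A minimal sketch of that dance, written against a plain socket rather than http.client's private internals. It assumes the proxy answers the first CONNECT with 407 plus a "Proxy-Authenticate: NTLM <challenge>" header and keeps the connection open. negotiate and authenticate are hypothetical callbacks standing in for SSPI (or any other NTLM implementation), and all function names are illustrative.

```python
import base64


def build_connect(host, port, token):
    """Raw bytes of a CONNECT request carrying an NTLM token."""
    auth = base64.b64encode(token).decode("ascii")
    return (
        f"CONNECT {host}:{port} HTTP/1.1\r\n"
        f"Host: {host}:{port}\r\n"
        "Proxy-Connection: keep-alive\r\n"
        f"Proxy-Authorization: NTLM {auth}\r\n"
        "\r\n"
    ).encode("latin-1")


def read_head(fp):
    """Read a status line plus headers; return (code, {name: [values]})."""
    status = fp.readline().decode("latin-1").split()
    if len(status) < 2:
        raise OSError("proxy closed the connection")
    headers = {}
    while True:
        line = fp.readline().decode("latin-1")
        if line in ("\r\n", "\n", ""):
            return int(status[1]), headers
        name, _, value = line.partition(":")
        headers.setdefault(name.strip().lower(), []).append(value.strip())


def ntlm_challenge(values):
    """Decoded NTLM token from Proxy-Authenticate values, or None."""
    for value in values:
        for part in value.split(","):
            scheme, _, token = part.strip().partition(" ")
            if scheme.lower() == "ntlm" and token.strip():
                return base64.b64decode(token.strip())
    return None


def ntlm_tunnel(sock, host, port, negotiate, authenticate):
    """Open a CONNECT tunnel, doing the whole NTLM dance on ONE socket.

    negotiate() -> bytes and authenticate(challenge) -> bytes are
    hypothetical callbacks producing the NEGOTIATE and AUTHENTICATE tokens.
    """
    fp = sock.makefile("rb")
    # Round 1: NEGOTIATE; the proxy is expected to reply 407 + CHALLENGE.
    sock.sendall(build_connect(host, port, negotiate()))
    code, headers = read_head(fp)
    if code == 200:
        return  # the proxy did not ask for authentication
    if code != 407:
        raise OSError(f"Tunnel connection failed: {code}")
    # Drain the 407 body so the next status line is read cleanly.
    fp.read(int(headers.get("content-length", ["0"])[0]))
    challenge = ntlm_challenge(headers.get("proxy-authenticate", []))
    if challenge is None:
        raise OSError("407 response without an NTLM challenge")
    # Round 2: AUTHENTICATE on the same socket; the proxy should reply 200.
    sock.sendall(build_connect(host, port, authenticate(challenge)))
    code, _ = read_head(fp)
    if code != 200:
        raise OSError(f"Tunnel connection failed: {code}")
```

Both rounds use the same socket on purpose: NTLM authenticates the connection rather than the individual request, so opening a fresh connection for the second CONNECT would restart the handshake.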

YuMan-Tam, Oct 16 '18

Thanks @YuMan-Tam! Unfortunately, I cannot test your solution, I'm not working with NTLM proxies anymore (thank god! :pray:)

iyanmv, Nov 30 '18

Hi all, @YuMan-Tam, could you share the code snippet for the workaround? I have been stuck on this problem for quite a few weeks. Your help would be deeply appreciated.

joshuacheong, Aug 02 '19

The relevant modification is marked with the comment “Experimentation for connections”, together with the three imported modules sspi, base64 and win32api. I think I only modified the function _tunnel.

It has been a while since I last worked on it, so I don't remember the details. Roughly, though, for https requests part of the NTLM handshake is dropped. Hence, one needs to keep the connection alive and manually pass along the details of the handshake. I figured this out by using the Chrome/Firefox debug log to isolate all sent and received data up until the error occurred. This workaround is specific to Windows, and I only tested it on my work PC, which runs Windows 7. However, I believe the mechanism works in general.

Authentication information is abstracted away by the sspi and win32api modules (both part of pywin32).
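When following the debug-log approach described above, it helps to know which step of the handshake each captured header carries. A small sketch, assuming header values of the form "NTLM <base64 token>": per the NTLM specification (MS-NLMP), every message begins with the signature NTLMSSP\0 followed by a little-endian 32-bit message type. The function name and the sample log values are made up for illustration.

```python
import base64
import struct

# Every NTLM message starts with the 8-byte signature "NTLMSSP\0", followed by
# a little-endian 32-bit message type: 1 = NEGOTIATE, 2 = CHALLENGE,
# 3 = AUTHENTICATE.
SIGNATURE = b"NTLMSSP\x00"
MESSAGE_TYPES = {1: "NEGOTIATE", 2: "CHALLENGE", 3: "AUTHENTICATE"}


def ntlm_message_type(header_value):
    """Name the NTLM message in a 'NTLM <base64>' header value, or None."""
    scheme, _, token = header_value.strip().partition(" ")
    if scheme.lower() != "ntlm" or not token:
        return None
    raw = base64.b64decode(token)
    if len(raw) < 12 or raw[:8] != SIGNATURE:
        return None
    (msg_type,) = struct.unpack_from("<I", raw, 8)
    return MESSAGE_TYPES.get(msg_type, f"unknown ({msg_type})")


# Hypothetical log excerpt: truncated tokens carrying only the signature and
# the message type, as they would appear in the three handshake headers.
log = [
    "NTLM " + base64.b64encode(SIGNATURE + struct.pack("<I", t)).decode("ascii")
    for t in (1, 2, 3)
]
for value in log:
    print(value, "->", ntlm_message_type(value))
```

If a capture shows a NEGOTIATE and a CHALLENGE but never an AUTHENTICATE, the client gave up after the first 407, which is the symptom described in this thread.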

Relevant snippet for client.py:

# Add to the imports at the top of http/client.py (pywin32 provides sspi and
# win32api). http, _MAXLINE and LineTooLong are already defined in that module.
import sspi
import base64
import win32api

# Replacement for the _tunnel method of class HTTPConnection:
    def _tunnel(self):
        connect_str = "CONNECT %s:%d HTTP/1.1\r\n" % (self._tunnel_host,
            self._tunnel_port)
        connect_bytes = connect_str.encode("ascii")
        self.send(connect_bytes)

        """ Experimentation for connections"""
        # Step 1 (NEGOTIATE): let SSPI build the first NTLM token from the
        # logged-in Windows user's credentials. Manually add scflags=0.
        username = win32api.GetUserName()
        ca = sspi.ClientAuth("NTLM", auth_info=
                             (username, "", None), scflags=0)
        _, data = ca.authorize(None)
        auth_key = base64.b64encode(data[0].Buffer).decode("utf-8")

        # NTLM authenticates the connection itself, so keep it open.
        self._tunnel_headers["Connection"] = "keep-alive"
        self._tunnel_headers["Proxy-Connection"] = "keep-alive"
        self._tunnel_headers["Proxy-Authorization"] = "NTLM %s" % auth_key
        for header, value in self._tunnel_headers.items():
            header_str = "%s: %s\r\n" % (header, value)
            header_bytes = header_str.encode("latin-1")
            self.send(header_bytes)
        self.send(b'\r\n')

        # Step 2 (CHALLENGE): read the proxy's reply and answer the challenge
        # found in its Proxy-Authenticate header.
        response = self.response_class(self.sock, method=self._method)
        (version, code, message) = response._read_status()
        content_length = 0
        while True:
            line = response.fp.readline(_MAXLINE + 1)
            if len(line) > _MAXLINE:
                raise LineTooLong("header line")
            if not line:
                # for sites which EOF without sending a trailer
                break
            if line in (b'\r\n', b'\n', b''):
                break
            if self.debuglevel > 0:
                print('header:', line.decode())
            text = line.decode("latin-1").strip()
            if text.lower().startswith("content-length:"):
                content_length = int(text.split(":", 1)[1])
            if text.startswith("Proxy-Authenticate: NTLM "):
                # Keep only the NTLM part if several schemes share the line.
                challenge = text.split(",")[0].split()[2]
                challenge = base64.b64decode(challenge)
                # Build the response to the challenge (AUTHENTICATE token)
                _, data = ca.authorize(challenge)
                auth_key = base64.b64encode(data[0].Buffer).decode("utf-8")
                self._tunnel_headers["Proxy-Authorization"] = "NTLM %s" % auth_key

        if code == http.HTTPStatus.OK:
            # The proxy did not ask for authentication: the tunnel is open.
            return
        # Discard the 407 body so the next status line is read cleanly.
        if content_length:
            response.fp.read(content_length)

        # Step 3 (AUTHENTICATE): repeat the CONNECT on the same connection.
        self.send(connect_bytes)
        for header, value in self._tunnel_headers.items():
            header_str = "%s: %s\r\n" % (header, value)
            header_bytes = header_str.encode("latin-1")
            self.send(header_bytes)
        self.send(b'\r\n')

        response = self.response_class(self.sock, method=self._method)
        (version, code, message) = response._read_status()

        if code != http.HTTPStatus.OK:
            self.close()
            raise OSError("Tunnel connection failed: %d %s" % (code,
                                                               message.strip()))
        while True:
            line = response.fp.readline(_MAXLINE + 1)
            if len(line) > _MAXLINE:
                raise LineTooLong("header line")
            if not line:
                # for sites which EOF without sending a trailer
                break
            if line in (b'\r\n', b'\n', b''):
                break
            if self.debuglevel > 0:
                print('header:', line.decode())

YuMan-Tam, Aug 02 '19

@YuMan-Tam your snippet worked well. Here is what I did:

  • copied and reworked the requests-ntlm library into a requests-ntlm2 library
    • when requests-ntlm and/or urllib3 finally address this, I can deprecate requests-ntlm2 and archive the repo
  • created requests_ntlm2.connection.VerifiedHTTPSConnection, which inherits from urllib3.connection.VerifiedHTTPSConnection, and overrode its _tunnel() method to match your snippet
  • created requests_ntlm2.adapters.HttpNtlmAdapter, which is responsible for monkey-patching the pool classes in urllib3.poolmanager AND sending NTLM credentials downstream.

The repo is here: https://github.com/dopstar/requests-ntlm2
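A rough sketch of the same wiring against urllib3, assuming a 1.x or 2.x release where a connection pool takes its connection class from the ConnectionCls attribute and each PoolManager looks up pool classes in its own pool_classes_by_scheme mapping. The class and function names are hypothetical, not requests-ntlm2's actual code, and instead of monkey-patching urllib3.poolmanager globally it swaps the mapping on a single ProxyManager. The _tunnel override only marks where an NTLM-aware handshake would go.

```python
import urllib3
from urllib3.connection import HTTPSConnection
from urllib3.connectionpool import HTTPConnectionPool, HTTPSConnectionPool


class NtlmHTTPSConnection(HTTPSConnection):
    """HTTPS connection whose CONNECT tunnel setup can be customised."""

    def _tunnel(self):
        # Placeholder (hypothetical): an NTLM-aware handshake, such as the
        # _tunnel replacement posted earlier in this thread, would go here.
        # Delegating to the parent keeps this sketch runnable as-is.
        super()._tunnel()


class NtlmHTTPSConnectionPool(HTTPSConnectionPool):
    # Pools create their connections from this class.
    ConnectionCls = NtlmHTTPSConnection


def make_ntlm_proxy_manager(proxy_url):
    """ProxyManager whose https pools use NtlmHTTPSConnection."""
    manager = urllib3.ProxyManager(proxy_url)
    # Per-manager scheme -> pool class mapping consulted when a pool is made.
    manager.pool_classes_by_scheme = {
        "http": HTTPConnectionPool,
        "https": NtlmHTTPSConnectionPool,
    }
    return manager
```

Creating a pool does not open a socket, so the wiring can be checked offline: make_ntlm_proxy_manager("http://localhost:3128").connection_from_url("https://example.com/") should return an NtlmHTTPSConnectionPool.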

dopstar, Dec 30 '19

Awesome. I hope I will have a chance to test this soon! Thank you again for your work!

YuMan-Tam, Jan 18 '20

@YuMan-Tam Thanks for the excellent workaround. It worked perfectly in my case, authenticating with a corporate NTLM proxy without exposing username:password in the code. Hope to see this addressed by urllib3 soon.

MAbdElRaouf, Nov 02 '22