urllib3
Differences between Windows and Linux when handling HTTPS requests through HTTP proxy
Hi,
Let me explain the problem I was facing and how I ended up here. I'll try to keep the summary brief.
I work in a company that protects access to the Internet through a proxy that uses NTLMv2 for authentication. This is not a problem for Windows machines, but it is a pain when working with GNU/Linux ones. Anyway, there is a great solution for that: CNTLM. It lets you run a local proxy that is able to authenticate with the corporate proxy; then, by setting the http_proxy, https_proxy and no_proxy env variables and properly configuring the specific tools that do not use these variables (apt, yum, git, docker, etc.), voilà, Internet for everyone! No problems so far, with two exceptions: pip and conda.
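For urllib3 itself, note that ProxyManager does not read these env variables on its own; here is a minimal sketch of pointing it at the local CNTLM listener (the 127.0.0.1:3128 address is just the usual default, adjust it to your config):

import os
import urllib3

# Assumption: CNTLM listens on 127.0.0.1:3128 and https_proxy is exported
# accordingly; ProxyManager does not pick up the env variables by itself.
proxy_url = os.environ.get("https_proxy", "http://127.0.0.1:3128")
http = urllib3.ProxyManager(proxy_url)
print(http.request("GET", "http://example.com/").status)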
For those struggling with the same issue, please have a look at this related open issue: Doesn't work behind proxy in corporate Windows network (NTLM). I was able to use pip on Linux machines by setting up a local Nexus repository and adding PyPI as a proxy repo. Yes, Nexus is able to authenticate with the NTLM proxy just fine.
But why do I think this is a urllib3 issue, too? Sure, an interesting feature to add would be NTLM authentication support (see https://github.com/urllib3/urllib3/issues/242), but that is not what I am asking for here (there are surely more interesting things to implement before an old authentication method). The problem is that I noticed that pip and conda work just fine on Windows with CNTLM, but not on Linux. Same CNTLM, Python and urllib3 versions. So it seems that urllib3 does not work properly when doing HTTPS requests through a proxy on Linux. I will try to have a look at the code, but I am writing this issue just in case someone more familiar with urllib3 can help :smiley:
How to replicate:
- Install and configure CNTLM✤
- Create a virtualenv or conda env with Python 3
- Install urllib3 with pip (I tried version 1.23)
- Execute the following in Windows and Linux:
import urllib3
proxy = urllib3.ProxyManager(<local proxy url>)
proxy.request('GET', <any http site>)
proxy.request('GET', <any https site>)
✤: A similar proxy can be simulated in GNU/Linux with Squid + Samba + NTLMv2 auth. Also, have a look at this comment: it is possible to set it up with Apache.
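For reference, here is a concrete version of the snippet above; the proxy address and target sites are placeholders (I am assuming CNTLM on 127.0.0.1:3128):

import urllib3

# Assumption: CNTLM (or an equivalent NTLM-authenticating proxy) listens
# on 127.0.0.1:3128; adjust the URL to your setup.
proxy = urllib3.ProxyManager("http://127.0.0.1:3128")

# Plain HTTP is forwarded as a normal absolute-URI request and works on both OSes.
print(proxy.request("GET", "http://example.com/").status)

# HTTPS needs a CONNECT tunnel; this is the call that fails on Linux with
# "Tunnel connection failed: 407 Proxy Authentication Required".
print(proxy.request("GET", "https://example.com/").status)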
On Windows both requests work well. I just get the (expected) warning:
InsecureRequestWarning: Unverified HTTPS request is being made. Adding certificate verification is strongly advised. See: https://urllib3.readthedocs.io/en/latest/advanced-usage.html#ssl-warnings
On the other hand, on Linux the HTTP request works but the HTTPS one fails:
HTTPSConnectionPool(host='', port=443): Max retries exceeded with url: / (Caused by ProxyError('Cannot connect to proxy.', OSError('Tunnel connection failed: 407 Proxy Authentication Required',)))
It looks like the packet is not properly handled by CNTLM. Of course, this is the same error I get when trying to use pip and conda, or when using the Python requests module.
Any ideas? Do you know of any OS-dependent behaviour that may be causing this?
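In case it helps with debugging, here is one way to see what is actually sent to the proxy (just a sketch; the proxy address is an assumption and the output format varies between Python versions):

import http.client
import urllib3

# Print the raw exchange from http.client, including the CONNECT handshake.
http.client.HTTPConnection.debuglevel = 1
# Also mirror urllib3's own log messages to stderr.
urllib3.add_stderr_logger()

proxy = urllib3.ProxyManager("http://127.0.0.1:3128")  # assumed CNTLM address
proxy.request("GET", "https://example.com/")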
Thanks!
Thanks for filing this issue. Unfortunately, a majority of the maintainers of urllib3 do not use proxies in our workflow and thus aren't familiar with the ecosystem or issues that are commonly faced by proxy users. It would be of great benefit to many if you could look into solving this issue in urllib3. :)
I had exactly the same issue and found a hack. It has nothing to do with the OS.
The source of this issue lies in http.client. When an HTTPS request is made via an NTLM proxy, the _tunnel() function in the HTTPConnection class is called. The original code only proceeds if the return code is 200. With an NTLM proxy the return code is 407, so the following code is executed:
if code != http.HTTPStatus.OK:
    self.close()
    raise OSError("Tunnel connection failed: %d %s" % (code, message.strip()))
The key point is that, failed or otherwise, the code only sends one request and then returns. Therefore, NTLM can work if you modify the code to perform the authentication dance before the block above. This hack works on my work Windows machine. I don't think it is a "solution", but it works in (and only in) this particular case.
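The behaviour is easy to reproduce with http.client alone; a minimal sketch (the proxy address and target host are placeholders):

import http.client

# Assumption: an NTLM-authenticating proxy listens on 127.0.0.1:3128.
conn = http.client.HTTPSConnection("127.0.0.1", 3128, timeout=10)
conn.set_tunnel("example.com", 443)
try:
    conn.request("GET", "/")  # connect() calls _tunnel(), which sees the 407
except OSError as exc:
    print(exc)  # Tunnel connection failed: 407 Proxy Authentication Required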
Thanks @YuMan-Tam! Unfortunately, I cannot test your solution, as I'm not working with NTLM proxies anymore (thank god! :pray:)
Hi all, @YuMan-Tam could you share the code snippet for the workaround? I have been stuck on this problem for quite a few weeks. Your help is deeply appreciated.
The relevant modification is commented with "Experimentation for connections", plus the three imported modules sspi, base64 and win32api. I think I only modified the _tunnel function.
It has been a while since I last worked on it, so I don't remember the details. But, roughly, for HTTPS requests part of the NTLM dance gets dropped, so one needs to find a way to keep the connection alive while manually passing the details of the dance. I figured this out by using the Chrome/Firefox debug log to isolate all sent and received data, up until the error occurs. This workaround is specific to Windows, and I only tested it on my work PC, which runs Windows 7. However, I believe the mechanism works in general.
Authentication details are abstracted away by the sspi and win32api modules.
Relevant snippet for client.py:
import sspi
import base64
import win32api


def _tunnel(self):
    connect_str = "CONNECT %s:%d HTTP/1.1\r\n" % (self._tunnel_host,
                                                  self._tunnel_port)
    connect_bytes = connect_str.encode("ascii")
    self.send(connect_bytes)

    # Experimentation for connections
    # Prepare authorization header for the new request.
    # Manually add scflags=0
    username = win32api.GetUserName()
    ca = sspi.ClientAuth("NTLM", auth_info=(username, "", None), scflags=0)
    # First leg of the NTLM dance: build the negotiate (type 1) message.
    _, data = ca.authorize(None)
    auth_key = base64.b64encode(data[0].Buffer).decode("utf-8")
    self._tunnel_headers["Connection"] = "keep-alive"
    self._tunnel_headers["Proxy-Connection"] = "keep-alive"
    self._tunnel_headers["Proxy-Authorization"] = "NTLM %s" % auth_key
    for header, value in self._tunnel_headers.items():
        header_str = "%s: %s\r\n" % (header, value)
        header_bytes = header_str.encode("latin-1")
        self.send(header_bytes)
    self.send(b'\r\n')

    # Read the proxy's 407 response and extract the challenge (type 2) message
    # from the Proxy-Authenticate header.
    response = self.response_class(self.sock, method=self._method)
    (version, code, message) = response._read_status()
    while True:
        line = response.fp.readline(_MAXLINE + 1)
        if line.decode("utf-8").startswith("Proxy-Authenticate: NTLM "):
            challenge = line.decode("utf-8").replace("\r\n", "")
            challenge = list(filter(
                lambda s: s.startswith("Proxy-Authenticate: NTLM "),
                challenge.split(",")))
            challenge = challenge[0].strip().split()[2]
            challenge = base64.b64decode(challenge)
            # Build the response to the challenge (the type 3 authenticate message).
            _, data = ca.authorize(challenge)
            auth_key = base64.b64encode(data[0].Buffer).decode("utf-8")
            self._tunnel_headers["Proxy-Authorization"] = "NTLM %s" % auth_key
        if len(line) > _MAXLINE:
            raise LineTooLong("header line")
        if not line:
            # for sites which EOF without sending a trailer
            break
        if line in (b'\r\n', b'\n', b''):
            break
        if self.debuglevel > 0:
            print('header:', line.decode())

    # Second CONNECT on the same (kept-alive) socket, now carrying the
    # authenticate message in Proxy-Authorization.
    self.send(connect_bytes)
    for header, value in self._tunnel_headers.items():
        header_str = "%s: %s\r\n" % (header, value)
        header_bytes = header_str.encode("latin-1")
        self.send(header_bytes)
    self.send(b'\r\n')

    response = self.response_class(self.sock, method=self._method)
    (version, code, message) = response._read_status()
    if code != http.HTTPStatus.OK:
        self.close()
        raise OSError("Tunnel connection failed: %d %s" % (code,
                                                           message.strip()))
    while True:
        line = response.fp.readline(_MAXLINE + 1)
        if len(line) > _MAXLINE:
            raise LineTooLong("header line")
        if not line:
            # for sites which EOF without sending a trailer
            break
        if line in (b'\r\n', b'\n', b''):
            break
        if self.debuglevel > 0:
            print('header:', line.decode())
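To try this without editing Lib/http/client.py directly, one option (just a sketch, not part of the original workaround) is to monkey-patch the method at runtime before any connection is created; _MAXLINE and LineTooLong are private names the snippet borrows from http.client:

import http.client
from http.client import _MAXLINE, LineTooLong  # private names used by the snippet
import urllib3

# Assumption: the _tunnel function from the snippet above is defined in the
# current module instead of being pasted into Lib/http/client.py.
http.client.HTTPConnection._tunnel = _tunnel

# urllib3's connection classes subclass http.client.HTTPConnection, so any
# ProxyManager created afterwards goes through the patched CONNECT handshake.
proxy = urllib3.ProxyManager("http://127.0.0.1:3128")  # assumed proxy address
print(proxy.request("GET", "https://example.com/").status)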
@YuMan-Tam your snippet worked well. I did this:
- copied and reworked the requests-ntlm library into a requests-ntlm2 library
  - when requests-ntlm and/or urllib3 finally address this I can deprecate requests-ntlm2 and archive the repo
- created requests_ntlm2.connection.VerifiedHTTPSConnection, which inherits from urllib3.connection.VerifiedHTTPSConnection and overrides its _tunnel() method to be like your snippet
- created requests_ntlm2.adapters.HttpNtlmAdapter, which is responsible for monkey-patching the pool classes in urllib3.poolmanager AND sending the NTLM credentials downstream
The repo is here: https://github.com/dopstar/requests-ntlm2 (a rough usage sketch follows below).
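A rough usage sketch, assuming HttpNtlmAuth is carried over from requests-ntlm and that HttpNtlmAdapter takes the proxy username/password in its constructor (check the repo's README for the authoritative example; credentials and the proxy address are placeholders):

import requests
from requests_ntlm2 import HttpNtlmAuth  # assumed to be carried over from requests-ntlm
from requests_ntlm2.adapters import HttpNtlmAdapter

username = "DOMAIN\\user"                      # placeholder credentials
password = "password"
proxy_url = "http://proxy.corp.example:8080"   # placeholder corporate NTLM proxy

session = requests.Session()
# Assumption: the adapter performs the NTLM CONNECT dance with these credentials.
session.mount("https://", HttpNtlmAdapter(username, password))
session.mount("http://", HttpNtlmAdapter(username, password))
session.proxies = {"http": proxy_url, "https": proxy_url}

response = session.get("https://example.com/", auth=HttpNtlmAuth(username, password))
print(response.status_code)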
Awesome. I hope I will have a chance to test this soon! Thank you again for your work!
@YuMan-Tam Thanks for the excellent workaround. It worked perfectly in my case for NTLM authentication with a corporate proxy without exposing username:password in the code. Hope to see this addressed by urllib3 soon.