urllib3
ProxyManager doesn't honor HTTPConnection.default_socket_options
ProxyManager doesn't honor `HTTPConnection.default_socket_options`. I'm attempting to enable TCP keep-alive, which I monitor via `netstat -tnope` on Linux. Only the third block below works when using a proxy.
Without a proxy, `default_socket_options` enables TCP keep-alive successfully:

```python
import socket

import urllib3
from urllib3.connection import HTTPConnection

HTTPConnection.default_socket_options = HTTPConnection.default_socket_options + [
    (socket.SOL_SOCKET, socket.SO_KEEPALIVE, 1),
]
http = urllib3.PoolManager()
http.request('GET', url)
```
With a proxy, `default_socket_options` does not enable TCP keep-alive (or any other socket option):

```python
import socket

import urllib3
from urllib3.connection import HTTPConnection

HTTPConnection.default_socket_options = HTTPConnection.default_socket_options + [
    (socket.SOL_SOCKET, socket.SO_KEEPALIVE, 1),
]
http = urllib3.ProxyManager(proxy_url)
http.request('GET', url)
```
With a proxy, the ProxyManager constructor parameter `socket_options` does enable TCP keep-alive:

```python
import socket

import urllib3
from urllib3.connection import HTTPConnection

http = urllib3.ProxyManager(
    proxy_url,
    socket_options=HTTPConnection.default_socket_options
    + [(socket.SOL_SOCKET, socket.SO_KEEPALIVE, 1)],
)
http.request('GET', url)
```
This was observed while using the "requests" module to connect to a URL via a proxy that has a short idle timeout, and the common "HTTPConnection.default_socket_options" workaround did nothing. I couldn't find a way to pass "socket_options" to ProxyManager through "requests" methods.
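One workaround that might serve in the meantime is to subclass requests' `HTTPAdapter` and override its `proxy_manager_for()` hook so the extra options reach ProxyManager. This is only a sketch, not something requests supports directly: the `KeepAliveProxyAdapter` name is made up, and `url`/`proxy_url` are the same placeholders as above.

```python
import socket

import requests
from urllib3.connection import HTTPConnection


class KeepAliveProxyAdapter(requests.adapters.HTTPAdapter):
    """Hypothetical adapter that forwards socket_options to the ProxyManager."""

    def proxy_manager_for(self, proxy, **proxy_kwargs):
        # Keep urllib3's defaults and append TCP keep-alive for proxied connections.
        proxy_kwargs.setdefault(
            "socket_options",
            HTTPConnection.default_socket_options
            + [(socket.SOL_SOCKET, socket.SO_KEEPALIVE, 1)],
        )
        return super().proxy_manager_for(proxy, **proxy_kwargs)


session = requests.Session()
session.mount("http://", KeepAliveProxyAdapter())
session.mount("https://", KeepAliveProxyAdapter())
session.get(url, proxies={"http": proxy_url, "https": proxy_url})
```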
urllib3 1.26.15
Is your proxy HTTP or HTTPS? If the proxy is HTTPS you'd need to set the socket options on `HTTPSConnection` instead of `HTTPConnection`.
No change when assigning to HTTPSConnection.default_socket_options.
Is it related to the `if self.proxy:` check in src/urllib3/connectionpool.py, whose comment states "We cannot know if the user has added default socket options, so we cannot replace the list."?
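For reference, that block looks roughly like this in 1.26.x (paraphrased from memory, so treat the exact `setdefault` call as an assumption rather than a verbatim quote):

```python
# urllib3/connectionpool.py, HTTPConnectionPool.__init__ (paraphrased)
if self.proxy:
    # Enable Nagle's algorithm for proxies, to avoid packet fragmentation.
    # We cannot know if the user has added default socket options, so we
    # cannot replace the list.
    self.conn_kw.setdefault("socket_options", [])
```

If that is indeed the code, then whenever a proxy is configured the connections are created with an explicit `socket_options=[]` (unless the caller passes their own), which would override the class-level default and match the behavior above.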
I went digging for why we even disable Nagle for proxies, remembering that we recently did some work to reduce packet fragmentation for proxies, specifically the CONNECT to initiate a tunnel.
I followed this trail:
- Issue reported to us here: https://github.com/urllib3/urllib3/issues/1491
- Fixed in CPython's `_tunnel()` implementation: https://github.com/python/cpython/pull/24780, available in 3.10 and 3.9
- We explicitly decided to disable Nagle's for non-proxies here: https://github.com/urllib3/urllib3/pull/283
- httplib2's issues for Nagle's: https://code.google.com/archive/p/httplib2/issues/91 and https://code.google.com/archive/p/httplib2/issues/28
- Reference that httplib2 has it, and a quote from @shazow:

> Thinking of it rationally, since the Nagle algorithm clumps together smaller packets into one send, it makes sense to keep it enabled for proxies, which would have all packets going to the same place, thus would have more benefit from Nagle.
@shazow @sigmavirus24 did either of you have insight into this? I'm kinda wondering if we should be universally disabling Nagle's and trying to combine request headers into a single send call instead of slowing down small proxy requests.
So, to be clear, in general Nagle's algorithm is disabled for non-proxy traffic. The socket options are kind of confusing. I think we want the algorithm enabled though for proxies. Depending on the layer the proxy acts at, it may have behavior that isn't ideal if we break things up over more packets. Remember not all proxies act at the same layer of the network.
Also, I'm not certain that at the level urllib3 is operating we can guarantee that headers are all in one packet.
Also, we're still supporting 3.8, right? So the fix for CPython is only so helpful until we drop that. Given that not everyone might have that fix in 3.9 or 3.10, I don't think we can meaningfully do anything until the better behavior is guaranteed for all users.
@sigmavirus24

> Remember not all proxies act at the same layer of the network.

This is the piece that I was missing mentally, thank you! We should continue using Nagle for proxies.
Are the `socket_options` set correctly from `HTTPConnection.default_socket_options`?
I'm looking at the following code:
```python
class HTTPConnection(_HTTPConnection):
    # ...

    #: Disable Nagle's algorithm by default.
    #: ``[(socket.IPPROTO_TCP, socket.TCP_NODELAY, 1)]``
    default_socket_options: typing.ClassVar[connection._TYPE_SOCKET_OPTIONS] = [
        (socket.IPPROTO_TCP, socket.TCP_NODELAY, 1)
    ]

    # ...

    def __init__(
        self,
        host: str,
        port: int | None = None,
        *,
        timeout: _TYPE_TIMEOUT = _DEFAULT_TIMEOUT,
        source_address: tuple[str, int] | None = None,
        blocksize: int = 16384,
        socket_options: None
        | (connection._TYPE_SOCKET_OPTIONS) = default_socket_options,
        proxy: Url | None = None,
        proxy_config: ProxyConfig | None = None,
    ) -> None:
        # ...
```
and if I do `conn = HTTPConnection('httpbin.org', 80)`, the values of `conn.socket_options` and `conn.default_socket_options` always seem to differ, even if I have `HTTPConnection.default_socket_options` set to the keep-alive configuration suggested by @jbrunette. It seems like `socket_options` doesn't take the value of `default_socket_options` here. I might be missing something obvious though.
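I think what's going on is Python's default-argument binding: the default is evaluated once while the class body runs, so rebinding the class attribute afterwards never reaches it. A minimal sketch of the mechanism (plain Python, no urllib3; all names made up):

```python
class Conn:
    # Class-level default, analogous to HTTPConnection.default_socket_options.
    default_options = [("nodelay", 1)]

    # The default below is evaluated once, when the class body runs, and is
    # bound to the list object that exists at that moment.
    def __init__(self, options=default_options):
        self.options = options


# Rebinding the class attribute creates a *new* list; the already-captured
# default still points at the old one.
Conn.default_options = Conn.default_options + [("keepalive", 1)]

print(Conn().options)        # [('nodelay', 1)]
print(Conn.default_options)  # [('nodelay', 1), ('keepalive', 1)]
```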
Setting the default options should have worked before this commit: https://github.com/urllib3/urllib3/commit/287052a16a59bcaba5772387de36fa9a49eb8378
Something like this should fix it:
```python
    def __init__(
        self,
        # ...
        socket_options: None
        | (connection._TYPE_SOCKET_OPTIONS) = HTTPConnection.default_socket_options,
        # ...
    ) -> None:
        # ...
```
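One wrinkle: the name `HTTPConnection` isn't available yet while its own class body is executing, and even if it were, the default would still be captured only once at class-creation time. Resolving the class attribute at call time instead would make later reassignments visible. A rough sketch, not necessarily the approach urllib3 would want:

```python
    def __init__(
        self,
        # ...
        socket_options: None | connection._TYPE_SOCKET_OPTIONS = None,
        # ...
    ) -> None:
        # ...
        # Look the class attribute up at call time, so reassigning
        # HTTPConnection.default_socket_options after import still takes effect.
        # Trade-off: an explicit socket_options=None can no longer mean
        # "no socket options at all".
        self.socket_options = (
            socket_options
            if socket_options is not None
            else self.default_socket_options
        )
```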
@sethmlarson Thoughts?
I'd be happy to create a PR addressing this issue but would need some guidance.