urllib3 icon indicating copy to clipboard operation
urllib3 copied to clipboard

Any interest in an UnixHTTPConnectionPool?

Open mattrobenolt opened this issue 7 years ago • 21 comments

I'm happy to create a PR, but it seems like a pretty common use case to want to establish an HTTP connection over a unix socket.

I see this abstraction happening more commonly through requests, but was a little surprised that this doesn't just exist as a connection within urllib3.

The code I've thrown together (not yet battle tested) is pretty trivial to do when you get rid of the requests bits, and it's boiled down to just:

class UnixHTTPConnection(HTTPConnection):
    def __init__(self, host, **kwargs):
        # We're using the `host` as the socket path, but
        # urllib3 uses this host as the Host header by default.
        # If we send along the socket path as a Host header, this is
        # never what you want and would typically be malformed value.
        # So we fake this by sending along `localhost` by default as
        # other libraries do.
        self.socket_path = host
        super(UnixHTTPConnection, self).__init__(host='localhost', **kwargs)

    def _new_conn(self):
        sock = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
        if self.timeout is not socket._GLOBAL_DEFAULT_TIMEOUT:
            sock.settimeout(self.timeout)
        sock.connect(self.socket_path)
        return sock


class UnixHTTPConnectionPool(HTTPConnectionPool):
    ConnectionCls = UnixHTTPConnection

If this is something you'd be interested in pulling into urllib3 as a core connection pool, I think this would be great to have, and I'll convert this into a PR. If not, would be curious if this seems outside the scope of urllib3 or some other reason.

The sources of this off the top of my head are within docker-py, which again, uses this sorta abstraction except a bit heavy handed since it also needs the requests adapter https://github.com/docker/docker-py/blob/master/docker/transport/unixconn.py. Then requests-unixsocket project https://github.com/msabramo/requests-unixsocket.

And again, nothing explicitly for those of us that like just directly using urllib3.

I plan on adding this into Sentry very soon, and will cut over to using this for some of our critical internal services to get it battle tested.

I also noticed that this is sorta being used within test suite: https://github.com/urllib3/urllib3/blob/0f85e05af9ef2ded671a7b47506dfd24b32decf0/test/test_connectionpool.py#L35-L45 I'm not entirely sure of the context here, but it's definitely just used for some mock class. Seems relevant though. :)

mattrobenolt avatar Oct 25 '18 00:10 mattrobenolt

I'm a -0 on this. This project isn't very actively maintained. I'd rather see something that has been shown to work well and already handle cases before adopting it into the core and even then, I'd like to see the original author maintain it which I didn't understand that as being the suggestion here.

The tests there were to ensure that people can write other connections and connection pools. This was just a convenient test case as I recall. People have FTP connections and connection pools. We could have just as easily used that as an example.

sigmavirus24 avatar Oct 25 '18 03:10 sigmavirus24

That's fine with me. Like I said, was just curious if there was interest in having support for this in core. :)

Thanks for the attention @sigmavirus24! <3

mattrobenolt avatar Oct 26 '18 20:10 mattrobenolt

I'm but one person :) I'd be interested to hear what @SethMichaelLarson and @theacodes think.

sigmavirus24 avatar Oct 27 '18 16:10 sigmavirus24

I'm actually okay with having this added to the library due to it's small surface. Not sure who'd use it but I'm sure urllib3 purists may have some interest.

sethmlarson avatar Oct 27 '18 17:10 sethmlarson

From my perspective, people are more frequently doing this by writing an adapter for requests, which, is obviously using urllib3. I personally prefer to just use urllib3 when I can.

Within Sentry, we try to leverage Unix sockets whenever we can for various reasons. This includes internal HTTP services. We directly use urllib3 for these situations, and I wanted to see how much work it’s be to just write a connection pool for Unix sockets. And turns out, it’s a trivial number of lines of code.

With that said, I think there’s a reasonable demand for http over Unix sockets in generally I think mostly due to popularity of Docker since by default, it’s bound to a Unix socket.

This leads us to projects such as docker-py as referenced, and other adapters for requests. And in my opinion, they are each duplicating efforts on the urllib3 side too since they aren’t just the requests adapter, they also include the connection pool needed for the adapter to use.

So in my hypothetical vision, we’d drop a unixconn package into urllib3.contrib. Then from there, docker-py and the other adapters people have, can just utilize this core bit that’s, theoretically then shared between everyone using this with requests.

This is at least my theory on why the demand might appear lower since others have tackled it at a different level.

(Also reopening since it seems we might have interest.)

mattrobenolt avatar Oct 27 '18 17:10 mattrobenolt

And lastly for reference, we are going to be shipping this and adopting it internally for a while in production.

https://github.com/getsentry/sentry/pull/10295

mattrobenolt avatar Oct 27 '18 17:10 mattrobenolt

Also worth noting that unix socket URIs have different semantics. Requests has had several issues over URL handling, much of which is rooted in urllib3's Url class so we probably want to switch to a different way of parsing URIs (there are a few options in the python-hyper org)

sigmavirus24 avatar Oct 27 '18 18:10 sigmavirus24

Maybe I’m missing some context here. But if I initialize the connection pool explicitly, where does URI parsing come into play?

The usage here is something like:

UnixHTTPConnectionPool("/var/run/something.sock").urlopen("GET", "/")

mattrobenolt avatar Oct 27 '18 18:10 mattrobenolt

Ah, if you're doing that instead of using a PoolManager (which is how I think most people use urllib3) then you're probably in the clear.

sigmavirus24 avatar Oct 29 '18 12:10 sigmavirus24

+1 to only implementing Connection and ConnectionPool, allows us to skip the URI handling for paths until later (if ever).

sethmlarson avatar Oct 29 '18 13:10 sethmlarson

Oh, I see. Yeah, if we wanted this, we'd have to invent some URI pattern like: unix:/var/run/docker.sock/foo or something. I have no real interest in doing this, and I don't think it makes a ton of sense here.

This being, working with PoolManager.

mattrobenolt avatar Oct 29 '18 14:10 mattrobenolt

The reason I was assuming the URI parsing problems was because I think main-stream support of a UnixHTTPConnection will mean that people assume they can use it from PoolManager. I suspect as soon as we release this that we'll get bug reports that PoolManager doesn't Just Work™ with it.

sigmavirus24 avatar Oct 29 '18 15:10 sigmavirus24

PoolManager only supports http and https. There are a few other connectionpool types under urllib3.contrib that all exist without going through PoolManager.

I don’t think this would be a barrier for adoption, since as I said, most are using requests anyways, and these underlying requests adapters would leverage the connectionpool directly and don’t use PoolManager.

mattrobenolt avatar Oct 29 '18 15:10 mattrobenolt

Either way, I think working with PoolManager would be nice, but agree that opens up a can of worms for URIs.

mattrobenolt avatar Oct 29 '18 15:10 mattrobenolt

Yeah. I think we just want to be explicit in the documentation (like a .. warning at the top) that this does not work with PoolManager. That should help deflect most issues.

sigmavirus24 avatar Oct 29 '18 15:10 sigmavirus24

If/when the SessionManager layer is complete (not sure whether that's on the roadmap anymore), which might someday supersede the PoolManager as the default high-level entry point, it would be a better fit for it anyways. :)

I don't think it semantically makes sense to pool unix sockets anyways, so PoolManager is probably the wrong place for it (except that redirect logic lives here for now--though it probably ultimately belongs at the Session layer). Even ConnectionPool is a weird layer for it but right now the RequestMethods are mixed in at that layer (perhaps that is the wrong layer though--or maybe it's just a misnamed layer).

Though if it's just a contrib thing, it could be a UnixHTTPConnection that happens to inherit from RequestMethods too.

shazow avatar Oct 29 '18 15:10 shazow

Re: PoolManager

If we did want to support some URI scheme, nginx seems to have a solution that works: http://nginx.org/en/docs/http/ngx_http_proxy_module.html#proxy_pass

http://unix:/foo.sock:/some/path

It's not pretty, but it works. I tried to look up other examples, and things like curl don't accept a full URI, there's an explicit --unix-socket flag probably to also not deal with this.

I don't think it semantically makes sense to pool unix sockets anyways

Why do you say that? Would it make sense to reuse the same socket when possible? And also when dealing with threads, you'd need more than 1, so the pool abstraction still makes sense to me. But I'll admit, I'm not expert urllib3 user and maybe my understanding is wrong.

Though if it's just a contrib thing, it could be a UnixHTTPConnection that happens to inherit from RequestMethods too.

TIL, I can adapt that in a PR.

mattrobenolt avatar Oct 29 '18 17:10 mattrobenolt

Interesting that nginx does it that way. Requests seems to support http+unix as the scheme iirc (in so much as we support it so third-party adapters can support it). Maybe we should just pick a third syntax just because this entire concept isn't specified in any reliable way. (just a bit of snark)

sigmavirus24 avatar Oct 29 '18 17:10 sigmavirus24

+1 to inventing a new method. :)

mattrobenolt avatar Oct 29 '18 20:10 mattrobenolt

since requests is our primary downstream user, let's match their scheme. It's also a scheme that makes sense, nginx's just looks.. weird.

I'm fine with UnixHTTPConnection being in contrib.

theacodes avatar Oct 31 '18 06:10 theacodes

Cool. I’ll take the time after we ship this into production to put together a PR in here and try to figure out some tests.

I’ll focus on the connectionpool bits as we have it, then we can see about addressing the generic PoolManager and URI stuff.

Thanks everyone!

mattrobenolt avatar Oct 31 '18 15:10 mattrobenolt