requests-unixsocket icon indicating copy to clipboard operation
requests-unixsocket copied to clipboard

Alternative URL scheme: http://sock.local

Open msabramo opened this issue 8 years ago • 26 comments

Add a much friendlier URL scheme:

e.g.: http://sock.local/var/run/docker.sock/version

while still supporting the old scheme too (http+unix://%2Fvar%2Frun%2Fdocker.sock/info)

The new scheme is nice, because it:

  • uses the "standard" protocol of http:// and doesn't invent a funky new one
  • doesn't require ugly and error-prone URL encoding of the socket path (see https://github.com/httpie/httpie-unixsocket/issues/7, https://github.com/httpie/httpie-unixsocket/issues/9).
  • uses path part of URL instead of hostname for the socket path, thus avoiding issues with 63 character limit for hostnames in DNS/IDNA (e.g.: #23)

Example:

In [1]: import requests_unixsocket

In [2]: session = requests_unixsocket.Session()

In [3]: res = session.get('http://sock.local/var/run/docker.sock/version')

In [4]: res.json()
Out[4]:
{'ApiVersion': '1.24',
 'Arch': 'amd64',
 'BuildTime': 'Mon, 19 Dec 2016 09:20:48 +1300',
 'GitCommit': '6b644ec',
 'GoVersion': 'go1.6.2',
 'KernelVersion': '4.4.0-63-generic',
 'Os': 'linux',
 'Version': '1.12.3'}

Abstract namespace sockets (Linux only) are also supported:

In [12]: res = session.get('http://sock.local/%00test_socket/get')

In [13]: res.json()
Out[13]:
{'args': {},
 'headers': {'Accept': '*/*',
  'Accept-Encoding': 'gzip, deflate',
  'Host': 'localhost',
  'User-Agent': 'python-requests/2.13.0'},
 'origin': '73.93.155.107',
 'url': 'http://localhost/get'}

Cc: @jkbrzt, @jessfraz, @RantyDave, @Kobla, @habnabit, @graingert, @3XE, @esben, @avian2, @wrouesnel, @tjj5036, @rockstar

msabramo avatar Feb 22 '17 05:02 msabramo

The problem I see with the new URL scheme is that it's ambiguous. Any other system would interpret http://sock.local/... as a normal HTTP resource on the sock mDNS host (https://tools.ietf.org/html/rfc6762) as would probably also any developer just looking at the URL. It's no longer a unique resource identifier.

To avoid problems with having a path in the hostname part, how about something like http+unix://localhost/var/run/docker.sock/version? It's longer, but avoids the confusion.

avian2 avatar Feb 22 '17 07:02 avian2

Agreed with avian. ".local" is the domain used to advertise with multicast DNS. Why not use the 'triple slash' method as used with file url's (i.e. file:///var/www/something.html) ... so http+unix:///var/run/docker.sock/version ?

Edit: Actually, I can't see anything wrong with http+file:///var/run/docker.sock either.

RantyDave avatar Feb 22 '17 07:02 RantyDave

http+x is deprecated in requests and will be dropped in the future. You need to register your own domain and use that:

https://github.com/msabramo/requests-unixsocket/pull/25/files

graingert avatar Feb 22 '17 12:02 graingert

For reference, I'm guessing @graingert is talking about these problems with http+x schemes in Requests. The issue seems to be still under discussion.

https://github.com/kennethreitz/requests/issues/3734 https://github.com/kennethreitz/requests/issues/3735

avian2 avatar Feb 22 '17 13:02 avian2

I vote for http+unix:///var/run/docker.sock/version

jkbrzt avatar Feb 22 '17 15:02 jkbrzt

Thanks folks, I didn't realize that .local was specifically for mDNS. I wonder if we could use .localhost? E.g.: http://sock.localhost/var/run/docker.sock/version ?

The name localhost is reserved by the Internet Engineering Task Force (IETF) in RFC 2606 (June 1999) as a domain name label that may not be installed as a top-level domain...This allows the use of these names for either documentation purposes or in local testing scenarios.

msabramo avatar Feb 22 '17 15:02 msabramo

NO! Don't use TLDs that you don't own!

On 22 Feb 2017 15:04, "Marc Abramowitz" [email protected] wrote:

Thanks folks, I didn't realize that .local was specifically for mDNS. I wonder if we could use .localhost https://en.wikipedia.org/wiki/.localhost?

— You are receiving this because you were mentioned. Reply to this email directly, view it on GitHub https://github.com/msabramo/requests-unixsocket/pull/34#issuecomment-281694571, or mute the thread https://github.com/notifications/unsubscribe-auth/AAZQTGOFcBex-QD4N8mzDoZ9CYaX6z1Oks5rfE5rgaJpZM4MINdm .

graingert avatar Feb 22 '17 15:02 graingert

http+unix:///var/run/docker.sock/version seems to be gathering some momentum. I had been excited about getting rid of the http+unix custom protocol, because it's funky and I thought some software might refuse to recognize these as URLs, but maybe it's not so bad. The main user of this is httpie-unixsocket and that will recognize this; in fact I might not even have to change it, which is kind of nice.

Is anyone using requests-unixsocket for things besides httpie-unixsocket? And if so, will the http+unix protocol impact it in any way? I am guessing not, but thought I'd make sure.

msabramo avatar Feb 22 '17 15:02 msabramo

You need to escape the slashes http+unix://%2Fvar%2Frun%2Fdocker. sock/version

On 22 Feb 2017 15:16, "Marc Abramowitz" [email protected] wrote:

http+unix:///var/run/docker.sock/version seems to be gathering some momentum. I had been excited about getting rid of the http+unix custom protocol, because it's funky and I thought some software might refuse to recognize these as URLs, but maybe it's not so bad. The main user of this is httpie-unixsocket https://github.com/httpie/httpie-unixsocket and that will recognize this; in fact I might not even have to change it, which is kind of nice.

Is anyone using requests-unixsocket for things besides httpie-unixsocket https://github.com/httpie/httpie-unixsocket? And if so, will the http+unix protocol impact it in any way. I am guessing not, but thought I'd make sure.

— You are receiving this because you were mentioned. Reply to this email directly, view it on GitHub https://github.com/msabramo/requests-unixsocket/pull/34#issuecomment-281698422, or mute the thread https://github.com/notifications/unsubscribe-auth/AAZQTPwXD_gnu8llFweOcZf5XmOqMq9Jks5rfFFZgaJpZM4MINdm .

graingert avatar Feb 22 '17 15:02 graingert

I wonder if we should run this by an HTTP and requests expert, by which I mean @Lukasa...

It would be nice to pick something that requests is likely to support in the future, and that won't cause them any trouble.

msabramo avatar Feb 22 '17 15:02 msabramo

@graingert: This new approach eliminates the need for URL encoding, which was the ugliest and most error-prone part of the old format. Note that URL starts with http+unix:/// (three slashes). This means that the hostname is blank and the path to the socket is in the path part of the URL, so slashes should be fine. See https://github.com/msabramo/requests-unixsocket/pull/34/files#diff-b425cc2e20f49bc2f6ea4538b259ea20R69 for how it works in the code.

Did I miss something?

msabramo avatar Feb 22 '17 15:02 msabramo

@msabramo the three slashes is a bug in how urlparse parsed URLs: https://////google.com is still valid

graingert avatar Feb 22 '17 15:02 graingert

@msabramo Hey, how's it going?

So, Requests has been dealing with this a bit recently, as @graingert noted. The relevant requests issues are: kennethreitz/requests#3734, kennethreitz/requests#3735, kennethreitz/requests#3713, shazow/urllib3#1080, and shazow/urllib3#1089. As you can see by the length of that list, which is not even entirely exhaustive, we've spent a lot of time trying to thread the needle of the kind of scheme overloading that is happening here.

The specific version of the problem, at least as Requests sees it, is this: Requests knows how the HTTP/HTTPS URL schemes work, and assumes it does not understand how anything else works. Requests has a very strong view of this, meaning that for a scheme it does not understand it won't do anything to the URL, including add the params parameters to it. This is intended to allow extremely different URLs such as file:// to get through unmolested.

However, there have been a few cases where users have wanted requests to treat the URL like it's HTTP, even though it isn't. In this case, due to a lazily-coded check in Requests, it has been possible to get this behaviour for schemes that begin with http and then go on to do something wacky. This causes us an enormous number of problems because users generally only want us to do a subset of the things we know how to do to URLs: the most recent examples are all related to unix sockets and indicate that those users want us to do query parameters, but not IDNA. Because we broke a bunch of people with our IDNA change, we reverted it in the 2.12 series.

We've committed to continue to support this behaviour in the 2.x release series. However, it has caused us an enormous amount of problems, because there is no consistency over what exactly users want. The only situation where the Requests team is safe to keep evolving our URL handling if is we touch only HTTP schemed URLs, so the only thing we're committing to for the future 3.0 release is to work with exactly http and https schemes only. This means that the 3.0 release of Requests may not work as you expect with a http+unix or similar scheme URL.

However, if we do that we'll be adding the ability for those that provide transport adapters to also provide hooks for URL handling, such that the logic currently done in prepare_url can be overridden in a scheme-appropriate way. This would allow requests-unixsocket to tell requests to treat a scheme in a specific way (including adding the params to it appropriately).

The TL;DR of all this is that any of the approaches here will work well. The approach of using http:// plus a dummy domain will work well in 3.0 as well, but http+whatever will potentially require more work for the 3.0 series.

Does that help answer your question @msabramo?

Lukasa avatar Feb 22 '17 15:02 Lukasa

@graingert The three slashes are very much on purpose. See https://www.ietf.org/rfc/rfc1738.txt section 3.10

One potential (but IMHO small) problem is that the 'triple slash' method doesn't delineate between the path to the socket and the path within the served namespace. I.e. is http:///var/socket/thing/a request to /var/socket for thing/a or is it a request to /var/socket/thing for /a? I assume that walking the tree stat'ing the nodes will make it clear which is the socket, however...

RantyDave avatar Feb 23 '17 01:02 RantyDave

https://tools.ietf.org/html/rfc1738#section-3.1 is less important than how it's implemented in browsers:

The WHATWG spec says it has to be one slash and that a parser must accept an indefinite amount of slashes. “http:/example.com” and “http:////////////////////////////////////example.com” are both equally fine.

https://daniel.haxx.se/blog/2016/05/11/my-url-isnt-your-url/

graingert avatar Feb 23 '17 01:02 graingert

@RantyDave @graingert What matters more is absolutely a matter of opinion. However, it is unquestionable that the WHATWG considers two or more slashes after the scheme equivalent to just two, and so if any Python project changes to using the WHATWG view of a URL then the "triple-slash" scheme will fall over in a heap.

Note also that this happens with Requests:

>>> r = requests.get('http+unix:///var/run/docker.sock/version')
requests.exceptions.InvalidURL: Invalid URL u'http+unix:///var/run/docker.sock/version': No host supplied

Specifically, if Requests thinks your scheme is a HTTP scheme then it wants a conforming URL, and there is no conforming URL that a user can provide that does not have a host part. So that's not going to fly.

Lukasa avatar Feb 23 '17 08:02 Lukasa

Oh, I wasn't looking to 'start something'. Whatever works the best and/or causes the least pain is fine by me :) It seems that in practice the http://something.mostly.valid/whatever scheme is pretty much a constraint leaving the choice of what 'something.mostly.valid' is. Perhaps the core of the issue is to identify a tld that will "never" used and run with that?

Disappointingly the "java" reverse notation can't be used because of . in filenames. I.e. http://docker.sock.run.var.unix-socket/version isn't a goer, but http://'docker.sock'.run.var.unix-socket/version might be. I have now run out of knowledge...

RantyDave avatar Feb 23 '17 09:02 RantyDave

something.mostly.valid is very easy to choose. It should be some domain name that you nominally own.

On 23 Feb 2017 09:13, "David Preece" [email protected] wrote:

Oh, I wasn't looking to 'start something'. Whatever works the best and/or causes the least pain is fine by me :) It seems that in practice the http://something.mostly.valid/whatever scheme is pretty much a constraint leaving the choice of what 'something.mostly.valid' is. Perhaps the core of the issue is to identify a tld that will "never" used and run with that?

Disappointingly the "java" reverse notation can't be used because of . in filenames. I.e. http://docker.sock.run.var.unix-socket/version isn't a goer, but http://'docker.sock'.run.var.unix-socket/version might be. I have now run out of knowledge...

— You are receiving this because you were mentioned. Reply to this email directly, view it on GitHub https://github.com/msabramo/requests-unixsocket/pull/34#issuecomment-281937039, or mute the thread https://github.com/notifications/unsubscribe-auth/AAZQTMTu3pRkQkZUkl3-HWyW0_cHPL_qks5rfU3KgaJpZM4MINdm .

graingert avatar Feb 23 '17 09:02 graingert

I am glad to see this is being thought about as obviously the whole 2F% thing is not ideal for readability or logging. Also, personally, it feels a bit weird mixing application layer and transport layer in the scheme (http+unix).

RFC2606 defines .invalid as a reserved TLD which is intended to never function:

  ".invalid" is intended for use in online construction of domain
  names that are sure to be invalid and which it is obvious at a
  glance are invalid.

Yes, the spec says "intended for use online" but at least we would know the domain would never be valid. So could do something like http://tld.invalid/var/run/docker.sock/version?

PeteDevoy avatar Feb 23 '17 14:02 PeteDevoy

Requests would be within its rights to reject tld.invalid

Just register a domain name and use that.

On 23 Feb 2017 14:58, "Pete" [email protected] wrote:

I am glad to see this is being thought about as obviously the whole 2F% thing is not ideal for readability and logging. Also, personally, it fields a bit weird mixing application layer and transport layer in the scheme ( http+unix).

RFC2606 https://tools.ietf.org/html/rfc2606#page-2 defines .invalid as a reserved TLD which can be used without fear of conflict:

".invalid" is intended for use in online construction of domain names that are sure to be invalid and which it is obvious at a glance are invalid.

Yes, the spec says "intended for use online" but at least we would know the domain would never be valid. So could be something like http://tld.invalid/var/run/docker.sock/version?

— You are receiving this because you were mentioned. Reply to this email directly, view it on GitHub https://github.com/msabramo/requests-unixsocket/pull/34#issuecomment-282013608, or mute the thread https://github.com/notifications/unsubscribe-auth/AAZQTFwNB3ccfiEiUN0mHn3nqBCqXngyks5rfZ59gaJpZM4MINdm .

graingert avatar Feb 23 '17 15:02 graingert

@graingert Could that happen with the adapter/monkey patch in place?

PeteDevoy avatar Feb 23 '17 15:02 PeteDevoy

I mean you could monkey patch requests to accept it regardless. But the one sure fire way to get this to work, is use a domain name that you actually control.

graingert avatar Feb 23 '17 15:02 graingert

I was referring to the existing and proposed code -- I thought it may clean out the domain before requests sees it but, taking a closer look, I see it is not the case.

Sure, developers would be welcome to register their own domain names if they feel it is appropriate for their project but it seems reasonable to expect that the library would ship with a domain which is not valid, conflicting or confusing.

Requests would be within its rights to reject tld.invalid

On this point, I struggle to imagine that the requests library would start concerning itself with domain validation, maybe @Lukasa could comment?

PeteDevoy avatar Feb 23 '17 21:02 PeteDevoy

It's definitely extremely unlikely that we'd do it, but that isn't the same as saying it's not something we'd ever consider doing.

Lukasa avatar Feb 23 '17 21:02 Lukasa

This project could register a domain name unixsocket. or use unixsocket.msabramo.github.io.

Basically the same rules as Maven Central on Java package names

On 23 Feb 2017 21:39, "Cory Benfield" [email protected] wrote:

It's definitely extremely unlikely that we'd do it, but that isn't the same as saying it's not something we'd ever consider doing.

— You are receiving this because you were mentioned. Reply to this email directly, view it on GitHub https://github.com/msabramo/requests-unixsocket/pull/34#issuecomment-282129777, or mute the thread https://github.com/notifications/unsubscribe-auth/AAZQTEU45egdf9pQeCqyoUyBUKXffQ2Qks5rffybgaJpZM4MINdm .

graingert avatar Feb 23 '17 21:02 graingert

There are differing opinions about how to structure the URLs. So I had the idea of making the URL parsing pluggable so that people can make it work however they see fit. The default is still the old, ugly format from before, so that I don't break code that's been written to use that. But now you can pass a settings object in and that can have a urlparse attribute that can be set to be a function that parses the URL and splits it into a sockpath and a reqpath.

See #41.

What do people think of that?

msabramo avatar Aug 15 '19 22:08 msabramo