Authentication data not handled
The following code:
#!/usr/bin/env python3
import requests_unixsocket
with requests_unixsocket.Session() as s:
print(s.get("http+unix://radicale@%2Frun%2Fmodoboa%2Fgunicorn.sock/api/o/introspect/"))
Results in the following SysCall:
connect(5, {sa_family=AF_UNIX, sun_path="radicale@/run/modoboa/gunicorn.sock"}, 38) = -1 ENOENT (No such file or directory)
Obviously the authentication info should no be part of the socket name, but rather be used for HTTP authentication after connecting.
Notably, passing Basic Auth data using the auth= parameter does work, so there is the workaround of using urllib3.util.parse_url to extract the authentication data yourself and pass it that way.
I'm not sure this can be qualified as a bug, though. Socket paths already aren't in the HTTP/URL specs. So there's no RFC to tell us that this should work. I think, it is fine to assume a relative file path in this case. Note that @ is often used to represent the first NUL-byte when printing abstract unix sockets.
Personally, I only use this lib in tests so can't say much about the ergonomics.
However, RFC 3986 specifically calls out this syntax in traditional HTTP URLs as deprecated: https://datatracker.ietf.org/doc/html/rfc3986#section-7.5 / https://datatracker.ietf.org/doc/html/rfc3986#section-3.2.1. And the reasoning makes sense to me too — things that appear in URLs tend to end up and archived in long-living logs.
@webknjaz: I don’t really think the rationale of that RFC applies to automated tooling, but I suppose it’s your call in the end. However, I would not say “it is fine to assume a relative file path in this case” – this has been highly confusing and hard to debug on top.
If this isn’t supported then it must be actively rejected (and same with the port) – the URI spec actually actually says this component will always exist (https://datatracker.ietf.org/doc/html/rfc3986#section-3.2), but notes that “Some schemes do not allow the userinfo and/or port subcomponents.” (Notably do not allow, rather than do not have.)
Both would work for me, but silently changing the meaning of the URI by interpreting the userinfo as host is essentially the “Semantic Attack” listed in https://datatracker.ietf.org/doc/html/rfc3986#section-7.6 but in actual processing, rather than “just” user display.
I think it does apply to automated tooling. It used to be a common practice with putting API tokens in there.
I can see your point and that @ would probably have to be urlencoded ideally. I think, we should reject it then.
P.S. I'm not invested enough to implement this. I only got a commit bit to be able to release compatibility fixes as the original author doesn't have maintenance capacity. I'm happy to review a straightforward PR if you'd like to work on one, though.
@webknjaz: I love when people say that openly! 😁
… We’ll see, I’ll probably do it but not immediately. 🤷