httpx
httpx copied to clipboard
Keep digest authentication state, and reuse for following requests.
Checklist
- [ ] The bug is reproducible against the latest release and/or
master
. - [X] There are no similar issues or pull requests to fix it yet.
Describe the bug
DigestAuth will not re-use data from a previous challenge, and always start a new connection without authentication to get a new digest challenge. The requests library will re-use the authentication from first challenge.
https://tools.ietf.org/html/rfc2617 explicitly states that re-use is allowed.
To reproduce
Don't have a simple test case just yet.
Expected behavior
First request against server should be without authentication, then follow-up requests on same Client instance, should re-use authentication data from first.
Actual behavior
Every request will start without authentication headers, then continue with an authenticated requests.
Debugging material
DEBUG:httpx._client:HTTP Request: GET https://tv2.local:1926/6/channeldb/tv "HTTP/1.1 401 Unauthorized"
DEBUG:httpx._client:HTTP Request: GET https://tv2.local:1926/6/channeldb/tv "HTTP/1.1 200 OK"
DEBUG:httpx._client:HTTP Request: GET https://tv2.local:1926/6/channeldb/tv/channelLists/all "HTTP/1.1 401 Unauthorized"
DEBUG:httpx._client:HTTP Request: GET https://tv2.local:1926/6/channeldb/tv/channelLists/all "HTTP/1.1 200 OK"
DEBUG:httpx._client:HTTP Request: GET https://tv2.local:1926/6/applications "HTTP/1.1 401 Unauthorized"
DEBUG:httpx._client:HTTP Request: GET https://tv2.local:1926/6/applications "HTTP/1.1 200 OK"
NOTE:
- Please list tracebacks in full (don't truncate them).
- If relevant, consider turning on DEBUG or TRACE logs for additional details (see https://www.python-httpx.org/environment_variables/#httpx_log_level).
- Consider using
<details>
to make tracebacks/logs collapsible if they're very large (see https://gist.github.com/ericclemmons/b146fe5da72ca1f706b2ef72a20ac39d). -->
Environment
- OS: macOs
- Python version: Python 3.9.1
- HTTPX version: 0.16.1
- Async environment: None
- HTTP proxy: No
- Custom certificates: No
Additional context
Hello!
You're right, RFC 2617 states that reuse of credentials is allowed:
https://tools.ietf.org/html/rfc2617#section-1.2
The protection space determines the domain over which credentials can be automatically applied. If a prior request has been authorized, the same credentials MAY be reused for all other requests within that protection space for a period of time determined by the authentication scheme, parameters, and/or user preference.
In this paragraph I understand the term "credentials" refers to the string we're putting as the Authorization: Digest <credentials>
header.
Requests does seem to skip WWW-Authenticate
if it has already authenticated:
https://github.com/psf/requests/blob/8c211a96cdbe9fe320d63d9e1ae15c5c07e179f8/requests/auth.py#L281-L283
But we don't, and instead always send a first unauthenticated request:
https://github.com/encode/httpx/blob/32d37cfdf1697f0dbacbc2233ee34606369be87b/httpx/_auth.py#L159-L160
Requests stores some local state (last_nonce
, nonce_count
, etc) to recreate the auth header, incrementing the client nonce each time. I assume if they're doing it then it's required, though we might be able to do things a bit more simply (eg just storing the entire challenge
object). Or not — we'd need to dig a bit.
For reference, there was discussion about this when we introduced Digest auth: #332 (split from #305, which also has a ton of context).
At the time we explicitly left nonce counting (ie reuse) out. I think it was a mix of "auth classes should not store state" (?) and "how would we store local state while supporting sync and async".
Marking this as an "enhancement", and happy for anyone to dig deeper. :-)
Reproduction
# example.py
import httpx
url = "https://httpbin.org/digest-auth/test/user/p@ssw0rd"
auth = httpx.DigestAuth("user", "p@ssw0rd")
with httpx.Client(auth=auth) as client:
# First request: 401 WWW-Authenticate then 200 OK.
r = client.get(url)
# Send new request, XXX: 401 WWW-Authenticate (again) then 200 OK.
r = client.get(url)
# Send new request, but reusing `authorization` header.
# This succeeds -- but should it?! Shouldn't the server expect the client to increase its nonce?
request = client.build_request("GET", url, headers={"authorization": r.request.headers["authorization"]})
r = client.send(request)
$ HTTPX_LOG_LEVEL=debug python example.py
DEBUG [2021-02-16 11:26:42] httpx._client - HTTP Request: GET https://httpbin.org/digest-auth/test/user/p@ssw0rd "HTTP/1.1 401 UNAUTHORIZED"
DEBUG [2021-02-16 11:26:42] httpx._client - HTTP Request: GET https://httpbin.org/digest-auth/test/user/p@ssw0rd "HTTP/1.1 200 OK"
DEBUG [2021-02-16 11:26:43] httpx._client - HTTP Request: GET https://httpbin.org/digest-auth/test/user/p@ssw0rd "HTTP/1.1 401 UNAUTHORIZED"
DEBUG [2021-02-16 11:26:43] httpx._client - HTTP Request: GET https://httpbin.org/digest-auth/test/user/p@ssw0rd "HTTP/1.1 200 OK"
DEBUG [2021-02-16 11:26:43] httpx._client - HTTP Request: GET https://httpbin.org/digest-auth/test/user/p@ssw0rd "HTTP/1.1 200 OK"
I did a naive test that seem to work fine by storing state in the auth block:
class DigestAuthCached(httpx.DigestAuth):
_challenge = None
def auth_flow(self, request: httpx.Request) -> Generator[httpx.Request, httpx.Response, None]:
if self._challenge:
request.headers["Authorization"] = self._build_auth_header(request, self._challenge)
response = yield request
if response.status_code != 401 or "www-authenticate" not in response.headers:
# If the response is not a 401 then we don't
# need to build an authenticated request.
return
for auth_header in response.headers.get_list("www-authenticate"):
if auth_header.lower().startswith("digest "):
break
else:
# If the response does not include a 'WWW-Authenticate: Digest ...'
# header, then we don't need to build an authenticated request.
return
self._challenge = self._parse_challenge(request, response, auth_header)
request.headers["Authorization"] = self._build_auth_header(request, self._challenge)
yield request
Note. You need to remember the challenge, not just last authorization header since the header contains the requested URL.
Should be added that the domain of the challenge should likely be used as some cache key to know where the auth can be re-used.
Yup, you see, that's why we went the "don't worry, just re-auth each time for now" route. Lots of small things to think about to get this improvement in... Feasible, but not trivial. :)
This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.
Still valid
This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.
Still valid
This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.
I think this is still valid.
Alrighty... here's how to tackle this one...
The DigestAuth
class needs two extra bits of state to persist on instances.
https://github.com/encode/httpx/blob/71ee50b27770b461a5d2aaba9fca1fbc261bede1/httpx/_auth.py#L153-L157
We need...
self._challenge: Optional[_DigestAuthChallenge] = None
self._nonce_count = 1
We should store the challenge as state on the instance, rather than just a local variable...
https://github.com/encode/httpx/blob/71ee50b27770b461a5d2aaba9fca1fbc261bede1/httpx/_auth.py#L175
Similarly with the nonce count, which we should track as state on the instance, and increment on each usage...
https://github.com/encode/httpx/blob/71ee50b27770b461a5d2aaba9fca1fbc261bede1/httpx/_auth.py#L225
Once we're tracking those two bits of state, then we're able to implement challenge reuse. If we've already got a challenge, then the auth flow can skip directly to _build_auth_header
...
https://github.com/encode/httpx/blob/71ee50b27770b461a5d2aaba9fca1fbc261bede1/httpx/_auth.py#L159-L177
We'll need a test case to cover this too.
Anyone want to give this a go? 😃
i came across this issue when noticing huge performance differences, i gave it a go at #2463
Not to mention that if you reauthenticate every request some servers will block you after 6-10 requests when using async.
Closed via #2463.