httpx icon indicating copy to clipboard operation
httpx copied to clipboard

Keep digest authentication state, and reuse for following requests.

Open elupus opened this issue 3 years ago • 8 comments

Checklist

  • [ ] The bug is reproducible against the latest release and/or master.
  • [X] There are no similar issues or pull requests to fix it yet.

Describe the bug

DigestAuth will not re-use data from a previous challenge, and always start a new connection without authentication to get a new digest challenge. The requests library will re-use the authentication from first challenge.

https://tools.ietf.org/html/rfc2617 explicitly states that re-use is allowed.

To reproduce

Don't have a simple test case just yet.

Expected behavior

First request against server should be without authentication, then follow-up requests on same Client instance, should re-use authentication data from first.

Actual behavior

Every request will start without authentication headers, then continue with an authenticated requests.

Debugging material

DEBUG:httpx._client:HTTP Request: GET https://tv2.local:1926/6/channeldb/tv "HTTP/1.1 401 Unauthorized"
DEBUG:httpx._client:HTTP Request: GET https://tv2.local:1926/6/channeldb/tv "HTTP/1.1 200 OK"
DEBUG:httpx._client:HTTP Request: GET https://tv2.local:1926/6/channeldb/tv/channelLists/all "HTTP/1.1 401 Unauthorized"
DEBUG:httpx._client:HTTP Request: GET https://tv2.local:1926/6/channeldb/tv/channelLists/all "HTTP/1.1 200 OK"
DEBUG:httpx._client:HTTP Request: GET https://tv2.local:1926/6/applications "HTTP/1.1 401 Unauthorized"
DEBUG:httpx._client:HTTP Request: GET https://tv2.local:1926/6/applications "HTTP/1.1 200 OK"

NOTE:

  • Please list tracebacks in full (don't truncate them).
  • If relevant, consider turning on DEBUG or TRACE logs for additional details (see https://www.python-httpx.org/environment_variables/#httpx_log_level).
  • Consider using <details> to make tracebacks/logs collapsible if they're very large (see https://gist.github.com/ericclemmons/b146fe5da72ca1f706b2ef72a20ac39d). -->

Environment

  • OS: macOs
  • Python version: Python 3.9.1
  • HTTPX version: 0.16.1
  • Async environment: None
  • HTTP proxy: No
  • Custom certificates: No

Additional context

elupus avatar Feb 15 '21 23:02 elupus

Hello!

You're right, RFC 2617 states that reuse of credentials is allowed:

https://tools.ietf.org/html/rfc2617#section-1.2

The protection space determines the domain over which credentials can be automatically applied. If a prior request has been authorized, the same credentials MAY be reused for all other requests within that protection space for a period of time determined by the authentication scheme, parameters, and/or user preference.

In this paragraph I understand the term "credentials" refers to the string we're putting as the Authorization: Digest <credentials> header.

Requests does seem to skip WWW-Authenticate if it has already authenticated:

https://github.com/psf/requests/blob/8c211a96cdbe9fe320d63d9e1ae15c5c07e179f8/requests/auth.py#L281-L283

But we don't, and instead always send a first unauthenticated request:

https://github.com/encode/httpx/blob/32d37cfdf1697f0dbacbc2233ee34606369be87b/httpx/_auth.py#L159-L160

Requests stores some local state (last_nonce, nonce_count, etc) to recreate the auth header, incrementing the client nonce each time. I assume if they're doing it then it's required, though we might be able to do things a bit more simply (eg just storing the entire challenge object). Or not — we'd need to dig a bit.

For reference, there was discussion about this when we introduced Digest auth: #332 (split from #305, which also has a ton of context).

At the time we explicitly left nonce counting (ie reuse) out. I think it was a mix of "auth classes should not store state" (?) and "how would we store local state while supporting sync and async".

Marking this as an "enhancement", and happy for anyone to dig deeper. :-)

Reproduction

# example.py
import httpx

url = "https://httpbin.org/digest-auth/test/user/p@ssw0rd"
auth = httpx.DigestAuth("user", "p@ssw0rd")

with httpx.Client(auth=auth) as client:
    # First request: 401 WWW-Authenticate then 200 OK.
    r = client.get(url)

    # Send new request, XXX: 401 WWW-Authenticate (again) then 200 OK.
    r = client.get(url)

    # Send new request, but reusing `authorization` header.
    # This succeeds -- but should it?! Shouldn't the server expect the client to increase its nonce?
    request = client.build_request("GET", url, headers={"authorization": r.request.headers["authorization"]})
    r = client.send(request)

$ HTTPX_LOG_LEVEL=debug python example.py
DEBUG [2021-02-16 11:26:42] httpx._client - HTTP Request: GET https://httpbin.org/digest-auth/test/user/p@ssw0rd "HTTP/1.1 401 UNAUTHORIZED"
DEBUG [2021-02-16 11:26:42] httpx._client - HTTP Request: GET https://httpbin.org/digest-auth/test/user/p@ssw0rd "HTTP/1.1 200 OK"
DEBUG [2021-02-16 11:26:43] httpx._client - HTTP Request: GET https://httpbin.org/digest-auth/test/user/p@ssw0rd "HTTP/1.1 401 UNAUTHORIZED"
DEBUG [2021-02-16 11:26:43] httpx._client - HTTP Request: GET https://httpbin.org/digest-auth/test/user/p@ssw0rd "HTTP/1.1 200 OK"
DEBUG [2021-02-16 11:26:43] httpx._client - HTTP Request: GET https://httpbin.org/digest-auth/test/user/p@ssw0rd "HTTP/1.1 200 OK"

florimondmanca avatar Feb 16 '21 10:02 florimondmanca

I did a naive test that seem to work fine by storing state in the auth block:

class DigestAuthCached(httpx.DigestAuth):
    _challenge = None

    def auth_flow(self, request: httpx.Request) -> Generator[httpx.Request, httpx.Response, None]:
        if self._challenge:
            request.headers["Authorization"] = self._build_auth_header(request, self._challenge)

        response = yield request

        if response.status_code != 401 or "www-authenticate" not in response.headers:
            # If the response is not a 401 then we don't
            # need to build an authenticated request.
            return

        for auth_header in response.headers.get_list("www-authenticate"):
            if auth_header.lower().startswith("digest "):
                break
        else:
            # If the response does not include a 'WWW-Authenticate: Digest ...'
            # header, then we don't need to build an authenticated request.
            return

        self._challenge = self._parse_challenge(request, response, auth_header)
        request.headers["Authorization"] = self._build_auth_header(request, self._challenge)
        yield request

Note. You need to remember the challenge, not just last authorization header since the header contains the requested URL.

elupus avatar Feb 16 '21 18:02 elupus

Should be added that the domain of the challenge should likely be used as some cache key to know where the auth can be re-used.

elupus avatar Feb 16 '21 18:02 elupus

Yup, you see, that's why we went the "don't worry, just re-auth each time for now" route. Lots of small things to think about to get this improvement in... Feasible, but not trivial. :)

florimondmanca avatar Feb 16 '21 18:02 florimondmanca

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.

stale[bot] avatar Feb 20 '22 15:02 stale[bot]

Still valid

elupus avatar Feb 20 '22 19:02 elupus

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.

stale[bot] avatar Mar 25 '22 07:03 stale[bot]

Still valid

elupus avatar Mar 25 '22 07:03 elupus

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.

stale[bot] avatar Oct 15 '22 19:10 stale[bot]

I think this is still valid.

elupus avatar Oct 15 '22 23:10 elupus

Alrighty... here's how to tackle this one...

The DigestAuth class needs two extra bits of state to persist on instances.

https://github.com/encode/httpx/blob/71ee50b27770b461a5d2aaba9fca1fbc261bede1/httpx/_auth.py#L153-L157

We need...

self._challenge: Optional[_DigestAuthChallenge] = None
self._nonce_count = 1

We should store the challenge as state on the instance, rather than just a local variable...

https://github.com/encode/httpx/blob/71ee50b27770b461a5d2aaba9fca1fbc261bede1/httpx/_auth.py#L175

Similarly with the nonce count, which we should track as state on the instance, and increment on each usage...

https://github.com/encode/httpx/blob/71ee50b27770b461a5d2aaba9fca1fbc261bede1/httpx/_auth.py#L225

Once we're tracking those two bits of state, then we're able to implement challenge reuse. If we've already got a challenge, then the auth flow can skip directly to _build_auth_header...

https://github.com/encode/httpx/blob/71ee50b27770b461a5d2aaba9fca1fbc261bede1/httpx/_auth.py#L159-L177

We'll need a test case to cover this too.

Anyone want to give this a go? 😃

tomchristie avatar Oct 17 '22 10:10 tomchristie

i came across this issue when noticing huge performance differences, i gave it a go at #2463

rettier avatar Nov 25 '22 14:11 rettier

Not to mention that if you reauthenticate every request some servers will block you after 6-10 requests when using async.

paulschmeida avatar Nov 30 '22 10:11 paulschmeida

Closed via #2463.

tomchristie avatar Nov 30 '22 13:11 tomchristie