getValidTokens starts failing after ~a week

Open bmillwood opened this issue 1 year ago • 1 comments

I'm running a server that does OIDC.discover "https://accounts.google.com" at startup and stores that in memory, using it to do OAuth requests from there.

I find that about a week after starting the server, my authorisation starts failing with:

JwtException (KeyError "No suitable key was found to decode the JWT")

until I restart the server. My guess is that my OIDC.discover fetches some keys (?) that have a 7-day expiration, and that I need to redo discovery from time to time, but I haven't yet figured out what exactly expires every 7 days.
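If redoing discovery periodically does turn out to be the fix, a minimal sketch of the idea could look like the following. It uses only base; `refreshingCache` is a hypothetical helper name, not part of the library, and the discovery call is passed in as an opaque `IO a` action so nothing here assumes its exact signature.

```haskell
import Control.Concurrent (forkIO, threadDelay)
import Control.Exception (SomeException, try)
import Control.Monad (forever, void)
import Data.IORef (IORef, newIORef, readIORef, writeIORef)

-- | Hypothetical helper (not part of haskell-oidc-client).
-- Runs 'fetch' once up front, then re-runs it every 'intervalSeconds'
-- in a background thread, storing the latest successful result.
-- If a refresh throws, the previous value is kept and the error is logged.
refreshingCache :: Int -> IO a -> IO (IORef a)
refreshingCache intervalSeconds fetch = do
  ref <- newIORef =<< fetch
  void . forkIO . forever $ do
    threadDelay (intervalSeconds * 1000000)
    result <- try fetch
    case result of
      Right fresh -> writeIORef ref fresh
      -- Sketch only: a real server would likely want narrower exception handling.
      Left err -> putStrLn ("refresh failed: " ++ show (err :: SomeException))
  pure ref
```

The server would pass its discovery action as `fetch` and call `readIORef` on the result before each OAuth request, instead of holding on to the value obtained at startup.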

By wrapping my library calls with exception handlers, I was able to figure out that getValidTokens was the one raising this exception. Looking at the code, my best guess is that it's this:

https://github.com/krdlab/haskell-oidc-client/blob/2d19db09bf13f02f49248f7b21703b2c59e06ecc/src/Web/OIDC/Client/Tokens.hs#L88

in validateIdToken, called by validate, called by requestTokens, called by getValidTokens.

I guess my main bug report here is that it's difficult to figure out from this exception what's actually going on. Most concretely, I notice that if all attempted decodings fail, selectDecodedResult just gives the first error, and discards the rest of them. I can't be sure the exception I'm seeing isn't one that actually occurs every time, but is usually masked by another member of algs succeeding.
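To illustrate what I mean by not discarding errors, here is a rough sketch of the behaviour I'd find easier to debug. This is not the library's actual code: `firstSuccessOrAllErrors` is a made-up name, and I'm assuming the per-algorithm decoding attempts can be viewed as a list of `Either` values.

```haskell
import Data.Either (partitionEithers)

-- | Hypothetical alternative to "return the first error".
-- Returns the first successful result if there is one; if every attempt
-- failed, returns all of the errors so none of them is hidden.
-- Note: an empty list of attempts yields 'Left []'.
firstSuccessOrAllErrors :: [Either e a] -> Either [e] a
firstSuccessOrAllErrors attempts =
  case partitionEithers attempts of
    (_, ok : _) -> Right ok
    (errs, [])  -> Left errs
```

With something like this, the exception could report every per-algorithm failure, which would show whether the KeyError happens on every request or only once the keys have changed.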

I'll try to investigate more, but obviously with a bug that I can only reproduce once a week it's somewhat difficult to make quick progress. Would appreciate any pointers :)

Nov 07 '24 16:11 bmillwood

I was reading Google's OIDC discovery docs and noted that it said things about respecting normal caching rules. Indeed, the discovery URL does have an expires header and a cache-control header:

$ curl --head https://accounts.google.com/.well-known/openid-configuration
HTTP/2 200 
accept-ranges: bytes
access-control-allow-origin: *
content-security-policy-report-only: require-trusted-types-for 'script'; report-uri https://csp.withgoogle.com/csp/federated-signon-mpm-acces
cross-origin-opener-policy: same-origin; report-to="federated-signon-mpm-access"
report-to: {"group":"federated-signon-mpm-access","max_age":2592000,"endpoints":[{"url":"https://csp.withgoogle.com/csp/report-to/federated-s
content-length: 1268
x-content-type-options: nosniff
server: sffe
x-xss-protection: 0
date: Sat, 21 Dec 2024 19:26:35 GMT
expires: Sat, 21 Dec 2024 20:26:35 GMT
cache-control: public, max-age=3600
age: 3339
last-modified: Wed, 14 Feb 2024 20:36:03 GMT
content-type: application/json
vary: Accept-Encoding
alt-svc: h3=":443"; ma=2592000,h3-29=":443"; ma=2592000

So I think this issue is really two problems. The main one: the existing discovery mechanism gives me no way to find out how long the response is valid for.

The more minor, less clear one: the points I raised above about the error messages not being very clear.

I'll see if I can tackle the first.
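As a starting point, here is a minimal sketch of pulling the lifetime out of a Cache-Control header value like the one in the curl output above. It uses only base; `cacheControlMaxAge` and `splitOn` are hypothetical helpers written for illustration, and it only looks at max-age, ignoring the other directives and the expires header.

```haskell
import Data.Char (isDigit, isSpace, toLower)
import Data.List (stripPrefix)
import Data.Maybe (listToMaybe, mapMaybe)

-- | Hypothetical helper: extract the max-age directive (in seconds)
-- from a Cache-Control header value, if one is present.
cacheControlMaxAge :: String -> Maybe Int
cacheControlMaxAge header =
  listToMaybe (mapMaybe parseDirective (splitOn ',' header))
  where
    -- Directive names are case-insensitive, so compare in lower case.
    parseDirective raw = do
      digits <- stripPrefix "max-age=" (map toLower (trim raw))
      if not (null digits) && all isDigit digits
        then Just (read digits)
        else Nothing
    -- Strip leading and trailing whitespace.
    trim = stripEnd . stripEnd
    stripEnd = reverse . dropWhile isSpace

-- | Split a string on a separator character, keeping empty chunks.
splitOn :: Char -> String -> [String]
splitOn sep s = case break (== sep) s of
  (chunk, [])       -> [chunk]
  (chunk, _ : rest) -> chunk : splitOn sep rest
```

For the response above, the header value is "public, max-age=3600", and since the age header was already 3339, that particular copy had about 261 seconds of freshness left. How the library should expose this alongside the discovery result is still an open question.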

Dec 21 '24 20:12 bmillwood