Potential JWK KeyFinder Denial of Service
Given the example in the README, I understand that JWT.decode can be passed a function to load a JWKS, and will subsequently use that to find the key matching the kid on the token. The code example indicates that the key set should be reloaded if options[:invalidate] is truthy. I went looking for other expected options and found JWT::JWK::KeyFinder. My understanding of that class is:
- The decode function uses #key_for to find a key matching the kid
- #key_for triggers a key fetch if the local cache is empty
- #key_for attempts to match the kid to a cached key
- If the kid couldn't be matched, #key_for calls the JWK loader function with invalidate: true to reload the keys again
- #key_for returns the matching cached key, if one was found, or otherwise raises an error
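The flow in the list above can be sketched roughly as follows. This is a simplified stand-in written for illustration, not the gem's actual source: the class name SketchKeyFinder, its KeyNotFound error, and the plain-hash keys are all assumptions.

```ruby
# Hypothetical stand-in for the lookup flow described above; NOT the jwt gem's code.
class SketchKeyFinder
  class KeyNotFound < StandardError; end

  def initialize(loader)
    @loader = loader # callable taking an options hash, returning { keys: [...] }
    @keys = nil
  end

  def key_for(kid)
    # First attempt: fetch only if this instance has nothing cached yet.
    @keys ||= @loader.call({})[:keys]
    key = find(kid)
    return key if key

    # Second attempt: ask the loader to refresh, then look again.
    @keys = @loader.call({ invalidate: true })[:keys]
    find(kid) || raise(KeyNotFound, "Could not find public key for kid #{kid}")
  end

  private

  def find(kid)
    @keys.find { |k| k["kid"] == kid }
  end
end

# Record every options hash the loader receives.
calls = []
loader = lambda do |options|
  calls << options
  { keys: [{ "kid" => "example-test-key" }] }
end

finder = SketchKeyFinder.new(loader)
finder.key_for("example-test-key") # known kid: one loader call

begin
  # Unknown kid: a fresh finder calls the loader twice, then raises.
  SketchKeyFinder.new(loader).key_for("invalid-key")
rescue SketchKeyFinder::KeyNotFound => e
  puts e.message
end
```

The point of the sketch is that an unknown kid always costs a second loader call with invalidate: true, regardless of how recently the keys were fetched.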
There are a number of scenarios where someone would want to use keys from a remote JWKS to validate incoming web requests. To simulate this, I used this basic Rack program:
# app.ru
# frozen_string_literal: true
# run this file using rackup

require "jwt"
require "json"

# public part of a throwaway testing key
JWKS = [JSON.parse(<<~JWK.strip)].freeze
  {"kty":"EC","crv":"P-256","x":"NXlMSa9koCduzwIIWNphGKn4qhQKh0cHTcM9qRuqP1A","y":"7Bqsz8ti9NhVNnsy_91XWr28_cewvSMclKKHAEeXEBA","kid":"example-test-key"}
JWK

class JwtVerifierMiddleware
  def initialize(app)
    @app = app
  end

  def call(env)
    token = env["HTTP_TEST_JWT"]
    jwk_loader = ->(options) do
      if options[:invalidate]
        puts "Invalidating JWK cache."
        @jwks = nil
      end
      @jwks ||= { keys: fetch_keys }
    end
    decoded, _header = JWT.decode(token, nil, true, { algorithms: ["ES256"], jwks: jwk_loader })
    env["jwt.subject"] = decoded["sub"]
    @app.call(env)
  rescue JWT::DecodeError => e
    [401, {}, ["Invalid token: #{e}"]]
  end

  def fetch_keys
    puts "Fetching keys..."
    # simulate a slow-ish network call
    sleep(1)
    JWKS
  end
end

use(JwtVerifierMiddleware)
run(->(env) { [200, {}, ["Hello, #{env["jwt.subject"]}!"]] })
Used as such:
$ VALID_TOKEN_HEADER="Test-Jwt: eyJraWQiOiJleGFtcGxlLXRlc3Qta2V5IiwiYWxnIjoiRVMyNTYifQ.eyJpYXQiOjE2NTcwODI4OTcsIm5iZiI6MTY1NzA4Mjg5NywiZXhwIjoxOTcyNDQyODk3LCJpc3MiOiJodHRwczovLzEyNy4wLjAuMSIsImF1ZCI6WyJ0ZXN0Il0sInN1YiI6IlRlc3QgVXNlciJ9.txj8pR-WqFyhYgBVPLhk8jKlAxBVEzhb4egVRD45G_JMOVIR4iPJwmXXXO_EaUos_vdG77XEboS5_ef4_V1ceA"
# same as above, but with the kid changed:
$ INVALID_TOKEN_HEADER="Test-Jwt: eyJraWQiOiAiaW52YWxpZC1rZXkiLCJhbGciOiJFUzI1NiJ9Cg==.eyJpYXQiOjE2NTcwODI4OTcsIm5iZiI6MTY1NzA4Mjg5NywiZXhwIjoxOTcyNDQyODk3LCJpc3MiOiJodHRwczovLzEyNy4wLjAuMSIsImF1ZCI6WyJ0ZXN0Il0sInN1YiI6IlRlc3QgVXNlciJ9.txj8pR-WqFyhYgBVPLhk8jKlAxBVEzhb4egVRD45G_JMOVIR4iPJwmXXXO_EaUos_vdG77XEboS5_ef4_V1ceA"
# slow first call due to fetch delay
$ time curl localhost:9292 -H "$VALID_TOKEN_HEADER"
Hello, Test User!
real 0m1.029s
user 0m0.005s
sys 0m0.011s
# fast due to cached keys
$ time curl localhost:9292 -H "$VALID_TOKEN_HEADER"
Hello, Test User!
real 0m0.021s
user 0m0.005s
sys 0m0.009s
# slow due to invalidation
$ time curl localhost:9292 -H "$INVALID_TOKEN_HEADER"
Invalid token: Could not find public key for kid invalid-key
real 0m1.027s
user 0m0.005s
sys 0m0.009s
# slow due to invalidation again
$ time curl localhost:9292 -H "$INVALID_TOKEN_HEADER"
Invalid token: Could not find public key for kid invalid-key
real 0m1.020s
user 0m0.003s
sys 0m0.006s
The issue here is that the key fetch function causes a "block":
$ time curl localhost:9292 -H "$INVALID_TOKEN_HEADER" & time curl localhost:9292 -H "$VALID_TOKEN_HEADER"
[1] 80945
Invalid token: Could not find public key for kid invalid-key
real 0m1.026s
user 0m0.004s
sys 0m0.009s
Hello, Test User![1]+ Done time curl localhost:9292 -H "$INVALID_TOKEN_HEADER"
real 0m1.027s
user 0m0.009s
sys 0m0.018s
Even in the case of multi-threaded or multi-process Rack servers, an attacker can cause a nuisance by continuously sending (easily generated) valid JWTs with a key of their own making, forcing the server to repeatedly re-fetch the JWKS across each of these threads/processes. In the best case scenario, this may delay a proportion of valid requests. In the worst case, it could result in other effects, like triggering an upstream rate limiter (preventing the server from processing valid incoming requests).
Ideally, there should be some kind of configurable minimum time between cache invalidations. As a first step, that should be made clear to anyone implementing a JWK loader function. Beyond that, I'd like to see this kind of attack considered by the library itself, but I recognise this is probably considered "out of scope" for the library.
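As a rough illustration of the "minimum time between cache invalidations" idea, a loader could keep its own state and ignore invalidate requests that arrive too soon after the last fetch. The sketch below is one possible shape under stated assumptions: ThrottledJwkLoader, min_refresh_interval, and the injectable clock are hypothetical names, not part of the jwt gem.

```ruby
# Hypothetical loader wrapper; every name here is illustrative, not a jwt gem API.
class ThrottledJwkLoader
  def initialize(min_refresh_interval:,
                 clock: -> { Process.clock_gettime(Process::CLOCK_MONOTONIC) },
                 &fetcher)
    @min_refresh_interval = min_refresh_interval # seconds between forced refreshes
    @clock = clock                               # injectable so the interval can be tested
    @fetcher = fetcher                           # block doing the (slow) remote JWKS fetch
    @mutex = Mutex.new
    @jwks = nil
    @fetched_at = nil
  end

  # Same shape as the loader lambda: takes the options hash, returns { keys: [...] }.
  def call(options = {})
    @mutex.synchronize do
      refresh if @jwks.nil? || (options[:invalidate] && refresh_allowed?)
      @jwks
    end
  end

  private

  def refresh_allowed?
    @clock.call - @fetched_at >= @min_refresh_interval
  end

  def refresh
    @jwks = { keys: @fetcher.call }
    @fetched_at = @clock.call
  end
end

# Demonstration with a fake clock so no real waiting is needed.
now = 0.0
fetches = 0
loader = ThrottledJwkLoader.new(min_refresh_interval: 300, clock: -> { now }) do
  fetches += 1
  [{ "kid" => "example-test-key" }]
end

loader.call({})                   # empty cache: fetches
loader.call({ invalidate: true }) # within the interval: served from cache
now += 301
loader.call({ invalidate: true }) # interval elapsed: fetches again
puts fetches
```

Such an object would need to live outside the request (for example, created once in the middleware's initialize) and could be handed to JWT.decode wrapped in a lambda, e.g. jwks: ->(options) { loader.call(options) }. Note that a fetch still happens while holding the mutex, so this bounds how often an unknown kid can trigger a refetch rather than removing the delay of a legitimate one.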
Cache invalidation, one of the hardest things in computer science.
Great sum-up and example of the issue with caching. There are for sure potential problems with the example if used as-is and I would suggest improving the readme on this topic.
Your suggestion of a configurable cache invalidation time is a little tricky because the gem itself does not really know anything about the surroundings. The keyloader could be different depending on where in the app the decode method is called so there can actually be multiple caches. I think that as long as the decode is just a stateless method, solving this problem fully on the gem side is not feasible.
I would suggest:
- Stop using the term invalidate in the keyloader options to not indicate it's just about invalidating whatever cache is in place
- Update the README to document the potential pitfalls in the JWK cache
I like the idea of changing the term invalidate.
I agree that it's tricky to provide a one-size-fits-all solution for handling the cache. I guess I'm struggling a bit to understand the point of the loader function in this context, given we're talking about a stateless environment in #decode. I haven't confirmed — will the key cache inside the JWT::JWK::KeyFinder actually act as a cache for calls to JWT.decode? As in, will two back-to-back calls to JWT.decode with the same token and loader function only result in the function being called once?
I think it might be worth giving a more detailed explanation of the expectations and shape of the loader function. I stumbled across this because I wanted to know if there were any other options I needed to account for, so having a definitive list of options would have addressed my concern.
I haven't confirmed — will the key cache inside the JWT::JWK::KeyFinder actually act as a cache for calls to JWT.decode? As in, will two back-to-back calls to JWT.decode with the same token and loader function only result in the function being called once?
Each call to ::JWT.decode will have its own instance of ::JWT::JWK::KeyFinder. The given loader method will be called at least as many times as the decode method is invoked, and if no matching kid is found, the loader will be called again with the invalidate option set to true.
I think #501 is probably enough to make this issue less likely, so we can probably close this issue.
That said, given that #501 has been merged, we probably need to release a 2.5.0 sooner rather than later — anyone using the documented :kid_not_found option on 2.4.1 will never receive a truthy value, as the README is referring to (multiple) features that aren't in the current release. 🤔
Yeah, the relationship between the master branch and the released version is a little problematic. But with these resources I guess it's the best we can do :) But 2.5.0 is out now :) Super grateful for the input and concerns.
Also, inspired by this and a few other things, I made this super simple gem to keep track of the JWKs: https://github.com/anakinj/jwk-loader