ruby-jwt Potential JWK KeyFinder Denial of Service

Given the example in the README, I understand that JWT.decode can be passed a function to load a JWKS, and will subsequently use that to find the key to match the kid on the token. The code example indicates that the key set should be reloaded if options[:invalidate] is truthy. I went looking for other expected options, and found JWT::JWK::KeyFinder - given my understanding of that class:

The decode function uses #key_for to find a key matching the kid
#key_for triggers a key fetch if the local cache is empty
#key_for attempts to match the kid to a cached key
If the kid couldn't be matched, #key_for calls the JWK loader function with invalidate: true to reload the keys again
#key_for returns the matching cached key, if one was found, or otherwise raises an error

There are a number of scenarios where someone would want to use keys from a remote JWKS to validate incoming web requests. To simulate this, I used this basic Rack program:

# app.ru
# frozen_string_literal: true

# run this file using rackup

require "jwt"
require "json"

# public part of a throwaway testing key
JWKS = [JSON.parse(<<~JWK.strip)].freeze
  {"kty":"EC","crv":"P-256","x":"NXlMSa9koCduzwIIWNphGKn4qhQKh0cHTcM9qRuqP1A","y":"7Bqsz8ti9NhVNnsy_91XWr28_cewvSMclKKHAEeXEBA","kid":"example-test-key"}
JWK

class JwtVerifierMiddleware
  def initialize(app)
    @app = app
  end

  def call(env)
    token = env["HTTP_TEST_JWT"]

    jwk_loader = ->(options) do
      if options[:invalidate]
        puts "Invalidating JWK cache."
        @jwks = nil
      end

      @jwks ||= { keys: fetch_keys }
    end

    decoded, _header = JWT.decode(token, nil, true, { algorithms: ["ES256"], jwks: jwk_loader })

    env["jwt.subject"] = decoded["sub"]

    @app.call(env)
  rescue JWT::DecodeError => e
    [401, {}, ["Invalid token: #{e}"]]
  end

  def fetch_keys
    puts "Fetching keys..."

    # simulate a slow-ish network call
    sleep(1)

    JWKS
  end
end

use(JwtVerifierMiddleware)
run(->(env) { [200, {}, ["Hello, #{env["jwt.subject"]}!"]] })

Used as such:

$ VALID_TOKEN_HEADER="Test-Jwt: eyJraWQiOiJleGFtcGxlLXRlc3Qta2V5IiwiYWxnIjoiRVMyNTYifQ.eyJpYXQiOjE2NTcwODI4OTcsIm5iZiI6MTY1NzA4Mjg5NywiZXhwIjoxOTcyNDQyODk3LCJpc3MiOiJodHRwczovLzEyNy4wLjAuMSIsImF1ZCI6WyJ0ZXN0Il0sInN1YiI6IlRlc3QgVXNlciJ9.txj8pR-WqFyhYgBVPLhk8jKlAxBVEzhb4egVRD45G_JMOVIR4iPJwmXXXO_EaUos_vdG77XEboS5_ef4_V1ceA"

# same as above, but with the kid changed:
$ INVALID_TOKEN_HEADER="Test-Jwt: eyJraWQiOiAiaW52YWxpZC1rZXkiLCJhbGciOiJFUzI1NiJ9Cg==.eyJpYXQiOjE2NTcwODI4OTcsIm5iZiI6MTY1NzA4Mjg5NywiZXhwIjoxOTcyNDQyODk3LCJpc3MiOiJodHRwczovLzEyNy4wLjAuMSIsImF1ZCI6WyJ0ZXN0Il0sInN1YiI6IlRlc3QgVXNlciJ9.txj8pR-WqFyhYgBVPLhk8jKlAxBVEzhb4egVRD45G_JMOVIR4iPJwmXXXO_EaUos_vdG77XEboS5_ef4_V1ceA"

# slow first call due to fetch delay
$ time curl localhost:9292 -H "$VALID_TOKEN_HEADER"
Hello, Test User!
real    0m1.029s
user    0m0.005s
sys     0m0.011s

# fast due to cached keys
$ time curl localhost:9292 -H "$VALID_TOKEN_HEADER"
Hello, Test User!
real    0m0.021s
user    0m0.005s
sys     0m0.009s

# slow due to invalidation
$ time curl localhost:9292 -H "$INVALID_TOKEN_HEADER"
Invalid token: Could not find public key for kid invalid-key
real    0m1.027s
user    0m0.005s
sys     0m0.009s

# slow due to invalidation again
$ time curl localhost:9292 -H "$INVALID_TOKEN_HEADER"
Invalid token: Could not find public key for kid invalid-key
real    0m1.020s
user    0m0.003s
sys     0m0.006s

The issue here is that the key fetch function causes a "block":

$ time curl localhost:9292 -H "$INVALID_TOKEN_HEADER" & time curl localhost:9292 -H "$VALID_TOKEN_HEADER"
[1] 80945
Invalid token: Could not find public key for kid invalid-key
real    0m1.026s
user    0m0.004s
sys     0m0.009s
Hello, Test User![1]+  Done                    time curl localhost:9292 -H "$INVALID_TOKEN_HEADER"

real    0m1.027s
user    0m0.009s
sys     0m0.018s

Even in the case of multi-threaded or multi-process Rack servers, an attacker can cause a nuisance by continuously sending (easily generated) valid JWTs with a key of their own making, forcing the server to repeatedly re-fetch the JWKS across each of these threads/processes. In the best case scenario, this may delay a proportion of valid requests. In the worst case, it could result in other effects, like triggering an upstream rate limiter (preventing the server from processing valid incoming requests).

Ideally, there should be some kind of configurable minimum time between cache invalidations. As a first step, that should be made clear to anyone implementing a JWK loader function. Beyond that, I'd like to see this kind of attack considered by the library itself, but I recognise this is probably considered "out of scope" for the library.

Jul 06 '22 05:07 liamdawson

Cache invalidation, one of the hardest things in computer science.

Great sum-up and example of the issue with caching. There are for sure potential problems with the example if used as-is and I would suggest improving the readme on this topic.

Your suggestion of a configurable cache invalidation time is a little tricky because the gem itself does not really know anything about the surroundings. The keyloader could be different depending on where in the app the decode method is called so there can actually be multiple caches. I think that as long as the decode is just a stateless method, solving this problem fully on the gem side is not feasible.

I would suggest:

Stop using the term invalidate in the keyloader options to not indicate it's just about invalidating whatever cache is in place
Update the README to document the potential pitfalls in the JWK cache

Jul 06 '22 19:07 anakinj

I like the idea of changing the term invalidate.

I agree that it's tricky to provide a one-size-fits-all solution for handling the cache. I guess I'm struggling a bit to understand the point of the loader function in this context, given we're talking about a stateless environment in #decode. I haven't confirmed — will the key cache inside the JWT::JWK::KeyFinder actually act as a cache for calls to JWT.decode? As in, will two back-to-back calls to JWT.decode with the same token and loader function only result in the function being called once?

I think it might be worth giving a more detailed explanation of the expectations and shape of the loader function. I stumbled across this because I wanted to know if there were any other options I needed to account for, so having a definitive list of options would have addressed my concern.

Jul 07 '22 01:07 liamdawson

I haven't confirmed — will the key cache inside the JWT::JWK::KeyFinder actually act as a cache for calls to JWT.decode? As in, will two back-to-back calls to JWT.decode with the same token and loader function only result in the function being called once?

Each call to ::JWT.decode will have its own instance of a ::JWT::JWK::KeyFinder. The given loader method will be called at least as many times as the decode method is invoked and if no matching kids the loader method will be called again with the invalidate options set to true.

Jul 07 '22 19:07 anakinj

I think #501 is probably enough to make this issue less likely, so we can probably close this issue.

That said, given that #501 has been merged, we probably need to release a 2.5.0 sooner rather than later — anyone using the documented :kid_not_found option on 2.4.1 will never receive a truthy value, as the README is referring to (multiple) features that aren't in the current release. 🤔

Jul 29 '22 00:07 liamdawson

Yeah the relationship between the master branch and the released version is a little problematic. But with these resources I guess it's be best we can do :) But 2.5.0 is out now:) Super grateful for the input and concerns.

Also inspired by this and a few other things I made this super simple GEM to keep track of the JWKs. https://github.com/anakinj/jwk-loader

Aug 26 '22 08:08 anakinj