jwks-rsa-java

Bad design of jwks cache

Open scrat98 opened this issue 2 years ago • 1 comments

Describe the problem you'd like to have solved

  1. Users can affect each other when calling a server that uses JwkProvider under the hood to validate tokens. See the code below:
fun main() {
  val jwksUrl = "https://login.microsoftonline.com/common/discovery/keys".let {
    URI(it).normalize().toURL()
  }

  val foundKey = UrlJwkProvider(jwksUrl).all.first().id
  val notFoundKey = UUID.randomUUID().toString()

  val jwkProvider =
    JwkProviderBuilder(jwksUrl)
      .cached(100, 10, TimeUnit.MINUTES)
      .rateLimited(10, 1, TimeUnit.MINUTES)
      .build()

  /*
    Someone calls the server with a JWT whose kid is invalid.
   */
  repeat(11) {
    assertThrows<SigningKeyNotFoundException> {
      jwkProvider.get(notFoundKey)
    }
  }

  /*
    Someone else calls the server with a valid JWT and gets a rate limit error.
   */
  jwkProvider.get(foundKey)
}
  2. If the OAuth2 server publishes several JWKs (with different kid values and algorithms), the library calls the JWKS endpoint as many times as there are keys. https://github.com/auth0/jwks-rsa-java/blob/master/src/main/java/com/auth0/jwk/UrlJwkProvider.java#L163

Describe the ideal solution

  1. Rate limiting should not be in the library at all; it is the responsibility of an upper layer.
  2. It is preferable to cache all JWKs at once rather than one by one.

I think the strategy should be the following: cache the whole JWKS response from the server and return the JWK matching the requested kid. The cache should be refreshed on a time basis.
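A minimal sketch of the strategy described above, assuming a lazy time-based refresh rather than a background job. `Key`, `WholeSetCache`, `fetchAll`, and `nowMillis` are hypothetical names used only for illustration; they are not part of jwks-rsa-java. The fetch function and the clock are injected so the sketch runs without a network.

```kotlin
import java.time.Duration

// Hypothetical stand-in for the library's Jwk type; only the key id matters here.
data class Key(val id: String)

// Caches the whole key set and refetches it only once `ttl` has elapsed.
class WholeSetCache(
  private val ttl: Duration,
  private val nowMillis: () -> Long = System::currentTimeMillis,
  private val fetchAll: () -> List<Key>
) {
  private var keysById: Map<String, Key> = emptyMap()
  private var loadedAt: Long? = null

  @Synchronized
  fun get(keyId: String): Key? {
    val last = loadedAt
    if (last == null || nowMillis() - last >= ttl.toMillis()) {
      // One fetch fills the cache for every kid at once.
      keysById = fetchAll().associateBy { it.id }
      loadedAt = nowMillis()
    }
    // An unknown kid is answered from the cached set and causes no extra fetch.
    return keysById[keyId]
  }
}

fun main() {
  var fetches = 0
  var clock = 0L
  val cache = WholeSetCache(Duration.ofMinutes(10), { clock }) {
    fetches++
    listOf(Key("a"), Key("b"))
  }
  cache.get("a"); cache.get("b"); cache.get("missing")
  println(fetches) // 1: three lookups, one fetch
  clock += Duration.ofMinutes(10).toMillis()
  cache.get("a")
  println(fetches) // 2: the entry expired, so the set was fetched again
}
```

With this shape, lookups for unknown kids never reach the JWKS endpoint between refreshes, so one caller cannot exhaust a request budget for the others. The trade-off is that a newly rotated key is not visible until the next refresh.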

Alternatives and current work-arounds

class CachedJwkProvider(
  private val delegate: UrlJwkProvider,
  private val expiration: Duration
) : JwkProvider, Closeable {

  @Volatile // written by the timer thread, read by request threads
  private var cache = mapOf<String, Jwk>()

  private val cacheUpdaterJob = timer(
    name = "jwks-cache-updater",
    daemon = true,
    period = expiration.toMillis()
  ) {
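    // Keep the previous snapshot if a refresh fails; an uncaught exception would stop the timer.
    runCatching { delegate.all.associateBy { it.id } }
      .onSuccess { cache = it }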
  }

  override fun get(keyId: String): Jwk {
    return cache[keyId] ?: throw SigningKeyNotFoundException("No key found with kid $keyId", null)
  }

  override fun close() {
    cacheUpdaterJob.cancel()
  }
}

fun main() {
  val jwksUrl = "https://login.microsoftonline.com/common/discovery/keys".let {
    URI(it).normalize().toURL()
  }
  val urlJwkProvider = UrlJwkProvider(jwksUrl)
  val foundKey = urlJwkProvider.all.first().id
  val notFoundKey = UUID.randomUUID().toString()

  val jwkProvider = CachedJwkProvider(urlJwkProvider, Duration.ofMinutes(10))

  /*
    Wait for the cache to load. Whether get() should block is up to the implementation; I prefer
    not to wait on any third-party system, so the wait is needed here only for the test.
   */
  while (runCatching { jwkProvider.get(foundKey) }.isFailure) {
    Thread.sleep(100) // explicit body; without it the following repeat(...) becomes the loop body
  }

  /*
    Someone calls the server with a JWT whose kid is invalid.
   */
  repeat(1000) {
    assertThrows<SigningKeyNotFoundException> {
      jwkProvider.get(notFoundKey)
    }
  }

  /*
    Someone else calls the server with a valid JWT and does NOT get a rate limit error.
   */
  jwkProvider.get(foundKey)
}

scrat98, Apr 18 '23 17:04

Thanks @scrat98 for the feedback and proposed alternatives! Regarding rate limiting, you can opt out of it by simply not configuring it, correct? Regarding the caching behavior, that's something we should look into. It sounds familiar; I'll check whether we have investigated this before and whether there were any findings about the cache behaving that way.

jimmyjames, May 08 '23 13:05

Hi @scrat98,

Thanks for raising this issue and for the detailed explanation and example.

We've made a change to the UrlJwkProvider in the latest release to optimize how JWKS are fetched and cached:

We now cache the entire JWKS response after the first successful fetch.

On subsequent key lookups, if the requested kid is found in the cache, it is returned directly. If it is not found, the provider refreshes the JWKS once and attempts the lookup again. Only after a second miss is a SigningKeyNotFoundException thrown.
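The lookup flow described above can be sketched roughly as follows. This is an illustration under stated assumptions, not the library's actual code: `Key`, `KeyNotFound`, `RefreshOnMissProvider`, and `fetchAll` are hypothetical names, and skipping the second fetch when the set was loaded in the same call is a choice made for this sketch.

```kotlin
// Hypothetical stand-ins; these are not jwks-rsa-java classes.
data class Key(val id: String)
class KeyNotFound(keyId: String) : Exception("No key found with kid $keyId")

class RefreshOnMissProvider(private val fetchAll: () -> List<Key>) {
  private var cached: Map<String, Key>? = null

  @Synchronized
  fun get(keyId: String): Key {
    val existing = cached
    // First use: fetch and cache the whole set.
    val firstTry = existing ?: refresh()
    firstTry[keyId]?.let { return it }
    // Miss. If the set was fetched in this very call, fetching again would not help.
    if (existing == null) throw KeyNotFound(keyId)
    // Otherwise refresh once and retry; a second miss is final.
    return refresh()[keyId] ?: throw KeyNotFound(keyId)
  }

  private fun refresh(): Map<String, Key> =
    fetchAll().associateBy { it.id }.also { cached = it }
}

fun main() {
  var fetches = 0
  var served = listOf(Key("old"))
  val provider = RefreshOnMissProvider { fetches++; served }

  provider.get("old")                       // initial fetch
  served = listOf(Key("old"), Key("new"))   // the server rotates in a new key
  provider.get("new")                       // miss, refresh, hit
  println(fetches)                          // 2

  val error = runCatching { provider.get("unknown") }.exceptionOrNull()
  println(error is KeyNotFound)             // true
  println(fetches)                          // 3: an unknown kid still costs one fetch
}
```

In this sketch every lookup for an unknown kid still triggers one fetch, which is why a request budget on the refresh path remains relevant.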

Please refer to: configure rate limits and configure network timeout settings.

Thank you

tanya732, Aug 11 '25 15:08

@tanya732 great, thank you. I'll close this issue then. Unfortunately, I won't have time to run the code example above to validate the changes, but it looks good

scrat98, Aug 11 '25 15:08