JWT signature does not match locally computed signature only on some pods

Open Fs02 opened this issue 4 years ago • 12 comments

Describe the bug

We are using PS256 as the signing algorithm with rotated keys. The app runs on Java 11 and is deployed in Kubernetes. Recently we started to encounter a signature verification issue: every JWT that passes through one particular pod fails with the message "JWT signature does not match locally computed signature. JWT validity cannot be asserted and should not be trusted." Retrying the same JWT against a different pod succeeds, which confirms that the JWT itself is not the problem. Our current workaround is to kill the affected pod, but this is not ideal.

To Reproduce

We have not been able to reproduce it locally yet, but let me know if there is any info that might be helpful.

Expected behavior

The signature should be verified successfully, as it is verifiable on similar pods.

Fs02 avatar Aug 24 '21 06:08 Fs02

Is anything different between those instances? JVM, version of JJWT, configuration, etc?

bdemers avatar Aug 24 '21 17:08 bdemers

It's running on the same k8s deployment, so it should have the exact same configuration

Fs02 avatar Aug 25 '21 03:08 Fs02

How are your JWTs created? By some third-party service outside your control? Are you caching your keys? Could the failed nodes be using the old key? What does your key rotation strategy look like? Are you adding a new key for some period of time, then removing the old key once all of the old JWTs have expired?

It might be helpful to see what your JWT usage looks like, specifically your SigningKeyResolver that deals with your key rotation strategy.

bdemers avatar Aug 25 '21 20:08 bdemers

> How are your JWTs created? By some third-party service outside your control?

We create the JWTs ourselves, also using this library.

> Are you caching your keys? Could the failed nodes be using the old key?

We cache them and refresh them periodically in the background. We throw a different exception if a key is not found in our cache.

> What does your key rotation strategy look like? Are you adding a new key for some period of time, then removing the old key once all of the old JWTs have expired?

We add a new key once every day while keeping the last 3 created keys. The keys are generated by AWS KMS.

> It might be helpful to see what your JWT usage looks like, specifically your SigningKeyResolver that deals with your key rotation strategy.

Sure:

class CachedKeyResolver(
    val url: String,
    specifiedExecutor: ScheduledExecutorService? = null
) : SigningKeyResolver {

    private val provider: UrlJwkProvider = UrlJwkProvider(URL(url))
    private val executor = specifiedExecutor ?: Executors.newSingleThreadScheduledExecutor {
        Executors.defaultThreadFactory().newThread(it).also { worker ->
            worker.name = "jwk-resolver-" + worker.id
        }
    }

    @Volatile
    private var cachedJwks: Map<String, Jwk> = emptyMap()

    init {
        executor.scheduleAtFixedRate(this::refreshPublicKeys, 0, PUBLIC_KEY_CACHE_TTL_IN_MIN, TimeUnit.MINUTES)
    }

    @PreDestroy
    private fun preDestroy() {
        executor.shutdown()
    }

    override fun resolveSigningKey(header: JwsHeader<out JwsHeader<*>>?, claims: Claims?): Key {
        val kid = header!!.getKeyId()
        if (kid == null && cachedJwks.size == 1) {
            return cachedJwks.values.elementAt(0).publicKey
        }
        val jwk = cachedJwks[kid] ?: throw SigningKeyNotFoundException("Failed to find key with kid $kid", null)
        return jwk.publicKey
    }

    override fun resolveSigningKey(header: JwsHeader<out JwsHeader<*>>?, plaintext: String?): Key {
        throw UnsupportedOperationException()
    }

    private fun refreshPublicKeys() {
        try {
            val jwks = provider.all
            cachedJwks = jwks.associateBy { it.id }
            log.info("Succeeded to refresh public keys from $url")
        } catch (e: Throwable) {
            log.error("Failed to retrieve public keys from $url", e)
        }
    }

    companion object {
        private val log = LoggerFactory.getLogger(CachedKeyResolver::class.java)
        private const val PUBLIC_KEY_CACHE_TTL_IN_MIN = 3L
    }
}

Fs02 avatar Aug 26 '21 02:08 Fs02

Any chance you captured one of the JWTs that failed to validate? That might help narrow down the issue too.

A couple of things stick out in your resolveSigningKey function:

  • if your JWT does NOT have a kid and you only have a single cached key
  • cachedJwks doesn't look threadsafe, it may change between checking the size and the lookup (assuming refreshPublicKeys() was called in between)

Other than that my only suggestion would be to add more debug/trace logic. Log the kid returned from your JWKS, log the header in the key resolver, and the header from the token when you see a SignatureException.
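As a rough illustration of the logging suggested above, the header of a failing token can be decoded without verifying it, so the kid and alg can be written to the log next to the SignatureException. This is only a sketch: `peekJwtHeader` is a hypothetical helper name, it uses nothing but `java.util.Base64`, and its output must be treated as untrusted diagnostic data.

```kotlin
import java.util.Base64

// Hypothetical helper (not part of JJWT): returns the decoded JOSE header JSON
// of a compact JWS without verifying the signature.
// For logging only; never base a trust decision on this value.
fun peekJwtHeader(token: String): String {
    // A compact JWS is header.payload.signature; the header is the first segment.
    val headerSegment = token.substringBefore('.')
    // JWT segments are base64url-encoded, normally without padding.
    val decoded = Base64.getUrlDecoder().decode(headerSegment)
    return String(decoded, Charsets.UTF_8)
}
```

Logging the result of `peekJwtHeader(token)` in the catch block for SignatureException, together with the kids currently held in the cache, should show whether the failing pod resolved the key the token was actually signed with.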

bdemers avatar Aug 26 '21 15:08 bdemers

Hi, thanks for your suggestion. I've added the logging and am still waiting for the problem to happen again.

> cachedJwks doesn't look threadsafe, it may change between checking the size and the lookup (assuming refreshPublicKeys() was called in between)

Can you explain how this can happen? cachedJwks = jwks.associateBy { it.id } looks normal to me. I'm assuming jwks.associateBy { it.id } is fully evaluated before the result is assigned to cachedJwks 🤔

Fs02 avatar Aug 31 '21 06:08 Fs02

This assumes that the kid is null (which could be the case if it's a publicly accessible server), but the problem I could see is annotated below.

override fun resolveSigningKey(header: JwsHeader<out JwsHeader<*>>?, claims: Claims?): Key {
    val kid = header!!.getKeyId()

    // current cachedJwks.size == 1

    if (kid == null && cachedJwks.size == 1) {

        // call to refreshPublicKeys finishes on other thread, cachedJwks.size is now 3

        return cachedJwks.values.elementAt(0).publicKey // grab the first key anyway
    }
    val jwk = cachedJwks[kid] ?: throw SigningKeyNotFoundException("Failed to find key with kid $kid", null)
    return jwk.publicKey
}

But ultimately we don't have a lot to go on, and I'm just guessing.
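One common way to close the window annotated above is to read the volatile field once into a local variable and run every check against that snapshot. The following is a minimal sketch of that idea, not the resolver from this thread: `SnapshotKeyCache` is a hypothetical class, and plain `String` values stand in for `Jwk`/`Key` so that it runs without JJWT on the classpath.

```kotlin
// Hypothetical stand-in for the resolver's cache; String replaces Jwk/Key
// so the sketch has no dependency on JJWT or a JWKS client.
class SnapshotKeyCache {
    @Volatile
    private var cachedKeys: Map<String, String> = emptyMap()

    // Called from the background refresh thread: swaps in a complete new map.
    fun refresh(newKeys: Map<String, String>) {
        cachedKeys = newKeys.toMap()
    }

    fun resolve(kid: String?): String {
        // Read the volatile field exactly once; all checks below see the same map,
        // even if refresh() runs on another thread in the meantime.
        val snapshot = cachedKeys
        if (kid == null) {
            // Fall back to "the only key" only when the snapshot really holds one key.
            return snapshot.values.singleOrNull()
                ?: throw IllegalStateException("Token has no kid and ${snapshot.size} keys are cached")
        }
        return snapshot[kid] ?: throw NoSuchElementException("Failed to find key with kid $kid")
    }
}
```

With this shape, a refresh that lands between the size check and the lookup can no longer cause a key from a different map to be returned. Whether this race is actually behind the failures reported here is still unconfirmed.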

bdemers avatar Aug 31 '21 14:08 bdemers

> if your JWT does NOT have a kid and you only have a single cached key

The issue was triggered again, but we don't see this log, so it looks like this is not the problem 🤔

Fs02 avatar Sep 06 '21 08:09 Fs02

What do your logs show? Did you fix the threading issue?

bdemers avatar Sep 06 '21 14:09 bdemers

This issue has been automatically marked as stale due to inactivity for 60 or more days. It will be closed in 7 days if no further activity occurs.

stale[bot] avatar Apr 16 '22 18:04 stale[bot]

Might be helpful

Try validating the id_token instead of the access_token, because in some cases the access_token is not a valid JWT.

If id_token is null, the scope query parameter is missing from the /authorize call. The openid scope is required for the id_token to be returned.

Ref : https://docs.microsoft.com/en-us/azure/active-directory/develop/v2-oauth2-auth-code-flow#request-an-id-token-as-well-or-hybrid-flow

fr-DeepakKUMAR avatar Apr 25 '22 18:04 fr-DeepakKUMAR

Closing due to inactivity from the OP, plus we're unable to re-create. That it works on some pods but not others with the same code seems to be a strong indicator that it's a cache coherency problem, at least without additional information or test cases we could try. We're happy to reopen this issue if anyone can help us try to re-create the problem though!
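For anyone trying to gather that additional information, one option is to log a fingerprint of every cached public key after each refresh, so the output of a failing pod can be compared line by line with that of a healthy one. This is a sketch under assumptions: `fingerprint` and `describeKeys` are hypothetical helpers, and the map of kid to `PublicKey` is assumed to be built from whatever the resolver caches.

```kotlin
import java.security.MessageDigest
import java.security.PublicKey

// Hypothetical helper: hex SHA-256 over the key's encoded form
// (X.509 SubjectPublicKeyInfo for standard RSA public keys).
fun fingerprint(key: PublicKey): String =
    MessageDigest.getInstance("SHA-256")
        .digest(key.encoded)
        .joinToString("") { "%02x".format(it) }

// Hypothetical helper: one stable line, sorted by kid, that can be diffed across pods.
fun describeKeys(keys: Map<String, PublicKey>): String =
    keys.toSortedMap().entries.joinToString(", ") { (kid, key) -> "$kid=${fingerprint(key)}" }
```

If two pods log different fingerprints for the same kid, or different sets of kids, after a refresh, that would support the cache coherency theory; identical output would point elsewhere.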

lhazlewood avatar Aug 11 '23 21:08 lhazlewood