jjwt
jjwt copied to clipboard
Optimization: Cache header
Initial tests show that caching the header could improve the performance by about 10%.
So this would be possible (as far as I can see), if keys returned by SigningKeyResolver could either be validated on use, or the cache cleared when keys are no longer valid.
Hey @skjolber,
I'm not sure I'm following completely, can you point to where you think the keys should be cached?
What implementation of SigningKeyResolver
are you using? It should be possible to create an implementation that caches your key.
For example, I have a simplistic resolver that caches keys here: https://github.com/okta/okta-jwt-verifier-java/blob/master/impl/src/main/java/com/okta/jwt/impl/jjwt/RemoteJwkSigningKeyResolver.java
The whole JWT header (as a String) should be mapped to a cache entry containing
- Header
- SignatureAlgorithm
- Key
- CompressionCodec
Also, this means that the SigningKeyResolver only looks at the header, i.e. not the body, when determining the appropriate key.
So something like this.
OH, you want to cache the actual base64 encoded header in order to optimize the dserialization of the header?
So in the case of a JWT:
eyJ0eXAiOiJKV1QiLCJraWQiOiIxMjMiLCJhbGciOiJSUzUxMiJ9.eyJzdWIiOiIxMjM0NTY3ODkwIiwibmFtZSI6IkpvaG4gRG9lIiwiYWRtaW4iOnRydWUsImlhdCI6MTYxNDI5NTc4OSwiZXhwIjoxNjE0Mjk5Mzg5fQ.uG0HtgSKJQsXGMfEdWKMKOK3aY0oO1rURzgnMH6pPTsshdWaX3RYWyXLxjBHXkDETv78xWndFZbTd71ePjm4UoTF3hGYemFHVKcJt4cWCvfItQ86PNM0TwoX72QDrb36nMNpwxX3D6P08qLDfRCtJSnRo1k0U6gVnPs2CQv82SYy32iTqDc7xIumntfruBhfwXFRebanzjHBSRSat9B-1e6ReNdP7HybioDlWRMoQ0ypaAA4BkGwF0ChqOvI9KtSEiU6htJ2CJAgeRO7IeVgYg0K3ADVjI88Us4MUr7ZySFY7YE2LZgmxFhiHaDeWg3Z_lrRobpvRaF-11duNRODJw
You would use the header eyJ0eXAiOiJKV1QiLCJhbGciOiJSUzUxMiJ9
as the cache key, and the value of said entry would be an object wrapping the decoded header value in my case a JwsHeader
containing:
{
"typ": "JWT",
"kid": "123",
"alg": "RS512"
}
(NOTE: this could be a JweHeader
in the future)
This should be possible, we would need a cache interface, that likely defaults to an implementation that uses something like a SoftHashMap
.
@skjolber any idea on the average gain from something like this? (or what percentage of the benchmark is spent dealing with the header)?
Run the benchmark using ./gradlew --stop && ./gradlew clean build jmhClasses jmh --info -x :frameworks:jjwt-bench:spotbugsMain --stacktrace
with java 8 for the following output:
Benchmark | Mode | Cnt | Score | Error | Units |
---|---|---|---|---|---|
JwtVerifyBenchmark.auth0_verify | thrpt | 5 | 28880,019 | ± 1439,777 | ops/s |
JwtVerifyBenchmark.fusionauth2_verify | thrpt | 5 | 32855,697 | ± 1278,328 | ops/s |
JwtVerifyBenchmark.fusionauth_verify | thrpt | 5 | 32672,284 | ± 450,467 | ops/s |
JwtVerifyBenchmark.jjwt_verify | thrpt | 5 | 30357,304 | ± 875,569 | ops/s |
JwtVerifyBenchmark.jjwt_verify2 | thrpt | 5 | 32772,563 | ± 686,030 | ops/s |
JwtVerifyBenchmark.nimbus_verify | thrpt | 5 | 19105,788 | ± 599,333 | ops/s |
JwtVerifyBenchmark.okta_verify | thrpt | 5 | 1227,194 | ± 41,579 | ops/s |
In a nutshell, performane is 30357 vs 32772, so that is 8 percent, on AMD zen 3.
The header cache might also have a function in throwing away unwanted jwts, if the full set of expected header parameters is known (including the cache key), it should be trivial to create a cache which also is a filter, i.e. for example automatically generate all possible legal header keys on init or when loading remote jwks.
With key rotation for remote jwks, the list all of all valid kid
isn't known ahead of time, so could not automatically prime the cache. I think for many use-cases the number of different headers an application sees is probably minimal, so caching after decoding should be enough to see the benefit.
It is unclear to me how valuable this would actually be in real-world applications, as opposed to a benchmark suite. (I'm not saying it is not valuable - just that I honestly don't know if it is or not).
Things like exp
and jti
fields are very common which means you still have to deserialize the header into JSON in order to assert things like expiration and other restrictions (nbf
, etc). Most apps in practice should be using at least the exp
header, so that implies deserialization in the large majority of cases anyway.
Does this make sense? Or might I be missing something (and I very well could be).
@bdemers Remote keys must be downloaded before evaulating tokens, so the set of possible headers will be known in time (at runtime), but lets save the 'permutation' approach for later - you're right the simpler caching after decoding should be good enough.
@lhazlewood Yes, but those fields are in the JWT body. The JWT header is usually data which almost static, like signature algorithm, body compression algorithm and key id, so can stay the same for days or months at a time.
Hey @skjolber, The list of jwks keys is NOT always known ahead of time, this is an important detail when it comes to key rotation and removing any exposed keys. The initial set of keys would be known, and these could be cached, but cache misses would be expected once the IdP starts issuing tokens with a newer key.
It's also worth mentioning, the caching strategy would also be slightly coupled to the SigningKeyResolver
, (at least in the case of a JWKS endpoint implementation, invalid kid
values would need to be removed from the cache to ensure future tokens do NOT use keys that have been removed or rotated out).
I'm also not as familiar with the list of JWE headers, so I don't know if this would be a JWS only optimization or not, @lhazlewood probably has more insight on this.
@bdemers I agree. The remote JWKs should be kept up-to-date; refreshes triggered either by time or by unknown keys. JWKs no longer on the list should be banned. Thus changes to the JWKs must affect the cache.
In general keeping the remote JWKs up-to-date in an optimal way is somewhat complicated - here is my take on it (implementation).
What I was trying to communicate was that once the JWKs are refreshed, before a JWT signature is evaluated, the key ids are known.
In my eyes, JWTs with unknown key ids is a trivial denial-of-service vector, so I've implemented a trivial rate limiter. Basically all our kubernetes JVMs could be asking the same authorization server for JWKs, which is not good - probably it will itself start dropping requests or take a long time to respond. So also we just do one request at a time per JVM; however all request with unknown key ids have to be kept waiting (in-flight), which is also negative. So the rate-limiter will result in some legal requests being dropped, if someone is making a lot of fake requests while we rotate the keys, which is acceptable for us.
@skjolber I know it has been a while, but I am still interested in this, it was just a lower priority than the JWE stuff we released recently.
That said, the newest parser implementation uses more efficient I/O (streams, buffers, etc) and @bdemers noticed a decent improvement in performance, sometimes more than 15-20% if I remember correctly.
Any interest in trying the benchmark again on the latest release?
@lhazlewood Sure, however are you committed to actually taking in potentially painful changes for performance improvements? My (very simple) previous PR has been pending for years, indicating that your are not.
@skjolber that's a fair critique and concern, and I appreciate what you're saying.
However, that PR still remains open because it's not clear what the JDK 8 changes to the JJWT APIs for 1.0 will look like and I was going to evaluate it in the context of those changes. In other words, my https://github.com/jwtk/jjwt/pull/651#issuecomment-789249066 still applies.
That said, I was more curious about what the 0.12.0 performance enhancements look like in the benchmark suite, as well as how the JWT header caching might still benefit, so I'll try to run it locally first before bothering you any further. I definitely don't want to distract you until we're actively ready to prioritize this relative to the other 1.0 work.
Thanks for checking in!
@lhazlewood so have you considered adding a benchmarking (JMH) module to the project itself? My impression is that this library supports more features than the other libs I have seen (i.e. the ones used in java-jwt-benchmark) and so a wider (handful?) selection of performance tests could be useful.