vertx-auth icon indicating copy to clipboard operation
vertx-auth copied to clipboard

Fix synchronization for JWT increasing performance for token validation

Open zyclonite opened this issue 9 months ago • 7 comments

Motivation:

The observation was that reusing objects like Signature, Cipher and Mac and running it within synchronization blocks has a big impact on application performance. Used in a handler it serializes all eventloop threads and potentially introduces a lot of wait for all other threads.

Using the Signature.getInstance​(String algorithm) does not add a lot of overhead and is then safe to use within the same context for the operation.

Doing some performance tests on EC256 signatures got me nearly the same performance multiplicator as CPU cores available.

zyclonite avatar Jun 12 '25 17:06 zyclonite

I'm wondering if we could not store those in a thread local when the thread is a vertx thread (excluding virtual threads) otherwise create it as we go like you are doing

vietj avatar Jun 13 '25 10:06 vietj

Even calling getInstance on the same Thread will get you different instances of the object. So the idea is, not to re-use them. So i am not sure if we would win a lot with that except getting it off the eventloop thread. This might be done best on the authentication handler level i guess?

If your thought was in a different direction, let me know

zyclonite avatar Jun 13 '25 16:06 zyclonite

I think that what @vietj suggests is to create a something like ThreadLocalRandom for the Ciper or Signature instance, that would use a FastThreadLocal storage for reuse of cipher and signature when running on an event loop thread, otherwise creating objects on demand.

If you have time to do this it would be great.

tsegismont avatar Jun 18 '25 14:06 tsegismont

Understood but the Random use-case is a different one as this "should" be shared by definition, sharing the Signature object does not really add a lot of benefits from what i read so far.

I can do some further tests if it adds any performance gain.

zyclonite avatar Jun 18 '25 14:06 zyclonite

Did conduct some tests and the results are:

Using FastThreadLocal on a 12 core system had around 52ms versus 227ms with a plain getInstance for Signature and 36ms over 2651ms for Cipher (20 million cycles)

I could not find a vert.x way of checking for VertxThread and/or isVirtual as the impl package is no longer accessible in vert.x 5.x with modules so i used FastThreadLocalThread.currentThreadHasFastThreadLocal() from netty for now

zyclonite avatar Jun 20 '25 19:06 zyclonite

have a look at my last commit if this is in the direction or your idea and let me know

zyclonite avatar Jun 20 '25 19:06 zyclonite

@zyclonite sorry for the late reply. I did an update for JWE in https://github.com/zyclonite/vertx-auth/pull/1 that you can replicate in JWK and JWS.

tsegismont avatar Jun 26 '25 10:06 tsegismont

took me a while to find some time...

did a quick apply of your approach on the JWS impl as well but this approach would only work if the algorithm would always be the same for the same thread... that's the reason why the test fails now

zyclonite avatar Jul 06 '25 09:07 zyclonite

@zyclonite I wanted to create a pull request to your repo with https://github.com/eclipse-vertx/vertx-auth/pull/726/commits/c7d5d59e9d4adc281e658800e7c6c6dabe1de274 but incidentally pushed directly to your branch.

Anyway, that should fix the problem you mentioned in your previous comment.

Do we have some evidence (e.g. microbenchmarks) that creating signatures and ciphers on the fly is more expensive than reusing them?

tsegismont avatar Jul 07 '25 16:07 tsegismont

@tsegismont looks great, thanks for that solution.

I fear your example has the same issue -> JWE I can fix it there as well but i found out that the JWE implementation does actually not work properly, is nearly not implemented at all and does not have tests. I tried to think ahead and start implementing at least the recommended things but this starts to break with the JWK approach as it gets very complicated and would require a lot of refactoring.

zyclonite avatar Jul 08 '25 12:07 zyclonite

I fear your example has the same issue -> JWE

I'm not sure what you mean

tsegismont avatar Jul 08 '25 18:07 tsegismont

initially i was referring to your commit https://github.com/eclipse-vertx/vertx-auth/pull/726/commits/412682563165e06a8ee34ae302b083f9ee2f4cdc , that needs the same change with a selector on the algorithm, there is just no test that could fail in the same way.

the other statement was to the JWE implementation itself. not only that no test exist, it could have never worked based on that line https://github.com/eclipse-vertx/vertx-auth/blob/76717f72da6977e14394ef4ca67dfca47d430f31/vertx-auth-common/src/main/java/io/vertx/ext/auth/impl/jose/JWE.java#L38 (only if use=sig and that makes no sense) so this class needs a proper implementation first, then tests but in a different PR. i looked a bit into it but this will require a lot of refactoring of the JWK implementation as well...

zyclonite avatar Jul 08 '25 20:07 zyclonite

@zyclonite thanks for clarifying. I think our immediate goal must be to fix the performance issue where the implementation is trusted and tested.

the other statement was to the JWE implementation itself. not only that no test exist, it could have never worked based on that line

@pmlopes any comment on this?

tsegismont avatar Jul 17 '25 14:07 tsegismont

@zyclonite is there an opportunity for you to compare the performance of single-use ciphers/signatures vs fast thread local caching when running on a FastThreadLocalThread? In this case, could you please report the results?

tsegismont avatar Jul 17 '25 14:07 tsegismont

sorry, i lost a bit track of that issue and did not find time to do some extended testing but i am pretty sure it adds performance

shall i discard that PR as @vietj has started refactoring the whole crypto part of the implementation?

zyclonite avatar Sep 30 '25 10:09 zyclonite

sorry, i lost a bit track of that issue and did not find time to do some extended testing but i am pretty sure it adds performance

shall i discard that PR as @vietj has started refactoring the whole crypto part of the implementation?

yes, you can, this code will be moved to vertx-core, you can help review the other PR

later we will introduce fast thread local in the code moved to vertx-core that will be easier to implement

vietj avatar Sep 30 '25 12:09 vietj