tink-java icon indicating copy to clipboard operation
tink-java copied to clipboard

Huge performance gap of `JwtPublicKeyVerify` depending on the key within a keyset used for signature generation

Open ks-yim opened this issue 1 year ago • 4 comments

Help us help you

We'd like to know more about your Tink deployment.

Describe the bug:

JwtPublicKeyVerify primitive shows a huge performance gap in verifying a JWT depending on the key, within a keyset, used to generate the JWT signature.

  • JHM Benchmark Result (BM Code)
    # JMH version: 1.35
    # VM version: JDK 11.0.17, OpenJDK 64-Bit Server VM, 11.0.17+8
    # VM invoker: /Users/ks-yim/.sdkman/candidates/java/11.0.17-tem/bin/java
    # VM options: -Dfile.encoding=UTF-8 -Djava.io.tmpdir=/Users/ks-yim/IdeaProjects/tink-bm/build/tmp/jmh -Duser.country=KR -Duser.language=en -Duser.variant
    # Blackhole mode: full + dont-inline hint (auto-detected, use -Djmh.blackhole.autoDetect=false to disable)
    # Warmup: 1 iterations, 10 s each
    # Measurement: 1 iterations, 10 s each
    # Timeout: 10 min per iteration
    # Threads: 8 threads, will synchronize iterations
    # Benchmark mode: Throughput, ops/time
    
    Benchmark                                                               Mode  Cnt      Score   Error  Units
    JwtPublicKeyVerifyBenchmark.verifyJwt_signedWith1stKey                 thrpt       17936.100          ops/s
    JwtPublicKeyVerifyBenchmark.verifyJwt_signedWith2ndKey                 thrpt       21440.300          ops/s
    JwtPublicKeyVerifyBenchmark.verifyJwt_signedWith3rdKey                 thrpt       21041.441          ops/s
    JwtPublicKeyVerifyBenchmark.verifyJwt_signedWith4thKey                 thrpt       43127.601          ops/s
    JwtPublicKeyVerifyBenchmark.verifyJwt_signedWith5thKey                 thrpt       39966.294          ops/s
    

What was the expected behavior?

No significant signature verification performance gap

How can we reproduce the bug?

Clone the BM code repository and hit the following command:

./gradlew --no-daemon clean jmh -Pjmh.includes=^com.example.tink.bm.JwtPublicKeyVerifyBenchmark

Do you have any debugging information?

WrappedJwtPublicKeyVerify#verifyAndDecode doesn't use TINK's conventional way of determining the right key within a keyset by parsing the keyId prefix which is done at O(1) time. Instead, it iterates over PrimitiveSet<JwtPublicKeyVerifyInternal> until it finds a matching key which generates the same signature as the one in the JWT, which makes the worst-case time complexity O(n).

But, it's not enough to explain the huge perf gap shown in the benchmark.

The real problem comes from the fact that verifyAndDecode generates a token signature with all tried keys within a keyset until it finds a match and uses GeneralSecurityException to control the code execution flow, which is very costly due to stacktrace generation.

A CPU profiling result for the worst case BM reveals that CPU cycles consumed for signature verification & token decoding process can potentially be optimized away. (For interactive version, hit curl -O https://raw.githubusercontent.com/ks-yim/tink-bm/master/src/jmh/java/com/example/tink/bm/jwt-public-key-verify-cpu-profiling.svg and open the downloaded file on your local)

  • Signature Verification Part image
  • Token Decoding Part image

What version of Tink are you using?

1.7.0

Can you tell us more about your development environment?

JDK 11, MacBook Pro(16-inch, 2021) Apple M1 Pro 32 GB

Is there anything else you'd like to add?

I'd like to suggest to move JwtFormat#splitSignedCompact from JwtPublicKeyVerificationInternal to JwtPublicKeyVerification so that we can have access to the parsed keyID upfront and find the right keyset at O(1) and do the token decoding & signature verification only once no matter how many keys exist in the keyset.

This will introduce a change to JwtPublicKeyVerifyInternal#verifyAndDecodeWithKid's method signature so I'd like to get some advices from the maintainers about whether it is the right direction.

ks-yim avatar Mar 23 '23 05:03 ks-yim

Thank you for filing this issue.

The main fear we have is that we don't want to use a complicated parser on unverified data if we can avoid it.

There are other things we can do:

  • For Tink style keys we can be more restrictive. For these we can e.g. just check for the substring "kid" in the header and read it out.
  • For other keys, we can see if such a string exists and try them as hints.

I am curious: do you use Tink to generate the keys you use? Do you import them? If so, how?

tholenst avatar Mar 28 '23 11:03 tholenst

@tholenst Thanks for suggesting alternative ways. It looks like not parsing the token before signature verification is a design decision as stated in JWT-HOWTO.md?

Would you mind me having a stab at it, though it will take some time?

I am curious: do you use Tink to generate the keys you use? Do you import them? If so, how?

Yes, I am using Tink to generate keys for JWT and encrypt the generated keysets with internal KMS (implemented tink's KmsClient interface for internal KMS integration) following tink/JAVA-HOWTO.md.

By importing do you mean converting an existing keyset in whatever format(e.g. JWK) into Tink format? In that sense, no, we are not importing keys.

ks-yim avatar Mar 28 '23 15:03 ks-yim

Thank you. Yes, you understood me correctly.

Indeed, we want to be careful with parsing unverified tokens. Parsing JSON can be tricky.

However, we should be able to do something here indeed; the issue is very real obviously. I will try to take a look.

tholenst avatar Mar 28 '23 15:03 tholenst

This is still something we would like to fix, but it turns out the internals make it more complicated than expected.

tholenst avatar Nov 29 '23 16:11 tholenst