jdk icon indicating copy to clipboard operation
jdk copied to clipboard

8317538: potential bottleneck in Provider::getService: specjvm2008::crypto.rsa have scalability issue for high vCPU numbers

Open IvaVladimir opened this issue 1 year ago • 7 comments

This patch remove access to the shared variable to fix scalability issue in the multithread environment. According to testing by the specjvm2008::crypto.rsa the one thread performance reduced for less than 1% while the score for the multithread run increased in ~2x. For the 2 socket system with Xeon 8480+ numbers looks as:

  • 1 thread: 643.15 for original version vs 642.54 for patched one;
  • 224 threads: 22446.19 for original vs 46147.41 for patched.

The RSABench microbenchmark reports no score changes for the 1 thread (average for all testcases) and 2.4% improvement for the 224 threads.


Progress

  • [ ] Change must be properly reviewed (1 review required, with at least 1 Reviewer)
  • [x] Change must not contain extraneous whitespace
  • [x] Commit message must refer to an issue

Warning

 ⚠️ Found leading lowercase letter in issue title for 8317538: potential bottleneck in Provider::getService: specjvm2008::crypto.rsa have scalability issue for high vCPU numbers

Issue

  • JDK-8317538: potential bottleneck in Provider::getService: specjvm2008::crypto.rsa have scalability issue for high vCPU numbers (Bug - P3)

Reviewing

Using git

Checkout this PR locally:
$ git fetch https://git.openjdk.org/jdk.git pull/21622/head:pull/21622
$ git checkout pull/21622

Update a local copy of the PR:
$ git checkout pull/21622
$ git pull https://git.openjdk.org/jdk.git pull/21622/head

Using Skara CLI tools

Checkout this PR locally:
$ git pr checkout 21622

View PR using the GUI difftool:
$ git pr show -t 21622

Using diff file

Download this PR as a diff file:
https://git.openjdk.org/jdk/pull/21622.diff

Webrev

Link to Webrev Comment

IvaVladimir avatar Oct 21 '24 18:10 IvaVladimir

:wave: Welcome back vaivanov! A progress list of the required criteria for merging this PR into master will be added to the body of your pull request. There are additional pull request commands available for use with this pull request.

bridgekeeper[bot] avatar Oct 21 '24 18:10 bridgekeeper[bot]

@IvaVladimir This change is no longer ready for integration - check the PR body for details.

openjdk[bot] avatar Oct 21 '24 18:10 openjdk[bot]

:warning: @IvaVladimir a branch with the same name as the source branch for this pull request (master) is present in the target repository. If you eventually integrate this pull request then the branch master in your personal fork will diverge once you sync your personal fork with the upstream repository.

To avoid this situation, create a new branch for your changes and reset the master branch. You can do this by running the following commands in a local repository for your personal fork. Note: you do not have to name the new branch NEW-BRANCH-NAME.

$ git checkout -b NEW-BRANCH-NAME
$ git branch -f master 37aa320f573650f007e60729e4d187c3b96b5756
$ git push -f origin master

Then proceed to create a new pull request with NEW-BRANCH-NAME as the source branch and close this one.

openjdk[bot] avatar Oct 21 '24 18:10 openjdk[bot]

@IvaVladimir The following label will be automatically applied to this pull request:

  • security

When this pull request is ready to be reviewed, an "RFR" email will be sent to the corresponding mailing list. If you would like to change these labels, use the /label pull request command.

openjdk[bot] avatar Oct 21 '24 18:10 openjdk[bot]

Webrevs

mlbridge[bot] avatar Oct 21 '24 18:10 mlbridge[bot]

Is Provider::getService a bottleneck in real apps or is this mostly something which manifests in specjvm2008 and such benchmarks (which likely does what could be considered setup logic in the hot path)?

I agree with the changes though: caching transient lookup keys seem a dubious optimization at best on modern HW/JVMs, and doing so in a way that might cause cache-line ping-pong on large systems looks pretty bad.

cl4es avatar Oct 21 '24 20:10 cl4es

I was failed to find any real application that was affected. For now only specjvm2008 reports different scores. Seems, this optimization was designed for the one thread performance long time ago but now even for one thread it gives less than run-to-run variation.

IvaVladimir avatar Oct 21 '24 23:10 IvaVladimir

Please hold off on integrating until someone from the Security group reviews this.

Also, you should change the Assignee of the issue to yourself.

seanjmullan avatar Oct 24 '24 17:10 seanjmullan

Did you see the same performance improvement with specjvm2008::crypto.aes? I believe that should be running through this code as well unless there is something particular with RSA or any other asymmetric key.

ascarpino avatar Oct 24 '24 18:10 ascarpino

Is volatile a known performance decelerator? I don't see any source control history for that part of the code, so I can only speculate at previousKey was to improve benchmarking. Did you try just removing volatile and leaving the rest of the code?

ascarpino avatar Oct 24 '24 18:10 ascarpino

/reviewers 2 reviewer

cl4es avatar Oct 24 '24 22:10 cl4es

Note the bot warning on the branch name being master. Will you change this following the bot's instruction? Also the minor issue with the convention of starting the synopsis with Upper case, i.e. "potential" -> "Potential".

valeriepeng avatar Oct 24 '24 22:10 valeriepeng

@cl4es The total number of required reviews for this PR (including the jcheck configuration and the last /reviewers command) is now set to 2 (with at least 2 Reviewers).

openjdk[bot] avatar Oct 24 '24 23:10 openjdk[bot]

The specjvm::crypto.aes workload not affected by this issue. It implemented as mix of DES and AES encryptions but AES much better optimized vs DES. The top 3 methods for this workload for the 112T run are: 85.6% com::sun::crypto::provider::DESCrypt::cipherBlock 6.5% com::sun::crypto::provider::DESCrypt::perm 4.4% com::sun::crypto::provider::CipherCore::update i.e. this workload have no problem with the getService method.

IvaVladimir avatar Oct 25 '24 00:10 IvaVladimir

Without the 'volatile' the performance improved for ~6% (22293 -> 23638) for the 224 threads run. Without the 'static volatile' performance improved ~1.9x (22293 -> 42045). It may be 'transient' instance field. Scores for the 1 thread runs similar for all 3 modification and fit to run-to-run variation.

IvaVladimir avatar Oct 25 '24 02:10 IvaVladimir

Note the bot warning on the branch name being master. Will you change this following the bot's instruction? Also the minor issue with the convention of starting the synopsis with Upper case, i.e. "potential" -> "Potential".

I had missed this bot comment. @IvaVladimir you probably ought to take the bots advice and close this PR, move the changes over to a new branch and create a new PR from that branch -- otherwise your repository will get into a real wonky state after integration.

cl4es avatar Oct 25 '24 10:10 cl4es