jdk
jdk copied to clipboard
8329538: Accelerate P256 on x86_64 using Montgomery intrinsic
Performance. Before:
Benchmark (algorithm) (dataSize) (keyLength) (provider) Mode Cnt Score Error Units
SignatureBench.ECDSA.sign SHA256withECDSA 1024 256 thrpt 3 6443.934 ± 6.491 ops/s
SignatureBench.ECDSA.sign SHA256withECDSA 16384 256 thrpt 3 6152.979 ± 4.954 ops/s
SignatureBench.ECDSA.verify SHA256withECDSA 1024 256 thrpt 3 1895.410 ± 36.979 ops/s
SignatureBench.ECDSA.verify SHA256withECDSA 16384 256 thrpt 3 1878.955 ± 45.487 ops/s
Benchmark (algorithm) (keyLength) (kpgAlgorithm) (provider) Mode Cnt Score Error Units
o.o.b.j.c.full.KeyAgreementBench.EC.generateSecret ECDH 256 EC thrpt 3 1357.810 ± 26.584 ops/s
o.o.b.j.c.small.KeyAgreementBench.EC.generateSecret ECDH 256 EC thrpt 3 1352.119 ± 23.547 ops/s
Benchmark (isMontBench) Mode Cnt Score Error Units
PolynomialP256Bench.benchMultiply false thrpt 3 1746.126 ± 10.970 ops/s
Performance, no intrinsic:
Benchmark (algorithm) (dataSize) (keyLength) (provider) Mode Cnt Score Error Units
SignatureBench.ECDSA.sign SHA256withECDSA 1024 256 thrpt 3 6529.839 ± 42.420 ops/s
SignatureBench.ECDSA.sign SHA256withECDSA 16384 256 thrpt 3 6199.747 ± 133.566 ops/s
SignatureBench.ECDSA.verify SHA256withECDSA 1024 256 thrpt 3 1973.676 ± 54.071 ops/s
SignatureBench.ECDSA.verify SHA256withECDSA 16384 256 thrpt 3 1932.127 ± 35.920 ops/s
Benchmark (algorithm) (keyLength) (kpgAlgorithm) (provider) Mode Cnt Score Error Units
o.o.b.j.c.full.KeyAgreementBench.EC.generateSecret ECDH 256 EC thrpt 3 1355.788 ± 29.858 ops/s
o.o.b.j.c.small.KeyAgreementBench.EC.generateSecret ECDH 256 EC thrpt 3 1346.523 ± 28.722 ops/s
Benchmark (isMontBench) Mode Cnt Score Error Units
PolynomialP256Bench.benchMultiply true thrpt 3 1919.574 ± 10.591 ops/s
Performance, with intrinsics
Benchmark (algorithm) (dataSize) (keyLength) (provider) Mode Cnt Score Error Units
SignatureBench.ECDSA.sign SHA256withECDSA 1024 256 thrpt 3 10384.591 ± 65.274 ops/s
SignatureBench.ECDSA.sign SHA256withECDSA 16384 256 thrpt 3 9592.912 ± 236.411 ops/s
SignatureBench.ECDSA.verify SHA256withECDSA 1024 256 thrpt 3 3479.494 ± 44.578 ops/s
SignatureBench.ECDSA.verify SHA256withECDSA 16384 256 thrpt 3 3402.147 ± 26.772 ops/s
Benchmark (algorithm) (keyLength) (kpgAlgorithm) (provider) Mode Cnt Score Error Units
o.o.b.j.c.full.KeyAgreementBench.EC.generateSecret ECDH 256 EC thrpt 3 2527.678 ± 64.791 ops/s
o.o.b.j.c.small.KeyAgreementBench.EC.generateSecret ECDH 256 EC thrpt 3 2541.258 ± 66.634 ops/s
Benchmark (isMontBench) Mode Cnt Score Error Units
PolynomialP256Bench.benchMultiply true thrpt 3 3021.139 ± 98.289 ops/s
Summary on design (see code for 'ASCII art', references and details on math):
- Added a new
IntegerPolynomial
field (MontgomeryIntegerPolynomialP256
) with 52-bit limbs-
getElement(*)/fromMontgomery()
to convert numbers into/out of the field
-
-
ECOperations
is the primary use of the new field- flattened some extra deep nested class hierarchy (also in prep for further other field optimizations)
-
forParameters()/multiply()/setSum()
generates numbers in the new field
-
ProjectivePoint/Montgomery{Imm|M}utable.asAffine()
to convert out of the new field - Added Fuzz Testing and KAT verified with OpenSSL
Progress
- [x] Change must not contain extraneous whitespace
- [x] Commit message must refer to an issue
- [x] Change must be properly reviewed (3 reviews required, with at least 1 Reviewer, 2 Authors)
Issue
- JDK-8329538: Accelerate P256 on x86_64 using Montgomery intrinsic (Enhancement - P4)
Reviewers
- Magnus Ihse Bursie (@magicus - Reviewer) ⚠️ Review applies to 82b6dae7
- Anthony Scarpino (@ascarpino - Reviewer) ⚠️ Review applies to 8cd095dd
- Sandhya Viswanathan (@sviswa7 - Reviewer)
Reviewing
Using git
Checkout this PR locally:
$ git fetch https://git.openjdk.org/jdk.git pull/18583/head:pull/18583
$ git checkout pull/18583
Update a local copy of the PR:
$ git checkout pull/18583
$ git pull https://git.openjdk.org/jdk.git pull/18583/head
Using Skara CLI tools
Checkout this PR locally:
$ git pr checkout 18583
View PR using the GUI difftool:
$ git pr show -t 18583
Using diff file
Download this PR as a diff file:
https://git.openjdk.org/jdk/pull/18583.diff
Webrev
:wave: Welcome back vpaprotsk! A progress list of the required criteria for merging this PR into master
will be added to the body of your pull request. There are additional pull request commands available for use with this pull request.
@vpaprotsk This change now passes all automated pre-integration checks.
ℹ️ This project also has non-automated pre-integration requirements. Please see the file CONTRIBUTING.md for details.
After integration, the commit message for the final commit will be:
8329538: Accelerate P256 on x86_64 using Montgomery intrinsic
Reviewed-by: ihse, ascarpino, sviswanathan
You can use pull request commands such as /summary, /contributor and /issue to adjust it as needed.
At the time when this comment was updated there had been 8 new commits pushed to the master
branch:
- 9ca90ccd6bfec76e54e2e870bd706fad5abf233c: 8332610: Remove unused nWakeups in ObjectMonitor
- 92d33501e091bdfaab52886078053b849a5a8f68: 8331920: ubsan: g1CardSetContainers.inline.hpp:266:5: runtime error: index 2 out of bounds for type 'G1CardSetHowl::ContainerPtr [2]' reported
- 4f1a10f84bcfadef263a0890b6834ccd3d5bb52f: 8332360: JVM hangs at exit when running on a uniprocessor
- c3bc23fe48ca1603afe68a6ac4aaa523a1edbb41: 8326306: RISC-V: Re-structure MASM calls and jumps
- 8a9d77d58de259b6b2bdc2cc9e7bfdc28dcf7165: 8320622: [TEST] Improve coverage of compiler/loopopts/superword/TestMulAddS2I.java on different platforms
- 3d511ff63e59f542ae20c722bfef1c867cd1da0e: 8329748: Change default value of AssertWXAtThreadSync to true
- 67f03f2a4f5ac12748ffbf5c04f248a60869e180: 8332533: RISC-V: Enable vector variable shift instructions for machines with RVV
- 5f804b2ec12627b593353ceeab881187b0bb5cd6: 8329825: Clarify the value type for java.net.SocketOptions.SO_LINGER
Please see this link for an up-to-date comparison between the source branch of this pull request and the master
branch.
As there are no conflicts, your changes will automatically be rebased on top of these commits when integrating. If you prefer to avoid this automatic rebasing, please check the documentation for the /integrate command for further details.
As you do not have Committer status in this project an existing Committer must agree to sponsor your change. Possible candidates are the reviewers of this PR (@magicus, @jatin-bhateja, @ascarpino, @sviswa7) but any other Committer may sponsor as well.
➡️ To flag this PR as ready for integration with the above commit message, type /integrate
in a new comment. (Afterwards, your sponsor types /sponsor
in a new comment to perform the integration).
@vpaprotsk The following labels will be automatically applied to this pull request:
-
build
-
core-libs
-
graal
-
hotspot
-
security
When this pull request is ready to be reviewed, an "RFR" email will be sent to the corresponding mailing lists. If you would like to change these labels, use the /label pull request command.
Webrevs
- 11: Full - Incremental (b1a33004)
- 10: Full - Incremental (df4fe6fa)
- 09: Full - Incremental (5c360e35)
- 08: Full - Incremental (8cd095dd)
- 07: Full - Incremental (83b21310)
- 06: Full - Incremental (1ecfdc44)
- 05: Full - Incremental (8ff243a2)
- 04: Full - Incremental (a1984501)
- 03: Full - Incremental (c93a71f0)
- 02: Full - Incremental (6f9ac046)
- 01: Full - Incremental (82b6dae7)
- 00: Full (dbe6cd3b)
@magicus The total number of required reviews for this PR (including the jcheck configuration and the last /reviewers command) is now set to 3 (with at least 1 Reviewer, 2 Authors).
@ascarpino Hi Tony, this is the ECC P256 PR we talked about last year, would appreciate your feedback.
In ECOperations.java
, if I understand this correctly, it is to replace the existing PointMultiplier
with montgomery-based PointMuliplier. But when I look at the code, I see both are still options. If I read this correctly, it checks for the old IntegerFieldModuloP
, then looks for the new IntegerMontgomeryFieldModuloP
. It appears to use the new one always.
Why doesn't it just replace the old implementation entry in the fields
Map? Is there a reason to keep it around?
In
ECOperations.java
, if I understand this correctly, it is to replace the existingPointMultiplier
with montgomery-based PointMuliplier. But when I look at the code, I see both are still options. If I read this correctly, it checks for the oldIntegerFieldModuloP
, then looks for the newIntegerMontgomeryFieldModuloP
. It appears to use the new one always. Why doesn't it just replace the old implementation entry in thefields
Map? Is there a reason to keep it around?
Hmm, thats a good point I haven't fully considered; i.e. (if I read correctly) "for CurveDB.P_256
remove the fallback path to non-montgomery entirely".. that might also help in cleaning a few things up in the construction. Maybe even get rid of this nested ECOperations inside ECOperations.. Perhaps nesting isnt a big deal, but all attempts to make the ECC stack clearer is positive!
One functional reason that might justify keeping it as-is, is fuzz-testing; with the fallback available, I am able to write the included Fuzz tests and have them check the values against the existing implementation. While I also included a few KAT tests using openssl-generated values, the fuzz tests check millions of values and it does add a lot more certainty about correctness of this code.
Can it be removed? For the operations that do not involve multiplication (i.e. setSum(*)
), montgomery is expensive. I think I did go through the uses of this code some time back (i.e. ECDHE, ECDSA and KeyGeneration) and existing IntegerPolynomialP256 is no longer used (I should verify that again) and only P256OrderField remains non-montgomery. So removing references to IntegerPolynomialP256 in ECOperations should be possible and cleaner. Removing IntegerPolynomialP256 from MontgomeryIntegerPolynomialP256 is harder (fromMontgomery() uses IntegerPolynomialP256) but perhaps also worth some thought..
I tend to like ECOperationsFuzzTest.java
and would prefer to keep it, but it could also be chucked up as part of 'scaffolding' and removed in name of code quality?
Thanks @ascarpino
PS: Perhaps there is some middle ground, remove the ECOperations montgomeryOps
nesting, and construct (somehow?? singleton makes most things inaccessible..) the reference ECOperations in the fuzz test instead.. not sure how yet, but perhaps worth a further thought..
Few early comments.
Please update the copyright year of all the modified files.
You can even consider splitting this into two patches, Java side changes in one and x86 optimized intrinsic in next one.
Thanks Jatin, will fix!
In
ECOperations.java
, if I understand this correctly, it is to replace the existingPointMultiplier
with montgomery-based PointMuliplier. But when I look at the code, I see both are still options. If I read this correctly, it checks for the oldIntegerFieldModuloP
, then looks for the newIntegerMontgomeryFieldModuloP
. It appears to use the new one always. Why doesn't it just replace the old implementation entry in thefields
Map? Is there a reason to keep it around?Hmm, thats a good point I haven't fully considered; i.e. (if I read correctly) "for
CurveDB.P_256
remove the fallback path to non-montgomery entirely".. that might also help in cleaning a few things up in the construction. Maybe even get rid of this nested ECOperations inside ECOperations.. Perhaps nesting isnt a big deal, but all attempts to make the ECC stack clearer is positive!One functional reason that might justify keeping it as-is, is fuzz-testing; with the fallback available, I am able to write the included Fuzz tests and have them check the values against the existing implementation. While I also included a few KAT tests using openssl-generated values, the fuzz tests check millions of values and it does add a lot more certainty about correctness of this code.
I hadn't looked at your fuzz test until you mentioned it. I see you are using reflection to change the values. Is that what you mean by "fallback"? I'm assuming there is no to access the older implementation without reflection.
Can it be removed? For the operations that do not involve multiplication (i.e.
setSum(*)
), montgomery is expensive. I think I did go through the uses of this code some time back (i.e. ECDHE, ECDSA and KeyGeneration) and existing IntegerPolynomialP256 is no longer used (I should verify that again) and only P256OrderField remains non-montgomery. So removing references to IntegerPolynomialP256 in ECOperations should be possible and cleaner. Removing IntegerPolynomialP256 from MontgomeryIntegerPolynomialP256 is harder (fromMontgomery() uses IntegerPolynomialP256) but perhaps also worth some thought..I tend to like
ECOperationsFuzzTest.java
and would prefer to keep it, but it could also be chucked up as part of 'scaffolding' and removed in name of code quality?
I wouldn't rip out the old implementation. I have been wondering if we should make the older implementation available, maybe by security property. I was looking at the static Maps at the top of ECOperations
, forParameters
, and the constructors where it checks if the montgomeryOps
was null or set. It would be nice if we could have one set of fields
Maps by putting the montgomery entry into the fields
to replace it. I think that should work because IntegerMontgomeryFieldModuloP
extends IntegerFieldModuloP
. instanceof
or other montgomeryOps
checks would still need to exist because not all the fields
support mongomery, and the older implementation would still be accessible for your fuzz tester. At least that is my theory.
Thanks @ascarpino
PS: Perhaps there is some middle ground, remove the
ECOperations montgomeryOps
nesting, and construct (somehow?? singleton makes most things inaccessible..) the reference ECOperations in the fuzz test instead.. not sure how yet, but perhaps worth a further thought..
It would be nice to remove the nesting and it would be nice to be a singleton. Maybe some combination of what I mentioned above chance can help that. I haven't fully thought this out either.
Few early comments.
Please update the copyright year of all the modified files.
You can even consider splitting this into two patches, Java side changes in one and x86 optimized intrinsic in next one.
Fixed all copyright years
git diff da8a095a19c90e7ee2b45fab9b533a1092887023 | lsdiff -p1 | while read line; do echo $line =========================; grep Copyright $line | grep -v 2024; done | less
Re splitting.. probably too late for that now.. (did consider it initially.. got hard to manage two changes while developing. And easier to justify the change when the entire patch is presented? but yes, far more code to review.. )
@ascarpino Fixed as suggested... actually.. that was waaay easier then I thought it would be
(I saw singleton and assumed private constructor.. nope, ECOperations() is public, no reflection required!! Ended up with cleaner implementation and cleaner tests! Thanks!)
Sidenote: Unlike the hashtable you suggested I use, I removed the other hashtable (multiplier hashtable).. the multiplier hashtable would trigger classloading of multipliers when ECOperations itself is loaded. By moving it into if statements, loading of multipliers is delayed until function call (i.e. loading of multipliers causes the table computation). That let me remove one layer of class nesting that existed before.
The intrinsics and the C2 changes look good to me.
Thanks Sandhya!
Now that I have @ascarpino approval as well, I plan to integrate next Tuesday.
I'll send this through our testing and will report back once it passed.
I'm getting some conflicts when trying to apply this to master. Could you please merge the PR?
Hi @TobiHartmann , merged with no issues for me. Could you please run the tests again? (I think Tony did run them, but can't hurt verifying again). Thanks!
Thanks! I submitted testing and will report back once it passed.
All tests passed.
Thanks Tobi!
/integrate
@vpaprotsk Your change (at version b1a3300440581c5ed42c9bac7cfb9a6ee373c6d7) is now ready to be sponsored by a Committer.
/sponsor
Going to push as commit afed7d0b0593864e5595840a6b645c210ff28c7c.
Since your change was applied there have been 8 commits pushed to the master
branch:
- 9ca90ccd6bfec76e54e2e870bd706fad5abf233c: 8332610: Remove unused nWakeups in ObjectMonitor
- 92d33501e091bdfaab52886078053b849a5a8f68: 8331920: ubsan: g1CardSetContainers.inline.hpp:266:5: runtime error: index 2 out of bounds for type 'G1CardSetHowl::ContainerPtr [2]' reported
- 4f1a10f84bcfadef263a0890b6834ccd3d5bb52f: 8332360: JVM hangs at exit when running on a uniprocessor
- c3bc23fe48ca1603afe68a6ac4aaa523a1edbb41: 8326306: RISC-V: Re-structure MASM calls and jumps
- 8a9d77d58de259b6b2bdc2cc9e7bfdc28dcf7165: 8320622: [TEST] Improve coverage of compiler/loopopts/superword/TestMulAddS2I.java on different platforms
- 3d511ff63e59f542ae20c722bfef1c867cd1da0e: 8329748: Change default value of AssertWXAtThreadSync to true
- 67f03f2a4f5ac12748ffbf5c04f248a60869e180: 8332533: RISC-V: Enable vector variable shift instructions for machines with RVV
- 5f804b2ec12627b593353ceeab881187b0bb5cd6: 8329825: Clarify the value type for java.net.SocketOptions.SO_LINGER
Your commit was automatically rebased without conflicts.
@sviswa7 @vpaprotsk Pushed as commit afed7d0b0593864e5595840a6b645c210ff28c7c.
:bulb: You may see a message that your pull request was closed with unmerged commits. This can be safely ignored.
Unfortunately, this caused a performance regression, see JDK-8333583. @vpaprotsk, please have a look.