jdk icon indicating copy to clipboard operation
jdk copied to clipboard

8295010: Reduce if required in EC limbs operations

Open XueleiFan opened this issue 2 years ago • 7 comments

Hi,

May I have this update reviewed? With this update, the result will be reduced if required in EC limbs operations in the JDK implementation.

In the current implementation, the EC limbs addition and subtraction result is not reduced before the value is returned. This behavior is different from multiplication and square. To get it right, a maximum addition limit is introduced for EC limbs addition and subtraction. If the limit is reached, an exception will be thrown so as to indicate an EC implementation problem. The exception is not recoverable from the viewpoint of an application.

The design is error prone as the EC implementation code must be carefully checked so that the limit cannot reach. If a reach is noticed, a reduce operation would have to be hard-coded additionally. In the FieldGen.java implementation, the limit can only be 1 or 2 as the implementation code is only able to handle 2. But the FieldGen.java does not really check if 1 or 2 really works for the specific filed generation parameters.

The design impact the performance as well. Because of this limit, the maximum limb size is 28 bits for 2 max adding limit. Otherwise there are integer (long) overflow issues. For example for 256 bits curves, 10 limbs are required for 28-bit limb size; and 9 limbs for 29-bit size. By reducing 1 limbs from 10 to 9, the Signature performance could get improved by 20%.

In the IntegerPolynomial class description, it is said "All IntegerPolynomial implementations allow at most one addition before multiplication. Additions after that will result in an ArithmeticException." It's too strict to follow without exam the code very carefully. Indeed, the implementation does not really follow the spec, and 2 addition may be allowed.

It would be nice if there is no addition limit, and then we are free from these issues. It is doable by reducing the limbs if required for EC limbs operations.

The benchmark for ECDSA Signature is checked in order to see how much the performance could be impacted. Here are the benchmark numbers before the patch applied:

Benchmark        (curveName)  (messageLength)   Mode  Cnt     Score    Error  Units
Signatures.sign    secp256r1               64  thrpt   15  1417.507 ±  5.809  ops/s
Signatures.sign    secp256r1              512  thrpt   15  1412.620 ±  7.040  ops/s
Signatures.sign    secp256r1             2048  thrpt   15  1417.364 ±  6.643  ops/s
Signatures.sign    secp256r1            16384  thrpt   15  1364.720 ± 44.354  ops/s
Signatures.sign    secp384r1               64  thrpt   15   609.924 ±  9.858  ops/s
Signatures.sign    secp384r1              512  thrpt   15   610.873 ±  5.519  ops/s
Signatures.sign    secp384r1             2048  thrpt   15   608.573 ± 13.324  ops/s
Signatures.sign    secp384r1            16384  thrpt   15   597.802 ±  7.074  ops/s
Signatures.sign    secp521r1               64  thrpt   15   301.954 ±  6.449  ops/s
Signatures.sign    secp521r1              512  thrpt   15   300.774 ±  5.849  ops/s
Signatures.sign    secp521r1             2048  thrpt   15   295.831 ±  8.423  ops/s
Signatures.sign    secp521r1            16384  thrpt   15   276.258 ± 43.146  ops/s
Signatures.sign      Ed25519               64  thrpt   15  1171.878 ±  4.187  ops/s
Signatures.sign      Ed25519              512  thrpt   15  1175.175 ±  4.611  ops/s
Signatures.sign      Ed25519             2048  thrpt   15  1168.459 ±  5.750  ops/s
Signatures.sign      Ed25519            16384  thrpt   15  1086.906 ±  6.514  ops/s
Signatures.sign        Ed448               64  thrpt   15   326.475 ±  3.274  ops/s
Signatures.sign        Ed448              512  thrpt   15   328.709 ±  3.867  ops/s
Signatures.sign        Ed448             2048  thrpt   15   315.480 ± 15.377  ops/s
Signatures.sign        Ed448            16384  thrpt   15   312.388 ±  1.067  ops/s

and here are the numbers with this patch applied:

Benchmark        (curveName)  (messageLength)   Mode  Cnt     Score    Error  Units
Signatures.sign    secp256r1               64  thrpt   15  1377.447 ± 38.889  ops/s
Signatures.sign    secp256r1              512  thrpt   15  1403.149 ±  5.151  ops/s
Signatures.sign    secp256r1             2048  thrpt   15  1401.101 ±  6.937  ops/s
Signatures.sign    secp256r1            16384  thrpt   15  1381.855 ± 10.606  ops/s
Signatures.sign    secp384r1               64  thrpt   15   595.979 ±  1.967  ops/s
Signatures.sign    secp384r1              512  thrpt   15   592.419 ±  6.087  ops/s
Signatures.sign    secp384r1             2048  thrpt   15   581.800 ± 18.815  ops/s
Signatures.sign    secp384r1            16384  thrpt   15   583.224 ±  9.856  ops/s
Signatures.sign    secp521r1               64  thrpt   15   264.772 ± 36.883  ops/s
Signatures.sign    secp521r1              512  thrpt   15   302.084 ±  1.074  ops/s
Signatures.sign    secp521r1             2048  thrpt   15   296.619 ±  3.272  ops/s
Signatures.sign    secp521r1            16384  thrpt   15   290.112 ±  5.085  ops/s
Signatures.sign      Ed25519               64  thrpt   15  1163.293 ±  4.266  ops/s
Signatures.sign      Ed25519              512  thrpt   15  1157.093 ±  8.145  ops/s
Signatures.sign      Ed25519             2048  thrpt   15  1140.900 ±  8.660  ops/s
Signatures.sign      Ed25519            16384  thrpt   15  1053.233 ± 40.591  ops/s
Signatures.sign        Ed448               64  thrpt   15   317.509 ±  6.870  ops/s
Signatures.sign        Ed448              512  thrpt   15   320.975 ±  1.534  ops/s
Signatures.sign        Ed448             2048  thrpt   15   322.157 ±  1.914  ops/s
Signatures.sign        Ed448            16384  thrpt   15   303.532 ±  6.419  ops/s

From the numbers, it looks like the performance impact is limited. Based on this update, the performance could get rewarded by using less limbs. For examples if using less limbs for secp256r1/secp384r1/secp521r1, see PR for PR for JDK-8294248, and run the benchmark together with this patch, I could see the performance improvement as follow (note: I did not update to use less limbs for Ed25519/Ed448 yet, the numbers listed below for them are just for comparing with the numbers above):

Benchmark        (curveName)  (messageLength)   Mode  Cnt     Score    Error  Units
Signatures.sign    secp256r1               64  thrpt   15  1632.905 ± 12.235  ops/s
Signatures.sign    secp256r1              512  thrpt   15  1626.057 ±  3.052  ops/s
Signatures.sign    secp256r1             2048  thrpt   15  1625.676 ±  8.435  ops/s
Signatures.sign    secp256r1            16384  thrpt   15  1603.612 ±  4.289  ops/s
Signatures.sign    secp384r1               64  thrpt   15   605.713 ± 14.951  ops/s
Signatures.sign    secp384r1              512  thrpt   15   608.920 ±  7.715  ops/s
Signatures.sign    secp384r1             2048  thrpt   15   601.338 ±  1.638  ops/s
Signatures.sign    secp384r1            16384  thrpt   15   602.315 ±  3.308  ops/s
Signatures.sign    secp521r1               64  thrpt   15   320.946 ±  0.877  ops/s
Signatures.sign    secp521r1              512  thrpt   15   322.642 ±  1.222  ops/s
Signatures.sign    secp521r1             2048  thrpt   15   322.204 ±  1.929  ops/s
Signatures.sign    secp521r1            16384  thrpt   15   316.769 ±  5.398  ops/s
Signatures.sign      Ed25519               64  thrpt   15  1143.084 ± 13.169  ops/s
Signatures.sign      Ed25519              512  thrpt   15  1145.780 ±  7.880  ops/s
Signatures.sign      Ed25519             2048  thrpt   15  1144.818 ±  5.768  ops/s
Signatures.sign      Ed25519            16384  thrpt   15  1081.061 ±  8.283  ops/s
Signatures.sign        Ed448               64  thrpt   15   321.304 ±  0.638  ops/s
Signatures.sign        Ed448              512  thrpt   15   320.501 ±  1.924  ops/s
Signatures.sign        Ed448             2048  thrpt   15   319.234 ±  1.082  ops/s
Signatures.sign        Ed448            16384  thrpt   15   306.937 ±  1.060  ops/s

If considering this PR together with PR for JDK-8294248, performance improvement could be expected for EC crypto primitives.

Thanks, Xuelei


Progress

  • [ ] Change must be properly reviewed (1 review required, with at least 1 Reviewer)
  • [x] Change must not contain extraneous whitespace
  • [x] Commit message must refer to an issue

Issue

  • JDK-8295010: Reduce if required in EC limbs operations

Reviewing

Using git

Checkout this PR locally:
$ git fetch https://git.openjdk.org/jdk pull/10624/head:pull/10624
$ git checkout pull/10624

Update a local copy of the PR:
$ git checkout pull/10624
$ git pull https://git.openjdk.org/jdk pull/10624/head

Using Skara CLI tools

Checkout this PR locally:
$ git pr checkout 10624

View PR using the GUI difftool:
$ git pr show -t 10624

Using diff file

Download this PR as a diff file:
https://git.openjdk.org/jdk/pull/10624.diff

XueleiFan avatar Oct 09 '22 06:10 XueleiFan

:wave: Welcome back xuelei! A progress list of the required criteria for merging this PR into master will be added to the body of your pull request. There are additional pull request commands available for use with this pull request.

bridgekeeper[bot] avatar Oct 09 '22 06:10 bridgekeeper[bot]

@XueleiFan The following labels will be automatically applied to this pull request:

  • build
  • security

When this pull request is ready to be reviewed, an "RFR" email will be sent to the corresponding mailing lists. If you would like to change these labels, use the /label pull request command.

openjdk[bot] avatar Oct 09 '22 06:10 openjdk[bot]

Webrevs

mlbridge[bot] avatar Oct 10 '22 05:10 mlbridge[bot]

That's a pretty significant performance degradation (8%?), and as far as I could tell, it will affect all EC and XEC implementations. On the other hand, #10398 only improves performance of P256. I'm not sure that's a tradeoff we want to make.

IMO working with setReduced is not that bad; we always invoke the same set of operations in the same sequence, so if we forget to reduce a number before multiplying, every EC operation will fail. Could we introduce just enough reduce / setReduced calls to make P256 work reliably with 9 limbs? What impact would it have on P384?

djelinski avatar Oct 10 '22 12:10 djelinski

Thank you for looking into the PR.

On the other hand, #10398 only improves performance of P256. I'm not sure that's a tradeoff we want to make.

If I get it right, all supported curves could be improved by 1 limb in implementation.

Could we introduce just enough reduce / setReduced calls to make P256 work reliably with 9 limbs?

Yes, it is doable as well by limit the maxAdd to 1 in the implementation. This approach needs to adjust the code accordingly, just as what we did to make maxAdd 2 works. The performance impact is not that bad (about 5%). This approach has better performance (about 15%~20%) if reducing the limbs to 9 for P256.

IMO working with setReduced is not that bad;

Hm, I may be focus too much on the error-prone design. Let me check what it could be by limit the maxAdd to 1 always.

XueleiFan avatar Oct 10 '22 14:10 XueleiFan

Close it for now. I will open it again when I have a solution for better performance.

XueleiFan avatar Oct 10 '22 22:10 XueleiFan

@XueleiFan this pull request can not be integrated into master due to one or more merge conflicts. To resolve these merge conflicts and update this pull request you can run the following commands in the local repository for your personal fork:

git checkout JDK-8295010
git fetch https://git.openjdk.org/jdk master
git merge FETCH_HEAD
# resolve conflicts and follow the instructions given by git merge
git commit -m "Merge master"
git push

openjdk[bot] avatar Oct 13 '22 16:10 openjdk[bot]

/label -build

magicus avatar Oct 19 '22 09:10 magicus

@magicus The build label was successfully removed.

openjdk[bot] avatar Oct 19 '22 09:10 openjdk[bot]

Anyone has cycle to have a review?

XueleiFan avatar Nov 07 '22 18:11 XueleiFan

The way I see it is this: as limbs are 64-bit wide, the only place where they can possibly overflow (during the computations they are used for) is the multiplication (including multiply by int and squaring). So I would first try to change the mult() and square() methods only in IntegerPolynomialP256.java (well, in the generator that creates it).

I agreed. The addition and subtraction operations are limited as well. Each limb cannot exceed 32 bits, otherwise the carry/reduce may not work as far as I can see. It would be good the addition and subtraction was placed in IntegerPolynomialP256.java as well. Otherwise, we have to check the the limits in the caller level, which is error-prone.

I will think about how to make this suggestion right. It may be another PR for the restructure.

(It would also be nice to add comments to the various carry/reduce methods that explain what exactly they want to achieve -- although this is definitely something that doesn't have to be in this change set.)

Yes, there are not much comment in the current code and hard to get the ideas. I will see if I can add more in another PR.

I also think (agree with you) that the setReduced() method can be eliminated if you reduce the multiplicands conditionally (if numAdds > maxAdds) before the multiplication/squaring and unconditionally after it (this part is done in the generated functions already). But that assumes you change all subclasses of IntegerPolynomial that way (most conveniently in the setProduct,Square methods).

The plan is to change all curves (curve448 may be an exception, but I still investigate on it. I can see performance improvement for other curves).

XueleiFan avatar Nov 09 '22 16:11 XueleiFan

The way I see it is this: as limbs are 64-bit wide, the only place where they can possibly overflow (during the computations they are used for) is the multiplication (including multiply by int and squaring). So I would first try to change the mult() and square() methods only in IntegerPolynomialP256.java (well, in the generator that creates it).

I agreed. The addition and subtraction operations are limited as well. Each limb cannot exceed 32 bits, otherwise the carry/reduce may not work as far as I can see. It would be good the addition and subtraction was placed in IntegerPolynomialP256.java as well. Otherwise, we have to check the the limits in the caller level, which is error-prone.

I will think about how to make this suggestion right. It may be another PR for the restructure.

I had a further look at the current code structure. The addition counter status is maintained in the caller level (IntegerPolynomial.java). It is possible to re-org the code (the generator), but we may not like the scale of the update. It may not be the purpose of this update. I would like to defer to a separated PR later and keep this PR focus on performance improvement.

XueleiFan avatar Nov 09 '22 19:11 XueleiFan

@djelinski The update has been significantly differently from the first patch. Did you have a cycle to have another look?

XueleiFan avatar Nov 21 '22 07:11 XueleiFan

@ferakocz Did you have further comments or concerns? Please let me know if I'm on the right direction for the performance improvement. Thanks!

XueleiFan avatar Nov 21 '22 07:11 XueleiFan

Can you share the updated benchmarks?

djelinski avatar Nov 21 '22 07:11 djelinski

Can you share the updated benchmarks?

The benchmark number in the PR description is the latest run that I have. I may run it again after the integration of multiplicative inversion and point multiplication improvement.

XueleiFan avatar Nov 21 '22 08:11 XueleiFan

Now that reduce is called as needed, how do we guarantee that ECOperations.multiply will remain constant-time, i.e. call reduce a fixed number of times regardless of the input?

djelinski avatar Nov 22 '22 08:11 djelinski

Now that reduce is called as needed, how do we guarantee that ECOperations.multiply will remain constant-time, i.e. call reduce a fixed number of times regardless of the input?

As the reducing operation is depends on the numbers of adds (not the limbs bits), the operation is depends on the formulas, rather than the sensitive information. For a EC curve, the use of reducing operation is determined, just as what it was used in the ECOperations. For example, if a curve for sum operation needs to call reduce at the 3rd product and 5th addition, the operations will always be called at that steps. If I am right, the reduce operation is still called a fixed number of times.

XueleiFan avatar Nov 22 '22 16:11 XueleiFan

I may run it again after the integration of multiplicative inversion and point multiplication improvement.

After the integration of the improvement above, here is the benchmark numbers with this patch:

Benchmark                    (algorithm)  (messageLength)   Mode  Cnt     Score     Error  Units
Signatures.EdDSA.sign            Ed25519               64  thrpt   15  1084.556 ± 135.637  ops/s
Signatures.EdDSA.sign            Ed25519              512  thrpt   15  1168.663 ±  25.072  ops/s
Signatures.EdDSA.sign            Ed25519             2048  thrpt   15  1186.863 ±  16.224  ops/s
Signatures.EdDSA.sign            Ed25519            16384  thrpt   15  1095.034 ±   6.462  ops/s
Signatures.EdDSA.sign              Ed448               64  thrpt   15   323.771 ±   2.156  ops/s
Signatures.EdDSA.sign              Ed448              512  thrpt   15   326.995 ±   2.101  ops/s
Signatures.EdDSA.sign              Ed448             2048  thrpt   15   320.799 ±   5.452  ops/s
Signatures.EdDSA.sign              Ed448            16384  thrpt   15   317.715 ±   2.554  ops/s
Signatures.sign                secp256r1               64  thrpt   15  4072.636 ±  22.441  ops/s
Signatures.sign                secp256r1              512  thrpt   15  4048.822 ±  40.769  ops/s
Signatures.sign                secp256r1             2048  thrpt   15  4042.884 ±  20.147  ops/s
Signatures.sign                secp256r1            16384  thrpt   15  3911.856 ±  12.039  ops/s
Signatures.sign                secp384r1               64  thrpt   15   634.203 ±   4.532  ops/s
Signatures.sign                secp384r1              512  thrpt   15   637.623 ±   1.592  ops/s
Signatures.sign                secp384r1             2048  thrpt   15   620.283 ±  10.014  ops/s
Signatures.sign                secp384r1            16384  thrpt   15   622.617 ±   5.695  ops/s
Signatures.sign                secp521r1               64  thrpt   15   311.957 ±   5.420  ops/s
Signatures.sign                secp521r1              512  thrpt   15   316.605 ±   2.204  ops/s
Signatures.sign                secp521r1             2048  thrpt   15   316.700 ±   1.654  ops/s
Signatures.sign                secp521r1            16384  thrpt   15   309.599 ±   7.167  ops/s

and the numbers without this patch:

Benchmark                    (algorithm)  (messageLength)   Mode  Cnt     Score     Error  Units
Signatures.EdDSA.sign            Ed25519               64  thrpt   15  1138.578 ±  57.908  ops/s
Signatures.EdDSA.sign            Ed25519              512  thrpt   15  1172.242 ±  17.180  ops/s
Signatures.EdDSA.sign            Ed25519             2048  thrpt   15  1163.793 ±  21.095  ops/s
Signatures.EdDSA.sign            Ed25519            16384  thrpt   15  1093.856 ±   5.964  ops/s
Signatures.EdDSA.sign              Ed448               64  thrpt   15   324.089 ±   2.894  ops/s
Signatures.EdDSA.sign              Ed448              512  thrpt   15   323.580 ±   1.437  ops/s
Signatures.EdDSA.sign              Ed448             2048  thrpt   15   323.680 ±   2.555  ops/s
Signatures.EdDSA.sign              Ed448            16384  thrpt   15   310.641 ±   2.256  ops/s
Signatures.sign                secp256r1               64  thrpt   15  4070.733 ±  27.059  ops/s
Signatures.sign                secp256r1              512  thrpt   15  4061.835 ±  18.776  ops/s
Signatures.sign                secp256r1             2048  thrpt   15  4041.226 ±  19.082  ops/s
Signatures.sign                secp256r1            16384  thrpt   15  3893.323 ±  11.869  ops/s
Signatures.sign                secp384r1               64  thrpt   15   632.924 ±   8.273  ops/s
Signatures.sign                secp384r1              512  thrpt   15   628.807 ±   7.604  ops/s
Signatures.sign                secp384r1             2048  thrpt   15   631.052 ±   1.782  ops/s
Signatures.sign                secp384r1            16384  thrpt   15   530.402 ± 122.967  ops/s
Signatures.sign                secp521r1               64  thrpt   15   316.634 ±   1.724  ops/s
Signatures.sign                secp521r1              512  thrpt   15   315.830 ±   2.333  ops/s
Signatures.sign                secp521r1             2048  thrpt   15   315.855 ±   1.093  ops/s
Signatures.sign                secp521r1            16384  thrpt   15   315.019 ±   1.124  ops/s

XueleiFan avatar Nov 23 '22 08:11 XueleiFan

Well my concern was that when we call setSum(ProjectivePoint, ProjectivePoint), the number of reduce calls depends on the numAdds values on all coordinates of both input parameters. If you take a look at the default multiplier's pointMultiples, you'll notice that numAdds is 0 on all coordinates of the first two pointMultiples and 1 on all coordinates of the remaining pointMultiples.

Fortunately lookup (specifically IntegerPolynomial.conditionalSet) always overwrites numAdds, so the returned point's numAdds are always copied from the last point in the lookup table. As long as the second parameter of setSum always comes from lookup (and in your patch it does), we should be fine.

djelinski avatar Nov 23 '22 11:11 djelinski

@XueleiFan This change now passes all automated pre-integration checks.

ℹ️ This project also has non-automated pre-integration requirements. Please see the file CONTRIBUTING.md for details.

After integration, the commit message for the final commit will be:

8295010: Reduce if required in EC limbs operations

Reviewed-by: djelinski, jjiang

You can use pull request commands such as /summary, /contributor and /issue to adjust it as needed.

At the time when this comment was updated there had been 45 new commits pushed to the master branch:

  • 2f83b5c487f112c175d081ca5882f5032518937a: 8297640: Increase buffer size for buf (insert_features_names) in Abstract_VM_Version::insert_features_names
  • 50f9043c6965360c426b187e47c49c42481a2549: 8297451: ProcessHandleImpl should assert privilege when modifying reaper thread
  • 99d3840d368f1d99af72250678a2cb0c55ee0957: 8297359: RISC-V: improve performance of floating Max Min intrinsics
  • 6c05771b9be3dd5cbcdb40d2e53cc53959926cdd: 8295447: NullPointerException with invalid pattern matching construct in constructor call
  • 76a24c3f90d8e0655bfcaa3dd5c2d1f74515ebc6: 8297145: Add a @sealedGraph tag to ConstantDesc
  • 099b42f360a0e693a63f009e3e044307aab5c689: 8297148: Add a @sealedGraph tag to CallSite
  • 85ddd8f2af51fa5ea7f63027285509afb9a5c439: 8295253: Remove kludge from v1_0/PerfDataBuffer.java
  • 952e10055135613e8ea2b818a4f35842936f5633: 8297431: [JVMCI] HotSpotJVMCIRuntime.encodeThrowable should not throw an exception
  • 08e6a820bcb70e74a0faa28198493292e2993901: 8297590: [TESTBUG] HotSpotResolvedJavaFieldTest does not run
  • 4f65570204e2d38415e7761bd81660b081eae882: 8294583: JShell: NPE in switch with non existing record pattern
  • ... and 35 more: https://git.openjdk.org/jdk/compare/b4bd287f736b6b5dcfe1b183cae9b11eb6f22686...master

As there are no conflicts, your changes will automatically be rebased on top of these commits when integrating. If you prefer to avoid this automatic rebasing, please check the documentation for the /integrate command for further details.

➡️ To integrate this PR with the above commit message to the master branch, type /integrate in a new comment.

openjdk[bot] avatar Nov 28 '22 01:11 openjdk[bot]

/integrate

XueleiFan avatar Nov 29 '22 17:11 XueleiFan

Going to push as commit b778cd52b3fae013ecb21d90bcf940a4d947bd68. Since your change was applied there have been 78 commits pushed to the master branch:

  • 54e6d6aaeb5dec2dc1b9fb3ac9b34c8621df506d: 8293488: Add EOR3 backend rule for aarch64 SHA3 extension
  • 69ede5baeda6645aa3e961a02cbd40db965fc6a1: 8293177: Verify version numbers in legal files
  • cd6bebbf34215723fad1d6bfe070a409351920c1: 8247645: ChaCha20 intrinsics
  • 33587ffd35c568c1ef034f064e6f3f06fe9943c3: 8292625: jshell crash on "var a = a"
  • 2deb318c9f047ec5a4b160d66a4b52f93688ec42: 8297065: DerOutputStream operations should not throw IOExceptions
  • d83a07b72cfd4dc42c5d4815262fcba05c653bd5: 8297200: java/net/httpclient/SpecialHeadersTest.java failed once in AssertionError due to selector thread remaining alive
  • 5d2772a43ef6409bf556cefb4eb4242594451674: 8297424: java/net/httpclient/AsyncExecutorShutdown.java fails in AssertionError due to misplaced assert
  • 361b50e724f8c1177f89eaa93e38b69d244dadee: 8292594: Use CSS custom properties for all fonts and colors
  • 42b60ed22c02663eb1377d1ce78a559fdbb4348d: 8297030: Reduce Default Keep-Alive Timeout Value for httpclient
  • 1301fb0b5f998c9cf8bcd8a53e6a90d6ab5a7da9: 8296470: Refactor VMError::report STEP macro to improve readability
  • ... and 68 more: https://git.openjdk.org/jdk/compare/b4bd287f736b6b5dcfe1b183cae9b11eb6f22686...master

Your commit was automatically rebased without conflicts.

openjdk[bot] avatar Nov 29 '22 17:11 openjdk[bot]

@XueleiFan Pushed as commit b778cd52b3fae013ecb21d90bcf940a4d947bd68.

:bulb: You may see a message that your pull request was closed with unmerged commits. This can be safely ignored.

openjdk[bot] avatar Nov 29 '22 17:11 openjdk[bot]

@XueleiFan, I have some questions about this integration. The performance numbers you last posted showed some very small improvements but also some regressions in throughput numbers, so its not clear to me if this is worth integrating yet. Earlier, you said that the performance benefits of this fix would not be realized until the changes for https://github.com/openjdk/jdk/pull/10398 were to be integrated, but that change is still in draft and has comments that have not been resolved. So, is it possible this was integrated too early?

seanjmullan avatar Nov 29 '22 18:11 seanjmullan

@seanjmullan if you check the benchmark numbers, the throughput impact is limited. For some curves, it is improved; and some others is impacted. In theory of this update, it is not expected to have performance impact if no improvement. The throughput impact in the benchmark is more from the noice in the benchmark platform, I think.

The fix for 10398 depends on this update. Only after integrating this one, I can open 10398 for review. The code is ready for 10398 , I'm running the benchmarking. Once the benchmark numbers are ready, I will open it for review. It should be ready to review today.

XueleiFan avatar Nov 29 '22 18:11 XueleiFan

@seanjmullan if you check the benchmark numbers, the throughput impact is limited. For some curves, it is improved; and some others is impacted. In theory of this update, it is not expected to have performance impact if no improvement. The throughput impact in the benchmark is more from the noice in the benchmark platform, I think.

The fix for 10398 depends on this update. Only after integrating this one, I can open 10398 for review. The code is ready for 10398 , I'm running the benchmarking. Once the benchmark numbers are ready, I will open it for review. It should be ready to review today.

Ok, thanks for the update. Will keep an eye out for the updated benchmark numbers.

seanjmullan avatar Nov 29 '22 19:11 seanjmullan