8284493: Fix rounding error in computeNextExponential
This PR improves both the performance of `nextExponential` and `nextGaussian` and the distribution of their output at the tails. It fixes the following imperfections:
- Repeatedly adding `DoubleZigguratTables.exponentialX0` to `extra` causes a rounding error to accumulate at the tail of the distribution (probably starting around `2*exponentialX0 == 0x1.e46eff20739afp3` ~ 15.1); this PR fixes that by tracking the multiple of `exponentialX0` as a `long`. (This distortion is worst when `x > 0x1.0p56`, since in that case a rounding error means `extra + x == extra`.) A minimal sketch of this pattern follows the list.
- Rewrites several expressions using `Math.fma`. (This will almost certainly improve performance, and may or may not improve the output distribution.)
- Uses the newly extracted `computeWinsorizedNextExponential` function to greatly reduce the probability that `nextGaussian` suffers from two tail cases of `nextExponential`.
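To make the first point concrete, here is a minimal sketch of the pattern (assumed constant and a simplified sampler, not the actual `RandomSupport` ziggurat code): the number of tail re-entries is counted in a `long` and folded in once with `Math.fma`, so no rounding error accumulates per iteration.

```java
import java.util.SplittableRandom;

// Simplified stand-in for the ziggurat tail loop, NOT the actual RandomSupport code.
// By the memoryless property, an exponential sample that exceeds x0 can be redrawn
// with the offset x0 remembered; counting those offsets in a long and applying them
// with one fused multiply-add avoids the per-iteration rounding error described above.
public class ExponentialTailSketch {
    // Assumed here to be half of the 0x1.e46eff20739afp3 value quoted above (~7.57).
    static final double EXPONENTIAL_X0 = 0x1.e46eff20739afp2;

    static double nextExponential(SplittableRandom rng) {
        long x0Count = 0;
        double x;
        while ((x = -Math.log1p(-rng.nextDouble())) > EXPONENTIAL_X0) {
            x0Count++; // old approach: extra += EXPONENTIAL_X0, which accumulates rounding error
        }
        // New approach: a single Math.fma rounds only once, however often the tail was re-entered.
        return Math.fma((double) x0Count, EXPONENTIAL_X0, x);
    }

    public static void main(String[] args) {
        SplittableRandom rng = new SplittableRandom(8284493L);
        System.out.println(nextExponential(rng));
    }
}
```

Because `Math.fma` rounds only once, the combined `x0Count * EXPONENTIAL_X0 + x` is correctly rounded, whereas repeated additions can lose low-order bits on every trip around the loop.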
Progress
- [ ] Change must be properly reviewed (1 review required, with at least 1 Reviewer)
- [x] Change must not contain extraneous whitespace
- [x] Commit message must refer to an issue
Issue
- JDK-8284493: Fix rounding error in computeNextExponential
Reviewing
Using git
Checkout this PR locally:
$ git fetch https://git.openjdk.org/jdk pull/8131/head:pull/8131
$ git checkout pull/8131
Update a local copy of the PR:
$ git checkout pull/8131
$ git pull https://git.openjdk.org/jdk pull/8131/head
Using Skara CLI tools
Checkout this PR locally:
$ git pr checkout 8131
View PR using the GUI difftool:
$ git pr show -t 8131
Using diff file
Download this PR as a diff file:
https://git.openjdk.org/jdk/pull/8131.diff
Hi @Pr0methean, welcome to this OpenJDK project and thanks for contributing!
We do not recognize you as a Contributor and need to ensure you have signed the Oracle Contributor Agreement (OCA). If you have not signed the OCA, please follow the instructions. Please fill in your GitHub username in the "Username" field of the application. Once you have signed the OCA, please let us know by writing /signed in a comment in this pull request.
If you already are an OpenJDK Author, Committer or Reviewer, please click here to open a new issue so that we can record that fact. Please use "Add GitHub user Pr0methean" as summary for the issue.
If you are contributing this work on behalf of your employer and your employer has signed the OCA, please let us know by writing /covered in a comment in this pull request.
@Pr0methean The following label will be automatically applied to this pull request:
core-libs
When this pull request is ready to be reviewed, an "RFR" email will be sent to the corresponding mailing list. If you would like to change these labels, use the /label pull request command.
/covered
@Pr0methean is an employee of Amazon.
/covered
Thank you! Please allow for a few business days to verify that your employer has signed the OCA. Also, please note that pull requests that are pending an OCA check will not usually be evaluated, so your patience is appreciated!
Hi @JesperIRL, could you help us with OCA coverage?
Should we extract a computeWinsorizedNextExponential that can compute a smaller value for line 1216 to compare the output to, so that line 1218 can realistically be covered in a unit test? Such a method might even be worth exposing in the RandomGenerator interface, in case an approximately exponential distribution is ever needed in a hard real-time system.
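For concreteness, one hypothetical shape for such a helper is sketched below. The method name follows the proposal in this comment, but the body is only an illustrative clamp over a plain exponential sampler, not the ziggurat-based code in RandomSupport; the `maxValue` parameter is likewise an assumption.

```java
import java.util.random.RandomGenerator;

// Hypothetical illustration only: "winsorizing" clamps extreme tail values to a
// fixed maximum instead of exploring the tail further, which bounds both the
// returned value and the work done per call.
public class WinsorizedExponentialSketch {
    static double computeWinsorizedNextExponential(RandomGenerator rng, double maxValue) {
        double x = -Math.log1p(-rng.nextDouble()); // stand-in for the real ziggurat sampler
        return Math.min(x, maxValue);              // clamp rather than keep sampling the tail
    }

    public static void main(String[] args) {
        RandomGenerator rng = RandomGenerator.of("L64X128MixRandom");
        System.out.println(computeWinsorizedNextExponential(rng, 20.0));
    }
}
```

Bounding both the result and the per-call work is what would make an approximation like this attractive for a hard real-time system, at the cost of a (tunable) distortion above `maxValue`.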
/covered
@robilad, can you please verify Chris Hennick's (@Pr0methean) OCA status? He's an Amazon employee and covered by Amazon's OCA.
Webrevs
- 21: Full (d5ade1c7)
- 20: Full (1b5656d7)
- 19: Full - Incremental (e4714ec9)
- 18: Full (2470c00a)
- 17: Full (63706c8b)
- 16: Full - Incremental (cc7c425a)
- 15: Full - Incremental (b75139c9)
- 14: Full - Incremental (c94a09e5)
- 13: Full - Incremental (d1c9bafd)
- 12: Full - Incremental (e3f55542)
- 11: Full - Incremental (5919faeb)
- 10: Full - Incremental (cab6b5bb)
- 09: Full - Incremental (db9ff01d)
- 08: Full - Incremental (f74b6872)
- 07: Full - Incremental (d903efe5)
- 06: Full - Incremental (f84e0e03)
- 05: Full - Incremental (13c6d22c)
- 04: Full - Incremental (2c873856)
- 03: Full - Incremental (1daaadac)
- 02: Full - Incremental (c5a28f98)
- 01: Full - Incremental (fa340fb4)
- 00: Full (1fd959cc)
In addition to the changes discussed so far, I've also changed line 1382 to eliminate unneeded tail exploration; this should make nextGaussian faster at high percentiles (probably measurable at the 99%ile, and definitely measurable at the 99.99%ile).
@Pr0methean Please do not rebase or force-push to an active PR as it invalidates existing review comments. All changes will be squashed into a single commit automatically when integrating. See OpenJDK Developers’ Guide for more information.
@Pr0methean This pull request has been inactive for more than 4 weeks and will be automatically closed if another 4 weeks passes without any activity. To avoid this, simply add a new comment to the pull request. Feel free to ask for assistance if you need help with progressing this pull request towards integration!
Keep open; I'll submit some benchmarks when I have a chance.
@Pr0methean This pull request has been inactive for more than 4 weeks and will be automatically closed if another 4 weeks passes without any activity. To avoid this, simply add a new comment to the pull request. Feel free to ask for assistance if you need help with progressing this pull request towards integration!
Keep open
@Pr0methean This pull request has been inactive for more than 4 weeks and will be automatically closed if another 4 weeks passes without any activity. To avoid this, simply add a new comment to the pull request. Feel free to ask for assistance if you need help with progressing this pull request towards integration!
Keep open
Update: just waiting for approval from work to obtain c6a.metal, c6i.metal and c6g.metal instances for long enough to run the benchmarks before and after.
@Pr0methean Please do not rebase or force-push to an active PR as it invalidates existing review comments. All changes will be squashed into a single commit automatically when integrating. See OpenJDK Developers’ Guide for more information.
@Pr0methean This pull request has been inactive for more than 4 weeks and will be automatically closed if another 4 weeks passes without any activity. To avoid this, simply add a new comment to the pull request. Feel free to ask for assistance if you need help with progressing this pull request towards integration!
Keep open
@Pr0methean This pull request has been inactive for more than 4 weeks and will be automatically closed if another 4 weeks passes without any activity. To avoid this, simply add a new comment to the pull request. Feel free to ask for assistance if you need help with progressing this pull request towards integration!
Keep open. Testing is in progress right now.
JMH benchmark results (`make test TEST="micro:java.util.random"`) with fixedSeed = true on EC2 are below. Average time improves only for nextExponential() and only on non-Intel CPUs, but p100 improves more consistently: 7 of the 12 tests improved by at least 1000 ns, and only 2 deteriorated by that amount.
c6i.metal
| Benchmark | Metric (ns/op) | Before | After |
| --- | --- | --- | --- |
| L64X128MixRandom.nextExponential() | average | 26.984 ± 0.015 | 34.914 ± 0.021 |
| L64X128MixRandom.nextExponential() | p100 | 8224 | 5624 |
| L64X1024MixRandom.nextExponential() | average | 30.026 ± 0.020 | 36.730 ± 0.024 |
| L64X1024MixRandom.nextExponential() | p100 | 8736 | 7464 |
| L64X128MixRandom.nextGaussian() | average | 29.995 ± 0.017 | 30.016 ± 0.019 |
| L64X128MixRandom.nextGaussian() | p100 | 5392 | 3024 |
| L64X1024MixRandom.nextGaussian() | average | 31.394 ± 0.020 | 31.660 ± 0.021 |
| L64X1024MixRandom.nextGaussian() | p100 | 5984 | 6072 |
c7g.16xlarge
| Benchmark | Metric (ns/op) | Before | After |
| --- | --- | --- | --- |
| L64X128MixRandom.nextExponential() | average | 41.006 ± 0.056 | 40.288 ± 0.055 |
| L64X128MixRandom.nextExponential() | p100 | 6384 | 6416 |
| L64X1024MixRandom.nextExponential() | average | 45.218 ± 0.071 | 40.549 ± 0.062 |
| L64X1024MixRandom.nextExponential() | p100 | 560 | 3760 |
| L64X128MixRandom.nextGaussian() | average | 39.643 ± 0.052 | 40.473 ± 0.070 |
| L64X128MixRandom.nextGaussian() | p100 | 7728 | 1264 |
| L64X1024MixRandom.nextGaussian() | average | 40.393 ± 0.061 | 40.274 ± 0.061 |
| L64X1024MixRandom.nextGaussian() | p100 | 9184 | 1280 |
m6a.metal
| Benchmark | Metric (ns/op) | Before | After |
| --- | --- | --- | --- |
| L64X128MixRandom.nextExponential() | average | 31.241 ± 0.039 | 29.004 ± 0.018 |
| L64X128MixRandom.nextExponential() | p100 | 4224 | 7224 |
| L64X1024MixRandom.nextExponential() | average | 31.903 ± 0.017 | 31.146 ± 0.021 |
| L64X1024MixRandom.nextExponential() | p100 | 9744 | 8688 |
| L64X128MixRandom.nextGaussian() | average | 29.164 ± 0.017 | 29.073 ± 0.021 |
| L64X128MixRandom.nextGaussian() | p100 | 5920 | 1504 |
| L64X1024MixRandom.nextGaussian() | average | 29.503 ± 0.022 | 29.639 ± 0.018 |
| L64X1024MixRandom.nextGaussian() | p100 | 4256 | 4288 |
@simonis @navyxliu @turbanoff @rgiulietti This is ready for review with the benchmark results.
hi, @Pr0methean
I notice that there is a big regression in L64X1024MixRandom.nextExponential p100 on c7g.16xlarge
L64X1024MixRandom.nextExponential() p100 ns/op: 560 before, 3760 after
but the average ns/op improves. Can you explain the discrepancy?
btw: I updated the JBS issue to ensure the subject is aligned with your PR.
Thanks, --lx
It's probably just a statistical fluke involving a very large return value, but to know for sure I'd need a way to gather statistics about the return values and how they correlate with running-time outliers.
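One throwaway way to gather that evidence (a hedged sketch, not part of this PR or the JDK micro suite; the algorithm name and iteration count are arbitrary) is to time each call and remember the value produced by the slowest one:

```java
import java.util.random.RandomGenerator;

// Ad-hoc probe: if the single slowest nextExponential() call also produced an
// unusually large value, the p100 outlier is plausibly a rare deep-tail sample
// rather than JIT or GC noise.
public class ExponentialOutlierProbe {
    public static void main(String[] args) {
        RandomGenerator rng = RandomGenerator.of("L64X1024MixRandom");
        long worstNanos = 0;
        double worstValue = Double.NaN;
        for (int i = 0; i < 10_000_000; i++) {
            long start = System.nanoTime();
            double value = rng.nextExponential();
            long elapsed = System.nanoTime() - start;
            if (elapsed > worstNanos) {   // keep only the slowest call and its output
                worstNanos = elapsed;
                worstValue = value;
            }
        }
        System.out.printf("slowest call: %d ns, value %.6f%n", worstNanos, worstValue);
    }
}
```

Per-call `System.nanoTime()` measurements are noisy, so this only hints at whether running-time outliers coincide with deep-tail return values; it is not a substitute for the JMH numbers above.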