jdk
jdk copied to clipboard
8292289: [vectorapi] Improve the implementation of VectorTestNode
This patch modifies the node generation of VectorSupport::test
to emit a CMoveINode
, which is picked up by BoolNode::Ideal(PhaseGVN*, bool)
to connect the VectorTestNode
directly to the BoolNode
, removing the redundant operations of materialising the test result in a GP register and do a CmpI
to get back the flags. As a result, VectorMask<T>::alltrue
is compiled into machine codes:
vptest xmm0, xmm1
jb if_true
if_false:
instead of:
vptest xmm0, xmm1
setb r10
movzbl r10
testl r10
jne if_true
if_false:
The results of jdk.incubator.vector.ArrayMismatchBenchmark
shows noticeable improvements:
Before After
Benchmark Prefix Size Mode Cnt Score Error Score Error Units Change
ArrayMismatchBenchmark.mismatchVectorByte 0.5 9 thrpt 10 217345.383 ± 8316.444 222279.381 ± 2660.983 ops/ms +2.3%
ArrayMismatchBenchmark.mismatchVectorByte 0.5 257 thrpt 10 113918.406 ± 1618.836 116268.691 ± 1291.899 ops/ms +2.1%
ArrayMismatchBenchmark.mismatchVectorByte 0.5 100000 thrpt 10 702.066 ± 72.862 797.806 ± 16.429 ops/ms +13.6%
ArrayMismatchBenchmark.mismatchVectorByte 1.0 9 thrpt 10 146096.564 ± 2401.258 145338.910 ± 687.453 ops/ms -0.5%
ArrayMismatchBenchmark.mismatchVectorByte 1.0 257 thrpt 10 60598.181 ± 1259.397 69041.519 ± 1073.156 ops/ms +13.9%
ArrayMismatchBenchmark.mismatchVectorByte 1.0 100000 thrpt 10 316.814 ± 10.975 408.770 ± 5.281 ops/ms +29.0%
ArrayMismatchBenchmark.mismatchVectorDouble 0.5 9 thrpt 10 195674.549 ± 1200.166 188482.433 ± 1872.076 ops/ms -3.7%
ArrayMismatchBenchmark.mismatchVectorDouble 0.5 257 thrpt 10 44357.169 ± 473.013 42293.411 ± 2838.255 ops/ms -4.7%
ArrayMismatchBenchmark.mismatchVectorDouble 0.5 100000 thrpt 10 68.199 ± 5.410 67.628 ± 3.241 ops/ms -0.8%
ArrayMismatchBenchmark.mismatchVectorDouble 1.0 9 thrpt 10 107722.450 ± 1677.607 111060.400 ± 982.230 ops/ms +3.1%
ArrayMismatchBenchmark.mismatchVectorDouble 1.0 257 thrpt 10 16692.645 ± 1002.599 21440.506 ± 1618.266 ops/ms +28.4%
ArrayMismatchBenchmark.mismatchVectorDouble 1.0 100000 thrpt 10 32.984 ± 0.548 33.202 ± 2.365 ops/ms +0.7%
ArrayMismatchBenchmark.mismatchVectorInt 0.5 9 thrpt 10 335458.217 ± 3154.842 379944.254 ± 5703.134 ops/ms +13.3%
ArrayMismatchBenchmark.mismatchVectorInt 0.5 257 thrpt 10 58505.302 ± 786.312 56721.368 ± 2497.052 ops/ms -3.0%
ArrayMismatchBenchmark.mismatchVectorInt 0.5 100000 thrpt 10 133.037 ± 11.415 139.537 ± 4.667 ops/ms +4.9%
ArrayMismatchBenchmark.mismatchVectorInt 1.0 9 thrpt 10 117943.802 ± 2281.349 112409.365 ± 2110.055 ops/ms -4.7%
ArrayMismatchBenchmark.mismatchVectorInt 1.0 257 thrpt 10 27060.015 ± 795.619 33756.613 ± 826.533 ops/ms +24.7%
ArrayMismatchBenchmark.mismatchVectorInt 1.0 100000 thrpt 10 57.558 ± 8.927 66.951 ± 4.381 ops/ms +16.3%
ArrayMismatchBenchmark.mismatchVectorLong 0.5 9 thrpt 10 182963.715 ± 1042.497 182438.405 ± 2120.832 ops/ms -0.3%
ArrayMismatchBenchmark.mismatchVectorLong 0.5 257 thrpt 10 36672.215 ± 614.821 35397.398 ± 1609.235 ops/ms -3.5%
ArrayMismatchBenchmark.mismatchVectorLong 0.5 100000 thrpt 10 66.438 ± 2.142 65.427 ± 2.270 ops/ms -1.5%
ArrayMismatchBenchmark.mismatchVectorLong 1.0 9 thrpt 10 110393.047 ± 497.853 115165.845 ± 5381.674 ops/ms +4.3%
ArrayMismatchBenchmark.mismatchVectorLong 1.0 257 thrpt 10 14720.765 ± 661.350 19871.096 ± 201.464 ops/ms +35.0%
ArrayMismatchBenchmark.mismatchVectorLong 1.0 100000 thrpt 10 30.760 ± 0.821 31.933 ± 1.352 ops/ms +3.8%
I have not been able to conduct throughout testing on AVX512 and Aarch64 so any help would be invaluable. Thank you very much.
Progress
- [x] Change must not contain extraneous whitespace
- [x] Commit message must refer to an issue
- [x] Change must be properly reviewed (2 reviews required, with at least 1 Reviewer, 1 Author)
Issue
- JDK-8292289: [vectorapi] Improve the implementation of VectorTestNode
Reviewers
- Xiaohong Gong (@XiaohongGong - Committer) ⚠️ Review applies to 14b71c00
- Vladimir Kozlov (@vnkozlov - Reviewer) ⚠️ Review applies to 4db8b6e5
Reviewing
Using git
Checkout this PR locally:
$ git fetch https://git.openjdk.org/jdk pull/9855/head:pull/9855
$ git checkout pull/9855
Update a local copy of the PR:
$ git checkout pull/9855
$ git pull https://git.openjdk.org/jdk pull/9855/head
Using Skara CLI tools
Checkout this PR locally:
$ git pr checkout 9855
View PR using the GUI difftool:
$ git pr show -t 9855
Using diff file
Download this PR as a diff file:
https://git.openjdk.org/jdk/pull/9855.diff
:wave: Welcome back merykitty! A progress list of the required criteria for merging this PR into master
will be added to the body of your pull request. There are additional pull request commands available for use with this pull request.
@merykitty The following label will be automatically applied to this pull request:
-
hotspot
When this pull request is ready to be reviewed, an "RFR" email will be sent to the corresponding mailing list. If you would like to change these labels, use the /label pull request command.
/label hotspot-compiler
@merykitty
The hotspot-compiler
label was successfully added.
Webrevs
- 13: Full - Incremental (8d9ebed9)
- 12: Full (1fec3d30)
- 11: Full (05c1b9f5)
- 10: Full - Incremental (3480504c)
- 09: Full (de28e400)
- 08: Full (4db8b6e5)
- 07: Full (0894474a)
- 06: Full - Incremental (e5a81c41)
- 05: Full (c188a518)
- 04: Full - Incremental (5444c51d)
- 03: Full - Incremental (14b71c00)
- 02: Full - Incremental (c3d18cc9)
- 01: Full (cac9948e)
- 00: Full (9667aa00)
Please rebase this PR and resolve the conflicts. Then I can have an internal tests for the aarch64 change parts. Thanks a lot!
Thanks for your comments, I have redone the patch on top of the new merged aarch file.
Tests pass on aarch64 NEON and SVE. So aarch64 part looks good to me. Thanks!
I have resolved your comments, thanks.
@vnkozlov I have merged the patch with latest JDK sources.
@vnkozlov I have merged the patch with latest JDK sources.
@merykitty I submitted testing.
@merykitty This change now passes all automated pre-integration checks.
ℹ️ This project also has non-automated pre-integration requirements. Please see the file CONTRIBUTING.md for details.
After integration, the commit message for the final commit will be:
8292289: [vectorapi] Improve the implementation of VectorTestNode
Reviewed-by: xgong, kvn
You can use pull request commands such as /summary, /contributor and /issue to adjust it as needed.
At the time when this comment was updated there had been 220 new commits pushed to the master
branch:
- 1166c8e2c0047869cd50b7ddc5355290ac2a695a: 8296896: Change virtual Thread.yield to use external submit
- 51759650e5593f48ce616a1a8abf51f5f8fd5302: 8298323: trivial typo in JOptionPane.OK_OPTION
- d5cf18e7fb591185eecb042bfa015609ea7d15e0: 8296198: JFileChooser throws InternalError java.lang.InternalError with Windows shortcuts
- 74f346b33f7fa053ad5c99ef85baa32b7fb12fa6: 8298075: RISC-V: Implement post-call NOPs
- 3aa4070d4ca21b9e90388995efbcde318892e25f: 8294047: HttpResponseInputStream swallows interrupts
- af8fb7eef7188ef762399cfb3faf5c8afd49efa7: 8282578: AIOOBE in javax.sound.sampled.Clip
- 8b69a2e434ad2fa3369079622b57afb973d5bd9a: 8298099: [JVMCI] decouple libgraal from JVMCI module at runtime
- 8a9911ef1762ae837e427ec9d91b1399ba33b6e4: 8295803: Console should be usable in jshell and other environments
- 5d4c71c8bd361af78c90777f17b79e95d8eb5afe: 8281236: (D)TLS key exchange named groups
- 10356e767a44632c5de142d4666bd85d4618bf71: 8298303: (fs) temporarily remove Path.getExtension
- ... and 210 more: https://git.openjdk.org/jdk/compare/2f83b5c487f112c175d081ca5882f5032518937a...master
As there are no conflicts, your changes will automatically be rebased on top of these commits when integrating. If you prefer to avoid this automatic rebasing, please check the documentation for the /integrate command for further details.
➡️ To integrate this PR with the above commit message to the master
branch, type /integrate in a new comment.
/reviewers 2
@vnkozlov The total number of required reviews for this PR (including the jcheck configuration and the last /reviewers command) is now set to 2 (with at least 1 Reviewer, 1 Author).
Similar to what has been discussed in #10192 , a mask of 512 bits is not matched into an XMMRegister, I have removed the untaken code paths.
@merykitty This pull request has been inactive for more than 4 weeks and will be automatically closed if another 4 weeks passes without any activity. To avoid this, simply add a new comment to the pull request. Feel free to ask for assistance if you need help with progressing this pull request towards integration!
May I have another review for this PR, please? Thank you very much.
@shqking Thanks a lot for your reviews, I have addressed those in the commits
Rerunning DaCapo and Renaissance which shows some variations.
@vnkozlov I believe VectorTestNode
is only used in Vector API, which should not affect those benchmarks?
@vnkozlov I believe
VectorTestNode
is only used in Vector API, which should not affect those benchmarks?
Yes, but you have changes in general code related to Cmp
node.
I see that DaCapo don't show regression after rerun. Renaissance is still running and numbers are "all over places" - it is not stable.
Anyway, I am approving this changes based on data I got.
@vnkozlov Ah I get it, thanks a lot for your reviews, can I merge the patch now?
Yes, you can integrate.
Thank everyone for your kind reviews and suggestions. /integrate
Going to push as commit 3dfadeebd023efb03a400f2b2656567a4154421a.
Since your change was applied there have been 238 commits pushed to the master
branch:
- d35e840024b80f9f686fb5522dc03b2c9233a6d3: 8297295: Remove ThreadGroup.allowThreadSuspension
- 175e3d3ff332be25cca9822c58c46f1e012953c2: 8296149: Start of release updates for JDK 21
- d562d3fcbe22a0443037c5b447e1a41401275814: 8297642: PhaseIdealLoop::only_has_infinite_loops must detect all loops that never lead to termination
- fc52f21f9a30c5c34caa06f8524c8d5bd74f16f7: 8298255: JFR provide information about dynamization of number of compiler threads
- e555d5470536b8379179879ec7343e004be95e36: 8298383: JFR: GenerateJfrFiles.java lacks copyright header
- c084431fae8c9f9b5a157cdaca484f63cbd6691a: 8298379: JFR: Some UNTIMED events only sets endTime
- ea108f504ccb63fc9651e804e3bbba1c108dcead: 8298129: Let checkpoint event sizes grow beyond u4 limit
- 165dcdd27de16824478ac9ebdfbd7b00fffe51e6: 8297718: Make NMT free:ing protocol more granular
- fbe7b007383b034589e93d398706bebeb24461ee: 8298173: GarbageCollectionNotificationContentTest test failed: no decrease in Eden usage
- d8ef60b406a9e8fe6cc6b7be0b74e45de38604c5: 8298272: Clean up ProblemList
- ... and 228 more: https://git.openjdk.org/jdk/compare/2f83b5c487f112c175d081ca5882f5032518937a...master
Your commit was automatically rebased without conflicts.
@merykitty Pushed as commit 3dfadeebd023efb03a400f2b2656567a4154421a.
:bulb: You may see a message that your pull request was closed with unmerged commits. This can be safely ignored.