jdk icon indicating copy to clipboard operation
jdk copied to clipboard

8292289: [vectorapi] Improve the implementation of VectorTestNode

Open merykitty opened this issue 2 years ago • 15 comments

This patch modifies the node generation of VectorSupport::test to emit a CMoveINode, which is picked up by BoolNode::Ideal(PhaseGVN*, bool) to connect the VectorTestNode directly to the BoolNode, removing the redundant operations of materialising the test result in a GP register and do a CmpI to get back the flags. As a result, VectorMask<T>::alltrue is compiled into machine codes:

    vptest xmm0, xmm1
    jb if_true
if_false:

instead of:

    vptest xmm0, xmm1
    setb r10
    movzbl r10
    testl r10
    jne if_true
if_false:

The results of jdk.incubator.vector.ArrayMismatchBenchmark shows noticeable improvements:

                                                                                   Before                  After
Benchmark                                      Prefix    Size   Mode  Cnt       Score      Error       Score     Error    Units  Change
ArrayMismatchBenchmark.mismatchVectorByte         0.5       9  thrpt   10  217345.383 ± 8316.444  222279.381 ± 2660.983  ops/ms   +2.3%
ArrayMismatchBenchmark.mismatchVectorByte         0.5     257  thrpt   10  113918.406 ± 1618.836  116268.691 ± 1291.899  ops/ms   +2.1%
ArrayMismatchBenchmark.mismatchVectorByte         0.5  100000  thrpt   10     702.066 ±   72.862     797.806 ±   16.429  ops/ms  +13.6%
ArrayMismatchBenchmark.mismatchVectorByte         1.0       9  thrpt   10  146096.564 ± 2401.258  145338.910 ±  687.453  ops/ms   -0.5%
ArrayMismatchBenchmark.mismatchVectorByte         1.0     257  thrpt   10   60598.181 ± 1259.397   69041.519 ± 1073.156  ops/ms  +13.9%
ArrayMismatchBenchmark.mismatchVectorByte         1.0  100000  thrpt   10     316.814 ±   10.975     408.770 ±    5.281  ops/ms  +29.0%
ArrayMismatchBenchmark.mismatchVectorDouble       0.5       9  thrpt   10  195674.549 ± 1200.166  188482.433 ± 1872.076  ops/ms   -3.7%
ArrayMismatchBenchmark.mismatchVectorDouble       0.5     257  thrpt   10   44357.169 ±  473.013   42293.411 ± 2838.255  ops/ms   -4.7%
ArrayMismatchBenchmark.mismatchVectorDouble       0.5  100000  thrpt   10      68.199 ±    5.410      67.628 ±    3.241  ops/ms   -0.8%
ArrayMismatchBenchmark.mismatchVectorDouble       1.0       9  thrpt   10  107722.450 ± 1677.607  111060.400 ±  982.230  ops/ms   +3.1%
ArrayMismatchBenchmark.mismatchVectorDouble       1.0     257  thrpt   10   16692.645 ± 1002.599   21440.506 ± 1618.266  ops/ms  +28.4%
ArrayMismatchBenchmark.mismatchVectorDouble       1.0  100000  thrpt   10      32.984 ±    0.548      33.202 ±    2.365  ops/ms   +0.7%
ArrayMismatchBenchmark.mismatchVectorInt          0.5       9  thrpt   10  335458.217 ± 3154.842  379944.254 ± 5703.134  ops/ms  +13.3%
ArrayMismatchBenchmark.mismatchVectorInt          0.5     257  thrpt   10   58505.302 ±  786.312   56721.368 ± 2497.052  ops/ms   -3.0%
ArrayMismatchBenchmark.mismatchVectorInt          0.5  100000  thrpt   10     133.037 ±   11.415     139.537 ±    4.667  ops/ms   +4.9%
ArrayMismatchBenchmark.mismatchVectorInt          1.0       9  thrpt   10  117943.802 ± 2281.349  112409.365 ± 2110.055  ops/ms   -4.7%
ArrayMismatchBenchmark.mismatchVectorInt          1.0     257  thrpt   10   27060.015 ±  795.619   33756.613 ±  826.533  ops/ms  +24.7%
ArrayMismatchBenchmark.mismatchVectorInt          1.0  100000  thrpt   10      57.558 ±    8.927      66.951 ±    4.381  ops/ms  +16.3%
ArrayMismatchBenchmark.mismatchVectorLong         0.5       9  thrpt   10  182963.715 ± 1042.497  182438.405 ± 2120.832  ops/ms   -0.3%
ArrayMismatchBenchmark.mismatchVectorLong         0.5     257  thrpt   10   36672.215 ±  614.821   35397.398 ± 1609.235  ops/ms   -3.5%
ArrayMismatchBenchmark.mismatchVectorLong         0.5  100000  thrpt   10      66.438 ±    2.142      65.427 ±    2.270  ops/ms   -1.5%
ArrayMismatchBenchmark.mismatchVectorLong         1.0       9  thrpt   10  110393.047 ±  497.853  115165.845 ± 5381.674  ops/ms   +4.3%
ArrayMismatchBenchmark.mismatchVectorLong         1.0     257  thrpt   10   14720.765 ±  661.350   19871.096 ±  201.464  ops/ms  +35.0%
ArrayMismatchBenchmark.mismatchVectorLong         1.0  100000  thrpt   10      30.760 ±    0.821      31.933 ±    1.352  ops/ms   +3.8%

I have not been able to conduct throughout testing on AVX512 and Aarch64 so any help would be invaluable. Thank you very much.


Progress

  • [x] Change must not contain extraneous whitespace
  • [x] Commit message must refer to an issue
  • [x] Change must be properly reviewed (2 reviews required, with at least 1 Reviewer, 1 Author)

Issue

  • JDK-8292289: [vectorapi] Improve the implementation of VectorTestNode

Reviewers

Reviewing

Using git

Checkout this PR locally:
$ git fetch https://git.openjdk.org/jdk pull/9855/head:pull/9855
$ git checkout pull/9855

Update a local copy of the PR:
$ git checkout pull/9855
$ git pull https://git.openjdk.org/jdk pull/9855/head

Using Skara CLI tools

Checkout this PR locally:
$ git pr checkout 9855

View PR using the GUI difftool:
$ git pr show -t 9855

Using diff file

Download this PR as a diff file:
https://git.openjdk.org/jdk/pull/9855.diff

merykitty avatar Aug 12 '22 13:08 merykitty

:wave: Welcome back merykitty! A progress list of the required criteria for merging this PR into master will be added to the body of your pull request. There are additional pull request commands available for use with this pull request.

bridgekeeper[bot] avatar Aug 12 '22 13:08 bridgekeeper[bot]

@merykitty The following label will be automatically applied to this pull request:

  • hotspot

When this pull request is ready to be reviewed, an "RFR" email will be sent to the corresponding mailing list. If you would like to change these labels, use the /label pull request command.

openjdk[bot] avatar Aug 12 '22 13:08 openjdk[bot]

/label hotspot-compiler

merykitty avatar Aug 12 '22 13:08 merykitty

@merykitty The hotspot-compiler label was successfully added.

openjdk[bot] avatar Aug 12 '22 13:08 openjdk[bot]

Please rebase this PR and resolve the conflicts. Then I can have an internal tests for the aarch64 change parts. Thanks a lot!

XiaohongGong avatar Aug 18 '22 02:08 XiaohongGong

Thanks for your comments, I have redone the patch on top of the new merged aarch file.

merykitty avatar Aug 18 '22 02:08 merykitty

Tests pass on aarch64 NEON and SVE. So aarch64 part looks good to me. Thanks!

XiaohongGong avatar Aug 18 '22 06:08 XiaohongGong

I have resolved your comments, thanks.

merykitty avatar Aug 18 '22 07:08 merykitty

@vnkozlov I have merged the patch with latest JDK sources.

merykitty avatar Sep 21 '22 12:09 merykitty

@vnkozlov I have merged the patch with latest JDK sources.

@merykitty I submitted testing.

vnkozlov avatar Sep 21 '22 17:09 vnkozlov

@merykitty This change now passes all automated pre-integration checks.

ℹ️ This project also has non-automated pre-integration requirements. Please see the file CONTRIBUTING.md for details.

After integration, the commit message for the final commit will be:

8292289: [vectorapi] Improve the implementation of VectorTestNode

Reviewed-by: xgong, kvn

You can use pull request commands such as /summary, /contributor and /issue to adjust it as needed.

At the time when this comment was updated there had been 220 new commits pushed to the master branch:

  • 1166c8e2c0047869cd50b7ddc5355290ac2a695a: 8296896: Change virtual Thread.yield to use external submit
  • 51759650e5593f48ce616a1a8abf51f5f8fd5302: 8298323: trivial typo in JOptionPane.OK_OPTION
  • d5cf18e7fb591185eecb042bfa015609ea7d15e0: 8296198: JFileChooser throws InternalError java.lang.InternalError with Windows shortcuts
  • 74f346b33f7fa053ad5c99ef85baa32b7fb12fa6: 8298075: RISC-V: Implement post-call NOPs
  • 3aa4070d4ca21b9e90388995efbcde318892e25f: 8294047: HttpResponseInputStream swallows interrupts
  • af8fb7eef7188ef762399cfb3faf5c8afd49efa7: 8282578: AIOOBE in javax.sound.sampled.Clip
  • 8b69a2e434ad2fa3369079622b57afb973d5bd9a: 8298099: [JVMCI] decouple libgraal from JVMCI module at runtime
  • 8a9911ef1762ae837e427ec9d91b1399ba33b6e4: 8295803: Console should be usable in jshell and other environments
  • 5d4c71c8bd361af78c90777f17b79e95d8eb5afe: 8281236: (D)TLS key exchange named groups
  • 10356e767a44632c5de142d4666bd85d4618bf71: 8298303: (fs) temporarily remove Path.getExtension
  • ... and 210 more: https://git.openjdk.org/jdk/compare/2f83b5c487f112c175d081ca5882f5032518937a...master

As there are no conflicts, your changes will automatically be rebased on top of these commits when integrating. If you prefer to avoid this automatic rebasing, please check the documentation for the /integrate command for further details.

➡️ To integrate this PR with the above commit message to the master branch, type /integrate in a new comment.

openjdk[bot] avatar Sep 22 '22 15:09 openjdk[bot]

/reviewers 2

vnkozlov avatar Sep 23 '22 16:09 vnkozlov

@vnkozlov The total number of required reviews for this PR (including the jcheck configuration and the last /reviewers command) is now set to 2 (with at least 1 Reviewer, 1 Author).

openjdk[bot] avatar Sep 23 '22 16:09 openjdk[bot]

Similar to what has been discussed in #10192 , a mask of 512 bits is not matched into an XMMRegister, I have removed the untaken code paths.

merykitty avatar Sep 29 '22 20:09 merykitty

@merykitty This pull request has been inactive for more than 4 weeks and will be automatically closed if another 4 weeks passes without any activity. To avoid this, simply add a new comment to the pull request. Feel free to ask for assistance if you need help with progressing this pull request towards integration!

bridgekeeper[bot] avatar Nov 10 '22 21:11 bridgekeeper[bot]

May I have another review for this PR, please? Thank you very much.

merykitty avatar Nov 29 '22 14:11 merykitty

@shqking Thanks a lot for your reviews, I have addressed those in the commits

merykitty avatar Dec 06 '22 14:12 merykitty

Rerunning DaCapo and Renaissance which shows some variations.

vnkozlov avatar Dec 07 '22 18:12 vnkozlov

@vnkozlov I believe VectorTestNode is only used in Vector API, which should not affect those benchmarks?

merykitty avatar Dec 08 '22 09:12 merykitty

@vnkozlov I believe VectorTestNode is only used in Vector API, which should not affect those benchmarks?

Yes, but you have changes in general code related to Cmp node. I see that DaCapo don't show regression after rerun. Renaissance is still running and numbers are "all over places" - it is not stable.

Anyway, I am approving this changes based on data I got.

vnkozlov avatar Dec 08 '22 09:12 vnkozlov

@vnkozlov Ah I get it, thanks a lot for your reviews, can I merge the patch now?

merykitty avatar Dec 08 '22 13:12 merykitty

Yes, you can integrate.

vnkozlov avatar Dec 08 '22 16:12 vnkozlov

Thank everyone for your kind reviews and suggestions. /integrate

merykitty avatar Dec 08 '22 20:12 merykitty

Going to push as commit 3dfadeebd023efb03a400f2b2656567a4154421a. Since your change was applied there have been 238 commits pushed to the master branch:

  • d35e840024b80f9f686fb5522dc03b2c9233a6d3: 8297295: Remove ThreadGroup.allowThreadSuspension
  • 175e3d3ff332be25cca9822c58c46f1e012953c2: 8296149: Start of release updates for JDK 21
  • d562d3fcbe22a0443037c5b447e1a41401275814: 8297642: PhaseIdealLoop::only_has_infinite_loops must detect all loops that never lead to termination
  • fc52f21f9a30c5c34caa06f8524c8d5bd74f16f7: 8298255: JFR provide information about dynamization of number of compiler threads
  • e555d5470536b8379179879ec7343e004be95e36: 8298383: JFR: GenerateJfrFiles.java lacks copyright header
  • c084431fae8c9f9b5a157cdaca484f63cbd6691a: 8298379: JFR: Some UNTIMED events only sets endTime
  • ea108f504ccb63fc9651e804e3bbba1c108dcead: 8298129: Let checkpoint event sizes grow beyond u4 limit
  • 165dcdd27de16824478ac9ebdfbd7b00fffe51e6: 8297718: Make NMT free:ing protocol more granular
  • fbe7b007383b034589e93d398706bebeb24461ee: 8298173: GarbageCollectionNotificationContentTest test failed: no decrease in Eden usage
  • d8ef60b406a9e8fe6cc6b7be0b74e45de38604c5: 8298272: Clean up ProblemList
  • ... and 228 more: https://git.openjdk.org/jdk/compare/2f83b5c487f112c175d081ca5882f5032518937a...master

Your commit was automatically rebased without conflicts.

openjdk[bot] avatar Dec 08 '22 20:12 openjdk[bot]

@merykitty Pushed as commit 3dfadeebd023efb03a400f2b2656567a4154421a.

:bulb: You may see a message that your pull request was closed with unmerged commits. This can be safely ignored.

openjdk[bot] avatar Dec 08 '22 20:12 openjdk[bot]