jdk icon indicating copy to clipboard operation
jdk copied to clipboard

8359435: AArch64: add support for SB instruction to MacroAssembler::spin_wait

Open eastig opened this issue 5 months ago • 18 comments
trafficstars

There is data SB-based spin pauses are less disruptive then ISB-based one on them, so performance is better:

  • https://github.com/mysql/mysql-server/pull/611
  • https://github.com/facebook/folly/pull/2390

There are discussions regarding using it for spin pauses:

  • https://github.com/gperftools/gperftools/pull/1594
  • https://github.com/haproxy/haproxy/pull/2974

Instruction support: https://developer.arm.com/documentation/109697/2025_03/Feature-descriptions/The-Armv8-5-architecture-extension

CPUs supporting it:

  • Apple M2+
  • Neoverse-N2
  • Neoverse-V2

Tests:

  • Gtests passed.
  • test/hotspot/jtreg/compiler/onSpinWait/TestOnSpinWaitAArch64.java passed.
  • test/hotspot/jtreg/compiler/onSpinWait/TestOnSpinWaitNoneAArch64.java passed.

Micro-benchmarks (Graviton 4, c8g.16xlarge (64 CPU), Neoverse-V2):

Benchmark                   Mode Cnt Score Error Units Diff
ThreadOnSpinWait.ISB     avgt     15    11.875 ± 0.129  ns/op
ThreadOnSpinWait.SB      avgt     15    6.930 ± 0.054  ns/op  -42%

Benchmark                            (maxNum)  (threadCount)  Mode  Cnt    Score    Error  Units  Diff
ThreadOnSpinWaitSharedCounter.ISB   1000000              4  avgt   15  49.874 ± 10.160  ms/op
ThreadOnSpinWaitSharedCounter.SB    1000000              4  avgt   15  26.948 ±  4.036  ms/op  -46%
ThreadOnSpinWaitSharedCounter.ISB   1000000              8  avgt   15  65.173 ±  7.228  ms/op
ThreadOnSpinWaitSharedCounter.SB    1000000              8  avgt   15  44.476 ±  1.292  ms/op  -31%
ThreadOnSpinWaitSharedCounter.ISB   1000000             16  avgt   15  177.805 ± 44.925  ms/op
ThreadOnSpinWaitSharedCounter.SB    1000000             16  avgt   15  67.267 ± 13.814  ms/op -62%
ThreadOnSpinWaitSharedCounter.ISB   1000000             32  avgt   15  265.149 ± 5.353  ms/op
ThreadOnSpinWaitSharedCounter.SB    1000000             32  avgt   15  42.297 ± 3.436  ms/op -84%
ThreadOnSpinWaitSharedCounter.ISB   1000000             48  avgt   15  125.231 ±  9.272  ms/op
ThreadOnSpinWaitSharedCounter.SB    1000000             48  avgt   15  83.504 ± 8.561  ms/op  -33%
ThreadOnSpinWaitSharedCounter.ISB   1000000             64  avgt   15  124.505 ±  7.543  ms/op
ThreadOnSpinWaitSharedCounter.SB    1000000             64  avgt   15  86.588 ± 9.519  ms/op -30%

Progress

  • [x] Change must be properly reviewed (1 review required, with at least 1 Reviewer)
  • [x] Change must not contain extraneous whitespace
  • [x] Commit message must refer to an issue

Issue

  • JDK-8359435: AArch64: add support for SB instruction to MacroAssembler::spin_wait (Enhancement - P4)

Reviewers

Reviewing

Using git

Checkout this PR locally:
$ git fetch https://git.openjdk.org/jdk.git pull/25801/head:pull/25801
$ git checkout pull/25801

Update a local copy of the PR:
$ git checkout pull/25801
$ git pull https://git.openjdk.org/jdk.git pull/25801/head

Using Skara CLI tools

Checkout this PR locally:
$ git pr checkout 25801

View PR using the GUI difftool:
$ git pr show -t 25801

Using diff file

Download this PR as a diff file:
https://git.openjdk.org/jdk/pull/25801.diff

Using Webrev

Link to Webrev Comment

eastig avatar Jun 13 '25 14:06 eastig

:wave: Welcome back eastigeevich! A progress list of the required criteria for merging this PR into master will be added to the body of your pull request. There are additional pull request commands available for use with this pull request.

bridgekeeper[bot] avatar Jun 13 '25 14:06 bridgekeeper[bot]

@eastig This change now passes all automated pre-integration checks.

ℹ️ This project also has non-automated pre-integration requirements. Please see the file CONTRIBUTING.md for details.

After integration, the commit message for the final commit will be:

8359435: AArch64: add support for SB instruction to MacroAssembler::spin_wait

Reviewed-by: shade, aph

You can use pull request commands such as /summary, /contributor and /issue to adjust it as needed.

At the time when this comment was updated there had been 11 new commits pushed to the master branch:

  • 5039b42de170769797312969185ee9d67f34cf24: 8359437: Make users and test suite not able to set LockingMode flag
  • 1ca008fd02496dc33e2707c102560cae1690fba5: 8360255: runtime/jni/checked/TestLargeUTF8Length.java fails with -XX:-CompactStrings
  • cf75f1f9c6d2bc70c7133cb81c73a0ce0946dff9: 8358821: patch_verified_entry causes problems, use nmethod entry barriers instead
  • ... and 8 more: https://git.openjdk.org/jdk/compare/f2ef809719cbb14f90a0a5f673e10e7c74fa0f45...master

As there are no conflicts, your changes will automatically be rebased on top of these commits when integrating. If you prefer to avoid this automatic rebasing, please check the documentation for the /integrate command for further details.

➡️ To integrate this PR with the above commit message to the master branch, type /integrate in a new comment.

openjdk[bot] avatar Jun 13 '25 14:06 openjdk[bot]

@eastig The following label will be automatically applied to this pull request:

  • hotspot-compiler

When this pull request is ready to be reviewed, an "RFR" email will be sent to the corresponding mailing list. If you would like to change these labels, use the /label pull request command.

openjdk[bot] avatar Jun 13 '25 14:06 openjdk[bot]

Hi @theRealAph , Could you please take a look?

eastig avatar Jun 13 '25 14:06 eastig

Webrevs

mlbridge[bot] avatar Jun 13 '25 14:06 mlbridge[bot]

I think this can wait until we have a use for SB.

theRealAph avatar Jun 13 '25 14:06 theRealAph

The case is to use it for spin pauses instead of ISB on Neoverse-N2/V2. There is data SB-based spin pauses are less disruptive then ISB-based one on them, so performance is better:

  • https://github.com/mysql/mysql-server/pull/611
  • https://github.com/facebook/folly/pull/2390

There are discussions regarding using it for spin pauses:

  • https://github.com/gperftools/gperftools/pull/1594
  • https://github.com/haproxy/haproxy/pull/2974

Do you think it is better to have a PR combining this PR and use of SB for spin pauses?

eastig avatar Jun 13 '25 16:06 eastig

BTW Arm published a post in their blog about different implementations of spin pauses: https://community.arm.com/arm-community-blogs/b/architectures-and-processors-blog/posts/multi-threaded-applications-arm A high accuracy delay requires FEAT_SB (Armv8.5-A), FEAT_ECV (Armv8.6-A) and FEAT_WFxT (Armv8.7-A).

eastig avatar Jun 13 '25 16:06 eastig

FWIW, I don't mind the SB assembler support to go under this, separate PR. We sometimes do it to split the work in the series of atomic commits, where the commit like this should certainly be non-regressing. The actual use of SB (spin-pauses) can then come under separate RFE, and would require much more work (and have associated risk).

So, it would be tad less confusing if we had a dependent RFE for using SB in spin pauses, so it was obvious why do we need it.

shipilev avatar Jun 13 '25 17:06 shipilev

Do you think it is better to have a PR combining this PR and use of SB for spin pauses?

Yes, definitely, otherwise we're pushing dead code. Thanks.

theRealAph avatar Jun 14 '25 15:06 theRealAph

So, it would be tad less confusing if we had a dependent RFE for using SB in spin pauses, so it was obvious why do we need it.

Huh? The least confusing is when the SB support goes in the PR where it is used. That really is obvious, without any dependency chain.

theRealAph avatar Jun 14 '25 16:06 theRealAph

So, it would be tad less confusing if we had a dependent RFE for using SB in spin pauses, so it was obvious why do we need it.

Huh? The least confusing is when the SB support goes in the PR where it is used. That really is obvious, without any dependency chain.

I am flexible to have it either way.

One of the drawbacks of piling up the instruction support and the feature that uses these instructions: if there is ever a second feature that depends on the same instruction support, we would effectively bind two commits (commit A: instruction support + feature A; commit B: feature B) together with an accidental dependency. Which gets extra funky if you ever go with bisects, backouts, backports. Atomic commits rule, and I personally strive to do them, even if there is a window when some code appears dead momentarily.

But as I said, I would not quibble here. SB looks like something that we would solely use for spin-wait hints.

shipilev avatar Jun 16 '25 08:06 shipilev

Also, merge from mainline to get windows-aarch64 build fix, so that we test things there too.

shipilev avatar Jun 24 '25 16:06 shipilev

Looks okay to me. @theRealAph should also take a look.

I'm still waiting for a use for this thing. Then we'll be able to see it in action.

theRealAph avatar Jun 25 '25 15:06 theRealAph

Looks okay to me. @theRealAph should also take a look.

I'm still waiting for a use for this thing. Then we'll be able to see it in action.

Do you mean we need real-life workloads relying on j.l.Thread.onSpinWait to show improvements?

eastig avatar Jun 25 '25 16:06 eastig

I'm still waiting for a use for this thing.

The other project reports Evgeny linked in PR body look pretty convincing, as well as ThreadOnSpinWait microbenchmarks we have as well. This PR does not propose to switch to SB for spin-waits, AFAICS. Just having SB as the spin-wait option does look fine to me.

shipilev avatar Jun 25 '25 16:06 shipilev

The other project reports Evgeny linked in PR body look pretty convincing

Yes. Again, my apologies.

theRealAph avatar Jun 26 '25 08:06 theRealAph

Thank you, Andrew.

eastig avatar Jun 26 '25 10:06 eastig

/integrate

eastig avatar Jun 27 '25 12:06 eastig

Going to push as commit ecd2d83096a1fea7d5086736306770bcffa4fdb6. Since your change was applied there have been 29 commits pushed to the master branch:

  • d8f9b188fa488c9c6e343c62a148cfe9fc8a563b: 8268406: Deallocate jmethodID native memory
  • aa26cede635011f5cc075cd528934ce8d8e8eef9: 8360474: Add missing include guards for some HotSpot headers
  • 20e983a97c66902c61ee2fa1959a7e612266732b: 8360487: Remove unnecessary List.indexOf key from AbstractMidiDevice.TransmitterList.remove
  • ... and 26 more: https://git.openjdk.org/jdk/compare/f2ef809719cbb14f90a0a5f673e10e7c74fa0f45...master

Your commit was automatically rebased without conflicts.

openjdk[bot] avatar Jun 27 '25 12:06 openjdk[bot]

@eastig Pushed as commit ecd2d83096a1fea7d5086736306770bcffa4fdb6.

:bulb: You may see a message that your pull request was closed with unmerged commits. This can be safely ignored.

openjdk[bot] avatar Jun 27 '25 12:06 openjdk[bot]

The modified test is now failing in our CI:

java.lang.RuntimeException: Missing compiler output for Thread.onSpinWait intrinsic

Filed: JDK-8360936

dholmes-ora avatar Jun 28 '25 04:06 dholmes-ora