jdk icon indicating copy to clipboard operation
jdk copied to clipboard

8360654: AArch64: Remove redundant dmb from C1 compareAndSet

Open spchee opened this issue 5 months ago • 10 comments

AtomicLong.CompareAndSet has the following assembly dump snippet which gets emitted from the intermediary LIRGenerator::atomic_cmpxchg:

;; cmpxchg {
  0x0000e708d144cf60:   mov	x8, x2
  0x0000e708d144cf64:   casal	x8, x3, [x0]
  0x0000e708d144cf68:   cmp	x8, x2
 ;; 0x1F1F1F1F1F1F1F1F
  0x0000e708d144cf6c:   mov	x8, #0x1f1f1f1f1f1f1f1f
 ;; } cmpxchg
  0x0000e708d144cf70:   cset	x8, ne  // ne = any
  0x0000e708d144cf74:   dmb	ish

According to the Oracle Java Specification, AtomicLong.CompareAndSet [1] has the same memory effects as specified by VarHandle.compareAndSet which has the following effects: [2]

Atomically sets the value of a variable to the newValue with the memory semantics of setVolatile if the variable's current value, referred to as the witness value, == the expectedValue, as accessed with the memory semantics of getVolatile.

Hence the release on the store due to setVolatile only occurs if the compare is successful. Since casal already satisfies these requirements, the dmb does not need to occur to ensure memory ordering in case the compare fails and a release does not happen.

Hence we remove the dmb from both casl and casw (same logic applies to the non-long variant)

This is also reflected by C2 not having a dmb for the same respective method.

[1] https://docs.oracle.com/en/java/javase/24/docs/api/java.base/java/util/concurrent/atomic/AtomicLong.html#compareAndSet(long,long) [2] https://docs.oracle.com/en/java/javase/24/docs/api/java.base/java/lang/invoke/VarHandle.html#compareAndSet(java.lang.Object...)


Progress

  • [ ] Change must be properly reviewed (1 review required, with at least 1 Reviewer)
  • [x] Change must not contain extraneous whitespace
  • [x] Commit message must refer to an issue

Issue

  • JDK-8360654: AArch64: Remove redundant dmb from C1 compareAndSet (Enhancement - P4)

Reviewing

Using git

Checkout this PR locally:
$ git fetch https://git.openjdk.org/jdk.git pull/26000/head:pull/26000
$ git checkout pull/26000

Update a local copy of the PR:
$ git checkout pull/26000
$ git pull https://git.openjdk.org/jdk.git pull/26000/head

Using Skara CLI tools

Checkout this PR locally:
$ git pr checkout 26000

View PR using the GUI difftool:
$ git pr show -t 26000

Using diff file

Download this PR as a diff file:
https://git.openjdk.org/jdk/pull/26000.diff

Using Webrev

Link to Webrev Comment

spchee avatar Jun 26 '25 12:06 spchee

Hi @spchee, welcome to this OpenJDK project and thanks for contributing!

We do not recognize you as Contributor and need to ensure you have signed the Oracle Contributor Agreement (OCA). If you have not signed the OCA, please follow the instructions. Please fill in your GitHub username in the "Username" field of the application. Once you have signed the OCA, please let us know by writing /signed in a comment in this pull request.

If you already are an OpenJDK Author, Committer or Reviewer, please click here to open a new issue so that we can record that fact. Please use "Add GitHub user spchee" as summary for the issue.

If you are contributing this work on behalf of your employer and your employer has signed the OCA, please let us know by writing /covered in a comment in this pull request.

bridgekeeper[bot] avatar Jun 26 '25 12:06 bridgekeeper[bot]

❗ This change is not yet ready to be integrated. See the Progress checklist in the description for automated requirements.

openjdk[bot] avatar Jun 26 '25 12:06 openjdk[bot]

@spchee The following label will be automatically applied to this pull request:

  • hotspot-compiler

When this pull request is ready to be reviewed, an "RFR" email will be sent to the corresponding mailing list. If you would like to change these labels, use the /label pull request command.

openjdk[bot] avatar Jun 26 '25 12:06 openjdk[bot]

/covered

spchee avatar Jun 26 '25 12:06 spchee

Thank you! Please allow for a few business days to verify that your employer has signed the OCA. Also, please note that pull requests that are pending an OCA check will not usually be evaluated, so your patience is appreciated!

bridgekeeper[bot] avatar Jun 26 '25 12:06 bridgekeeper[bot]

I'm not sure this is safe.

In C1, if we had a stlxr that succeeded, followed by a volatile load, then we'd generate:

    stlxr status, data, [addr]
    cbnz status, retry
    ldr r1, [something]
    dmb ish

I think there's nothing to prevent the ldr from being reordered with the stlxr, violating sequential consistency. You could argue that subsequent memory operations are control dependent on the cbnz so can't be reordered, but if a microarchitecture predicts that the stlxr can never fail, the control dependency can be folded away.

theRealAph avatar Jun 27 '25 09:06 theRealAph

Thanks for the feedback, could it be sufficient then to emit the dmb when not using LSE? And when using LSE, to not do so?

I have tested it with jcstress, although I have not done so using the -XX:-UseLSE flag.

spchee avatar Jun 27 '25 09:06 spchee

Thanks for the feedback, could it be sufficient then to emit the dmb when not using LSE? And when using LSE, to not do so?

I don't think that LSE helps here. There's nothing to stop an unordered load moving before a casal. See atomic-ordered-before in The AArch64 Application Level Memory Model.

theRealAph avatar Jun 27 '25 10:06 theRealAph

The best way to check this stuff is with the herd7 memory model simulator, which can find reorderings you wouldn't believe.

theRealAph avatar Jun 27 '25 10:06 theRealAph

I can double check with the herd7 simulator, but since the casal will always produce an acquire, to me it seems impossible that a load can be moved before the casal due the acquire within the casal.

Clause 9 of before-barrier-ordering in the Arm Architecture reference manual also supports this.

spchee avatar Jun 27 '25 12:06 spchee

Clause 9 of before-barrier-ordering in the Arm Architecture reference manual also supports this.

Which clause is that?

theRealAph avatar Jun 27 '25 12:06 theRealAph

Hi @theRealAph,

The clause can be find here, the last bullet point on this page - https://mozilla.github.io/pdf.js/web/viewer.html?file=https://documentation-service.arm.com/static/6839d7585475b403d943b4dc#page=255&pagemode=none

Also, we have come up with two herd7 tests which should hopefully prove it to be alright.

{
x=0;
y=0;
0:X1=x; 0:X3=y;
1:X1=x; 1:X3=y;
}
 P0                   | P1             ;
 MOV W0,#1            | MOV W0, #1     ;
 MOV W2,#2            | MOV W2, #2     ;
 CASAL W0, W2, [X1]   | ;
 LDR W4,[X3]          | STR W0, [X3]   ;
                      | DMB ISH        ;
                      | STR W2, [X1]   ;
exists
(0:X0=2 /\ 0:X4=0)

Here, the stores by P1 are happening in order: y = 1; x = 2; and the reads in P0 are happening by CASAL first - from x and then by LDR - from y. The condition constraint checks is that CASAL can't read 2 from x if LDR read 0 from y - the constraint should be fulfilled unless the reads are reordered.

And

{
x = 1; y =1;
0: X1=x; 0:X3=y;
1: X3=x; 1:X1=y;
}

P0                  | P1  ;
MOV W0, #1          | MOV W0, #1;
MOV W2, #2          | MOV W2, #2;
CASAL W0, W2, [X1]  | CASAL W0, W2, [X1];
LDR W4, [X3]        | LDR W4, [X3];

exists
 (0:X4=1 /\ 1: X4=1)

Here, both X4's being equal to 1 is disallowed, as that would indicate that one of the ldrs was reordered before the CASAL. As the CASAL's will always succeed by default meaning atleast one of the LDR's will load a non-1 value into W4. Hence (0:X4=1 /\ 1: X4=1) can only ever occur if an ldr gets ordered before the CASAL.

Hope this helps :)

spchee avatar Jul 04 '25 16:07 spchee

On 04/07/2025 17:28, Samuel Chee wrote:

Hope this helps :)

Thanks, this looks convincing.

Please allow some time for me to do some more checking. This is a tricky area, and the the cost if we get it wrong is high.

theRealAph avatar Jul 07 '25 08:07 theRealAph

FYI, I'm still looking at this.

It seems that the definition of barrier-ordered-before has been strengthened since this code was written. A test that I wrote a few years ago now passes on the online Herd7 simulator, where it used to fail. Back then I commented

  // At the time of writing we don't know of any AArch64 hardware that
  // reorders stores in this way, but the Reference Manual permits it.

... and confirmed my interpretation with the author of the Reference Manual.

I'm guessing that older AArch64 implementations still in use never did such reorderings.

theRealAph avatar Jul 11 '25 13:07 theRealAph

With the help of Will Deacon, one of the authors of the memory model. I've now got to the bottom of this.

It is indeed a change to the MM, dating from 2002.

I agree that the DMB isn't needed here because the CASAL has both acquire and release semantics. However, I don't think that's related to the snippet of the architecture you have above but rather comes from:

// DDI0487L_b // Barrier-ordered-before (B2-255) ...

  • All of the following apply:
    • E1 is an Explicit Memory Write Effect and is generated by an atomic instruction with both Acquire and Release semantics.
    • E1 appears in program order before E2.
    • One of the following applies:
      • E2 is an Explicit Memory Effect.
      • E2 is an Implicit Tag Memory Read Effect.
      • E2 is an MMU Fault Effect.

Which says that the release store of the CASAL is ordered before the the subsequent store to y. Note that this wouldn't work if you used CASL instead.

The full details of the MM change are here:

http://github.com/herd/herdtools7/commit/636b7163c0679c691b8cf9a04623cd3aa1cc0ec3

theRealAph avatar Jul 15 '25 10:07 theRealAph

So, this change looks good, and we can remove trailing DMBs from most CASALs.

theRealAph avatar Jul 15 '25 11:07 theRealAph

Mailing list message from Andrew Haley on hotspot-compiler-dev:

On 15/07/2025 12:02, Andrew Haley wrote:

It is indeed a change to the MM, dating from 2002.

2022, obvs.

mlbridge[bot] avatar Jul 15 '25 11:07 mlbridge[bot]

Great! Glad you're convinced of its correctness.

we can remove trailing DMBs from most CASALs.

Just do bear in mind that CASAL doesn't emit release semantics if the compare fails so I imagine there might be cases where a trailing dmb might still necessary.

spchee avatar Jul 15 '25 12:07 spchee

Great! Glad you're convinced of its correctness.

we can remove trailing DMBs from most CASALs.

Just do bear in mind that CASAL doesn't emit release semantics if the compare fails so I imagine there might be cases where a trailing dmb might still necessary.

Yes, that's what the change log in the link I posted says too. We don't care about such assumptions in Java code, but in some C++ code which assumes that CAS implies a full barrier. I don't think anyone really knows every bit of C++ code in HotSpot which does assume this, so we tend to assume the worst.

theRealAph avatar Jul 15 '25 12:07 theRealAph

I think we still need a DMB after non-LSE CMPXCHG, which gets failures without this DMB:

AArch64 MP

{
0:X0=x; 0:X2=y;
1:X0=y; 1:X4=x;
}
 P0           | P1               ;
 LDAR W1,[X0] | MOV W2,#1        ;
              | L0: ;
 LDR W3,[X2]  | LDAXR W1,[X0] ;
              | STLXR W8,W2,[X0] ;
              | CBNZ W8,L0;
              | DMB ISH;
              | MOV W3,#1        ;
              | STR W3,[X4]      ;
exists (0:X1=1 /\ 0:X3=0 /\ 1:X1=0)

theRealAph avatar Jul 16 '25 14:07 theRealAph

Have just updated with change to have it still emit a dmb when LSE is not enabled. Should be good to go now hopefully :)

spchee avatar Jul 17 '25 12:07 spchee

@spchee This pull request has been inactive for more than 4 weeks and will be automatically closed if another 4 weeks passes without any activity. To avoid this, simply issue a /touch or /keepalive command to the pull request. Feel free to ask for assistance if you need help with progressing this pull request towards integration!

bridgekeeper[bot] avatar Sep 16 '25 22:09 bridgekeeper[bot]

/keepalive

theRealAph avatar Oct 07 '25 10:10 theRealAph

@theRealAph The pull request is being re-evaluated and the inactivity timeout has been reset.

openjdk[bot] avatar Oct 07 '25 10:10 openjdk[bot]

/keepalive

ruben-arm avatar Nov 03 '25 09:11 ruben-arm

@ruben-arm To use the /touch command, you need to be in the OpenJDK census and your GitHub account needs to be linked with your OpenJDK username (how to associate your GitHub account with your OpenJDK username).

openjdk[bot] avatar Nov 03 '25 09:11 openjdk[bot]

/keepalive

eastig avatar Nov 03 '25 11:11 eastig

@eastig The pull request is being re-evaluated and the inactivity timeout has been reset.

openjdk[bot] avatar Nov 03 '25 11:11 openjdk[bot]

Hi @theRealAph, I've pushed changes for this PR to a new branch https://github.com/openjdk/jdk/compare/master...ruben-arm:jdk:pr-8360654 as Samuel is currently not available. Once he is back, he can update this PR's branch. In the meanwhile, I'm planning to run more of the jcstress testing. I'd appreciate your feedback on the version in the new branch.

ruben-arm avatar Nov 03 '25 16:11 ruben-arm