openj9 icon indicating copy to clipboard operation
openj9 copied to clipboard

WIP: Accelerate Unsafe CAS Intrinsics on Z

Open IBMJimmyk opened this issue 1 year ago • 5 comments

Adds support for the following recognized methods:

CompareAndExchangeObject     //JDK11
CompareAndExchangeReference  //JDK17,21
CompareAndExchangeInt        //JDK11,17,21
CompareAndExchangeLong       //JDK11,17,21

The accelerated CAE code was built on top of the existing accelerated CAS support on Z. Changes are similar to previously added support on Power and X.

Edit: Depends on: https://github.com/eclipse/omr/pull/7482 Ideally merged as a pair.

IBMJimmyk avatar Oct 06 '24 21:10 IBMJimmyk

This is marked WIP while I get finish running through all the unit tests I wrote. But all the tests for major functionality are passing so far so I don't expect any major changes.

@hzongaro Could you take a look at this. The common changes are small in an area you've looked at recently so it should be fairly quick.

@r30shah Could you look over the Z codegen changes? Everything seems to work but I don't work when Z codegen often so I might have missed smaller details.

IBMJimmyk avatar Oct 06 '24 21:10 IBMJimmyk

I fixed a problem with my previous implementation.

Under compressedrefs compareAndExchangeReference uses CS to perform the CAS operation. Afterwards the upper 32 bits need to be cleared since they may contain garbage and this was not being done. Those upper bits could be read by later operations such as an decompression sequence.

The new change adds a LLGFR instruction to clear the upper bits.

So for the compressedrefs compareAndExchangeReference case things now look like this:

Example trees:

n58n      BBStart <block_4> (freq 6)                                                          [     0x3ff8cba01e0] bci=[-1,11,220] rc=0 vc=461 vn=- li=4 udi=- nc=1
n98n        GlRegDeps ()                                                                      [     0x3ff8cba0e60] bci=[-1,11,220] rc=1 vc=461 vn=- li=4 udi=- nc=5 flg=0x20
n99n          aRegLoad GPR7   <temp slot 11>[#433  Auto] [flags 0x4007 0x0 ] (SeenRealReference )  [     0x3ff8cba0eb0] bci=[-1,10,220] rc=2 vc=461 vn=- li=4 udi=- nc=0 flg=0x8000
n100n         aRegLoad GPR8   <temp slot 10>[#432  Auto] [flags 0x4007 0x0 ] (SeenRealReference )  [     0x3ff8cba0f00] bci=[-1,9,220] rc=2 vc=461 vn=- li=4 udi=- nc=0 flg=0x8000
n101n         lRegLoad GPR9   <temp slot 9>[#431  Auto] [flags 0x4 0x0 ] (cannotOverflow SeenRealReference )  [     0x3ff8cba0f50] bci=[-1,6,220] rc=2 vc=461 vn=- li=4 udi=- nc=0 flg=0x9000
n102n         aRegLoad GPR10   <temp slot 8>[#430  Auto] [flags 0x4007 0x0 ] (SeenRealReference )  [     0x3ff8cba0fa0] bci=[-1,4,220] rc=2 vc=461 vn=- li=4 udi=- nc=0 flg=0x8000
n103n         aRegLoad GPR11   <temp slot 7>[#429  Auto] [flags 0x4007 0x0 ] (SeenRealReference )  [     0x3ff8cba0ff0] bci=[-1,1,220] rc=2 vc=461 vn=- li=4 udi=- nc=0 flg=0x8000
n48n      NULLCHK on n103n [#32]                                                              [     0x3ff8cb05630] bci=[-1,11,220] rc=0 vc=461 vn=- li=4 udi=- nc=1
n49n        acall  jdk/internal/misc/Unsafe.compareAndExchangeReference(Ljava/lang/Object;JLjava/lang/Object;Ljava/lang/Object;)Ljava/lang/Object;[#411  final native virtual Method -344] [f
lags 0x20500 0x0 ] ()  [     0x3ff8cb05680] bci=[-1,11,220] rc=3 vc=461 vn=- li=4 udi=- nc=5 flg=0x80020
n103n         ==>aRegLoad
n102n         ==>aRegLoad
n101n         ==>lRegLoad
n128n         l2i                                                                             [     0x3ff8cba17c0] bci=[-1,9,220] rc=1 vc=461 vn=- li=4 udi=- nc=1
n127n           a2l (unneededConv )                                                           [     0x3ff8cba1770] bci=[-1,9,220] rc=1 vc=461 vn=- li=4 udi=- nc=1 flg=0x400
n100n             ==>aRegLoad
n131n         l2i                                                                             [     0x3ff8cba18b0] bci=[-1,10,220] rc=1 vc=461 vn=- li=4 udi=- nc=1
n130n           a2l (unneededConv )                                                           [     0x3ff8cba1860] bci=[-1,10,220] rc=1 vc=461 vn=- li=4 udi=- nc=1 flg=0x400
n99n              ==>aRegLoad
n63n      aRegStore GPR6                                                                      [     0x3ff8cba0370] bci=[-1,11,220] rc=0 vc=461 vn=- li=4 udi=- nc=1
n49n        ==>acall
n60n      goto --> block_3 BBStart at n56n                                                    [     0x3ff8cba0280] bci=[-1,11,220] rc=0 vc=461 vn=- li=4 udi=- nc=1
n111n       GlRegDeps                                                                         [     0x3ff8cba1270] bci=[-1,11,220] rc=1 vc=461 vn=- li=4 udi=- nc=1
n105n         PassThrough GPR6                                                                [     0x3ff8cba1090] bci=[-1,11,220] rc=1 vc=461 vn=- li=4 udi=- nc=1
n49n            ==>acall
n59n      BBEnd </block_4>                                                                    [     0x3ff8cba0230] bci=[-1,11,220] rc=0 vc=461 vn=- li=4 udi=- nc=0

Example instructions:

 [     0x3ff8cc59040]                          assocreg
 [     0x3ff8cc595a0]                          Label L0016:
 [     0x3ff8cc5a000]                          assocreg
  PRE:
 {GPR7:&GPR_0128:R} {GPR8:&GPR_0129:R} {GPR9:GPR_0130:R} {GPR10:&GPR_0131:R} {GPR11:&GPR_0132:R}
 [     0x3ff8cc59a30]                          DEPEND
 POST:
 {GPR7:&GPR_0128:R} {GPR8:&GPR_0129:R} {GPR9:GPR_0130:R} {GPR10:&GPR_0131:R} {GPR11:&GPR_0132:R}
 [     0x3ff8cc5a0d0]                          fence   Relative [ 000003FF8CB946D0 ] BBStart <block_4> (frequency 6)
 [     0x3ff8cc5a870]                          CGIT    &GPR_0132,0,BH(mask=0x8),
 [     0x3ff8cc5a960]                          AGR     GPR_0130,&GPR_0131
 [     0x3ff8cc5ab00]                          CS      &GPR_0129,&GPR_0128,#438 0(GPR_0130)
 [     0x3ff8cc5abe0]                          LLGFR   &GPR_0129,&GPR_0129
 [     0x3ff8cc5b150]                          LGR     GPR_0133,&GPR_0131
 [     0x3ff8cc5b230]                          SG      GPR_0133,#439 216(GPR13)
 [     0x3ff8cc5b300]                          CLG     GPR_0133,#440 224(GPR13)
 [     0x3ff8cc5b3d0]                          BRC     BL(0x2), Label L0064
 [     0x3ff8cc5b630]                          TM      #441 421(GPR13), 0x10
 [     0x3ff8cc5b710]                          BRC     BH(0x8), Label L0067
 [     0x3ff8cc5b7f0]                          SRLG    GPR_0133,GPR_0133,9
 [     0x3ff8cc5b990]                          AG      GPR_0133,#442 200(GPR13)
 [     0x3ff8cc5bb20]                          MVI     #443 0(GPR_0133), 0x01
 [     0x3ff8cc5c1c0]                          assocreg
 [     0x3ff8cc5bc00]                          Label L0067:
 POST:
 {GPR1:&GPR_0131:R} {GPR2:&GPR_0128:R} {GPR4:GPR_0133:R} {GPR14:GPR_0134:R}
 [     0x3ff8cc5c850]                          assocreg
 [     0x3ff8cc5c290]                          Label L0066:
 POST:
 {GPR1:&GPR_0131:R} {GPR2:&GPR_0128:R} {GPR4:GPR_0133:R} {GPR14:GPR_0134:R}
 [     0x3ff8cc5c9e0]                          TM      #444 3(&GPR_0131), 0xf0
 [     0x3ff8cc5cac0]                          BRC     MASK9(0x8), Snippet Label L0065
 [     0x3ff8cc5d1d0]                          assocreg
 [     0x3ff8cc5cc10]                          Label L0064:
 POST:
 {GPR1:&GPR_0131:R} {GPR2:&GPR_0128:R} {GPR4:GPR_0133:R} {GPR14:GPR_0134:R}
 [     0x3ff8cc5e000]                          assocreg
  PRE:
 {GPR6:&GPR_0129:R}
 [     0x3ff8cc5da40]                          BRC     NOP(0xf), Label L0017
 POST:
 {GPR6:&GPR_0129:R}
 [     0x3ff8cc5e1f0]                          fence   Relative [ 000003FF8CB946D4 ] BBEnd </block_4>

IBMJimmyk avatar Oct 17 '24 21:10 IBMJimmyk

I didn't put too much thought into the design to use TR_DisableCAEIntrinsic to disable the changes. I put it in initially for testing purposes but left in since it seemed like it might be useful to be able to turn off the feature if unexpected problems occur. Did you want this changed to set a flag instead? While it might help with the platform check logic, I think those checks will all go away anyways after I add Arm support for these intrinsics. Also, from looking at a few of those flags, a lot of them seemed to be controlled by an envvar anyways.

In regards to that comment. I can move it (in both places it occurs).

IBMJimmyk avatar Oct 18 '24 20:10 IBMJimmyk

Did you want this changed to set a flag instead? While it might help with the platform check logic, I think those checks will all go away anyways after I add Arm support for these intrinsics. Also, from looking at a few of those flags, a lot of them seemed to be controlled by an envvar anyways.

I think a way control inlining of such intrinsic will be good for debugging purposes. IMO using flag is much more cleaner way to achieve that and make code more readable, and instead of checking for environment variable at n places, it can simply do it in the code-gen.

r30shah avatar Oct 18 '24 21:10 r30shah

I think a way control inlining of such intrinsic will be good for debugging purposes. IMO using flag is much more cleaner way to achieve that and make code more readable, and instead of checking for environment variable at n places, it can simply do it in the code-gen.

This being said, I think it can be done with the other PR (Whenever you do work on ARM implementation), so it is OK if we want to make progress with this PR and have changes for Z merged.

r30shah avatar Oct 18 '24 21:10 r30shah

I moved the indicated comments. I will also create a new PR later to handle the chanage to use the flag instead.

I rebased my change on the latest master branch. Everything should be good for merging now.

IBMJimmyk avatar Oct 21 '24 15:10 IBMJimmyk

Jenkins test sanity zlinux jdk11,jdk21 depends https://github.com/eclipse/omr/pull/7482

r30shah avatar Oct 21 '24 15:10 r30shah

Given that https://github.com/eclipse/omr/pull/7482 is merged, and all the test on this PR passed. Merging this one.

r30shah avatar Oct 21 '24 20:10 r30shah

Given that eclipse/omr#7482 is merged, and all the test on this PR passed. Merging this one.

I think this pull request was supposed to wait until eclipse/omr#7482 is promoted to the https://github.com/eclipse-openj9/openj9-omr/tree/openj9. It's possible that asynccheck operations will be removed when they shouldn't be, so we should to keep that in mind if we see any failures in testing on z before the OMR pull request is promoted.

hzongaro avatar Oct 21 '24 20:10 hzongaro