openj9
openj9 copied to clipboard
WIP: Accelerate Unsafe CAS Intrinsics on Z
Adds support for the following recognized methods:
CompareAndExchangeObject //JDK11
CompareAndExchangeReference //JDK17,21
CompareAndExchangeInt //JDK11,17,21
CompareAndExchangeLong //JDK11,17,21
The accelerated CAE code was built on top of the existing accelerated CAS support on Z. Changes are similar to previously added support on Power and X.
Edit: Depends on: https://github.com/eclipse/omr/pull/7482 Ideally merged as a pair.
This is marked WIP while I get finish running through all the unit tests I wrote. But all the tests for major functionality are passing so far so I don't expect any major changes.
@hzongaro Could you take a look at this. The common changes are small in an area you've looked at recently so it should be fairly quick.
@r30shah Could you look over the Z codegen changes? Everything seems to work but I don't work when Z codegen often so I might have missed smaller details.
I fixed a problem with my previous implementation.
Under compressedrefs compareAndExchangeReference uses CS to perform the CAS operation. Afterwards the upper 32 bits need to be cleared since they may contain garbage and this was not being done. Those upper bits could be read by later operations such as an decompression sequence.
The new change adds a LLGFR instruction to clear the upper bits.
So for the compressedrefs compareAndExchangeReference case things now look like this:
Example trees:
n58n BBStart <block_4> (freq 6) [ 0x3ff8cba01e0] bci=[-1,11,220] rc=0 vc=461 vn=- li=4 udi=- nc=1
n98n GlRegDeps () [ 0x3ff8cba0e60] bci=[-1,11,220] rc=1 vc=461 vn=- li=4 udi=- nc=5 flg=0x20
n99n aRegLoad GPR7 <temp slot 11>[#433 Auto] [flags 0x4007 0x0 ] (SeenRealReference ) [ 0x3ff8cba0eb0] bci=[-1,10,220] rc=2 vc=461 vn=- li=4 udi=- nc=0 flg=0x8000
n100n aRegLoad GPR8 <temp slot 10>[#432 Auto] [flags 0x4007 0x0 ] (SeenRealReference ) [ 0x3ff8cba0f00] bci=[-1,9,220] rc=2 vc=461 vn=- li=4 udi=- nc=0 flg=0x8000
n101n lRegLoad GPR9 <temp slot 9>[#431 Auto] [flags 0x4 0x0 ] (cannotOverflow SeenRealReference ) [ 0x3ff8cba0f50] bci=[-1,6,220] rc=2 vc=461 vn=- li=4 udi=- nc=0 flg=0x9000
n102n aRegLoad GPR10 <temp slot 8>[#430 Auto] [flags 0x4007 0x0 ] (SeenRealReference ) [ 0x3ff8cba0fa0] bci=[-1,4,220] rc=2 vc=461 vn=- li=4 udi=- nc=0 flg=0x8000
n103n aRegLoad GPR11 <temp slot 7>[#429 Auto] [flags 0x4007 0x0 ] (SeenRealReference ) [ 0x3ff8cba0ff0] bci=[-1,1,220] rc=2 vc=461 vn=- li=4 udi=- nc=0 flg=0x8000
n48n NULLCHK on n103n [#32] [ 0x3ff8cb05630] bci=[-1,11,220] rc=0 vc=461 vn=- li=4 udi=- nc=1
n49n acall jdk/internal/misc/Unsafe.compareAndExchangeReference(Ljava/lang/Object;JLjava/lang/Object;Ljava/lang/Object;)Ljava/lang/Object;[#411 final native virtual Method -344] [f
lags 0x20500 0x0 ] () [ 0x3ff8cb05680] bci=[-1,11,220] rc=3 vc=461 vn=- li=4 udi=- nc=5 flg=0x80020
n103n ==>aRegLoad
n102n ==>aRegLoad
n101n ==>lRegLoad
n128n l2i [ 0x3ff8cba17c0] bci=[-1,9,220] rc=1 vc=461 vn=- li=4 udi=- nc=1
n127n a2l (unneededConv ) [ 0x3ff8cba1770] bci=[-1,9,220] rc=1 vc=461 vn=- li=4 udi=- nc=1 flg=0x400
n100n ==>aRegLoad
n131n l2i [ 0x3ff8cba18b0] bci=[-1,10,220] rc=1 vc=461 vn=- li=4 udi=- nc=1
n130n a2l (unneededConv ) [ 0x3ff8cba1860] bci=[-1,10,220] rc=1 vc=461 vn=- li=4 udi=- nc=1 flg=0x400
n99n ==>aRegLoad
n63n aRegStore GPR6 [ 0x3ff8cba0370] bci=[-1,11,220] rc=0 vc=461 vn=- li=4 udi=- nc=1
n49n ==>acall
n60n goto --> block_3 BBStart at n56n [ 0x3ff8cba0280] bci=[-1,11,220] rc=0 vc=461 vn=- li=4 udi=- nc=1
n111n GlRegDeps [ 0x3ff8cba1270] bci=[-1,11,220] rc=1 vc=461 vn=- li=4 udi=- nc=1
n105n PassThrough GPR6 [ 0x3ff8cba1090] bci=[-1,11,220] rc=1 vc=461 vn=- li=4 udi=- nc=1
n49n ==>acall
n59n BBEnd </block_4> [ 0x3ff8cba0230] bci=[-1,11,220] rc=0 vc=461 vn=- li=4 udi=- nc=0
Example instructions:
[ 0x3ff8cc59040] assocreg
[ 0x3ff8cc595a0] Label L0016:
[ 0x3ff8cc5a000] assocreg
PRE:
{GPR7:&GPR_0128:R} {GPR8:&GPR_0129:R} {GPR9:GPR_0130:R} {GPR10:&GPR_0131:R} {GPR11:&GPR_0132:R}
[ 0x3ff8cc59a30] DEPEND
POST:
{GPR7:&GPR_0128:R} {GPR8:&GPR_0129:R} {GPR9:GPR_0130:R} {GPR10:&GPR_0131:R} {GPR11:&GPR_0132:R}
[ 0x3ff8cc5a0d0] fence Relative [ 000003FF8CB946D0 ] BBStart <block_4> (frequency 6)
[ 0x3ff8cc5a870] CGIT &GPR_0132,0,BH(mask=0x8),
[ 0x3ff8cc5a960] AGR GPR_0130,&GPR_0131
[ 0x3ff8cc5ab00] CS &GPR_0129,&GPR_0128,#438 0(GPR_0130)
[ 0x3ff8cc5abe0] LLGFR &GPR_0129,&GPR_0129
[ 0x3ff8cc5b150] LGR GPR_0133,&GPR_0131
[ 0x3ff8cc5b230] SG GPR_0133,#439 216(GPR13)
[ 0x3ff8cc5b300] CLG GPR_0133,#440 224(GPR13)
[ 0x3ff8cc5b3d0] BRC BL(0x2), Label L0064
[ 0x3ff8cc5b630] TM #441 421(GPR13), 0x10
[ 0x3ff8cc5b710] BRC BH(0x8), Label L0067
[ 0x3ff8cc5b7f0] SRLG GPR_0133,GPR_0133,9
[ 0x3ff8cc5b990] AG GPR_0133,#442 200(GPR13)
[ 0x3ff8cc5bb20] MVI #443 0(GPR_0133), 0x01
[ 0x3ff8cc5c1c0] assocreg
[ 0x3ff8cc5bc00] Label L0067:
POST:
{GPR1:&GPR_0131:R} {GPR2:&GPR_0128:R} {GPR4:GPR_0133:R} {GPR14:GPR_0134:R}
[ 0x3ff8cc5c850] assocreg
[ 0x3ff8cc5c290] Label L0066:
POST:
{GPR1:&GPR_0131:R} {GPR2:&GPR_0128:R} {GPR4:GPR_0133:R} {GPR14:GPR_0134:R}
[ 0x3ff8cc5c9e0] TM #444 3(&GPR_0131), 0xf0
[ 0x3ff8cc5cac0] BRC MASK9(0x8), Snippet Label L0065
[ 0x3ff8cc5d1d0] assocreg
[ 0x3ff8cc5cc10] Label L0064:
POST:
{GPR1:&GPR_0131:R} {GPR2:&GPR_0128:R} {GPR4:GPR_0133:R} {GPR14:GPR_0134:R}
[ 0x3ff8cc5e000] assocreg
PRE:
{GPR6:&GPR_0129:R}
[ 0x3ff8cc5da40] BRC NOP(0xf), Label L0017
POST:
{GPR6:&GPR_0129:R}
[ 0x3ff8cc5e1f0] fence Relative [ 000003FF8CB946D4 ] BBEnd </block_4>
I didn't put too much thought into the design to use TR_DisableCAEIntrinsic to disable the changes. I put it in initially for testing purposes but left in since it seemed like it might be useful to be able to turn off the feature if unexpected problems occur. Did you want this changed to set a flag instead? While it might help with the platform check logic, I think those checks will all go away anyways after I add Arm support for these intrinsics. Also, from looking at a few of those flags, a lot of them seemed to be controlled by an envvar anyways.
In regards to that comment. I can move it (in both places it occurs).
Did you want this changed to set a flag instead? While it might help with the platform check logic, I think those checks will all go away anyways after I add Arm support for these intrinsics. Also, from looking at a few of those flags, a lot of them seemed to be controlled by an envvar anyways.
I think a way control inlining of such intrinsic will be good for debugging purposes. IMO using flag is much more cleaner way to achieve that and make code more readable, and instead of checking for environment variable at n places, it can simply do it in the code-gen.
I think a way control inlining of such intrinsic will be good for debugging purposes. IMO using flag is much more cleaner way to achieve that and make code more readable, and instead of checking for environment variable at n places, it can simply do it in the code-gen.
This being said, I think it can be done with the other PR (Whenever you do work on ARM implementation), so it is OK if we want to make progress with this PR and have changes for Z merged.
I moved the indicated comments. I will also create a new PR later to handle the chanage to use the flag instead.
I rebased my change on the latest master branch. Everything should be good for merging now.
Jenkins test sanity zlinux jdk11,jdk21 depends https://github.com/eclipse/omr/pull/7482
Given that https://github.com/eclipse/omr/pull/7482 is merged, and all the test on this PR passed. Merging this one.
Given that eclipse/omr#7482 is merged, and all the test on this PR passed. Merging this one.
I think this pull request was supposed to wait until eclipse/omr#7482 is promoted to the https://github.com/eclipse-openj9/openj9-omr/tree/openj9. It's possible that asynccheck operations will be removed when they shouldn't be, so we should to keep that in mind if we see any failures in testing on z before the OMR pull request is promoted.