WIP: Accelerate Unsafe CAS Intrinsics on Power and X
Adds support for the following recognized methods:
CompareAndExchangeObject //JDK11
CompareAndExchangeReference //JDK17,21
CompareAndExchangeInt //JDK11,17,21
CompareAndExchangeLong //JDK11,17,21
Similar to their CompareAndSet counterparts, the JIT acceleration does not support CAE on static fields so the createUnsafeCASCallDiamond function was updated to also work on the CAE methods.
The accelerated CAE code was built on top of the existing accelerated CAS support on both Power and X.
Edit:
X changes are dependent on this OMR PR:
https://github.com/eclipse/omr/pull/7438
Behavior of inlineCompareAndSwapNative and inlineCompareAndSwapObjectNative in J9TreeEvaluator.cpp must match behavior in willBeEvaluatedAsCallByCodeGen in OMRCodeGenerator.cpp
@hzongaro Is there someone who can help take a look at the X and Common changes? In particular that I am updating alwaysWorthInlining and createUnsafeCASCallDiamond correctly for common changes. For X, everything could use a close look since I don't work on X codegen often. in particular here, a double check that I have the logic right for reference counts on nodes since that caused a few problems.
@zl-wang Could you take a look at the Power codegen changes to make sure there are no problems?
are we supposed to go around the "public" (not sure about this) Unsafe interface/class and support the internal Unsafe APIs? i.e., do typical developers import jdk.internal.misc.Unsafe? In addition to recognize the former class' APIs, we need to recognize the internal newer ones as well? @tajila @vijaysun-omr what is your take?
Are you suggesting that there are "public" equivalents of the apis being discussed in this PR (that we already accelerate or plan to accelerate) ? If not, then internal (non-public) or not, it could have a bearing on performance, right ?
there are two Unsafe classes in JDK ... for example jdk21:
src/jdk.unsupported/share/classes/sun/misc/Unsafe.java
src/java.base/share/classes/jdk/internal/misc/Unsafe.java
APIs in the former class basically use the latter's for implementation. Something like below:
public final boolean compareAndSwapObject(Object o, long offset,
Object expected,
Object x) {
return theInternalUnsafe.compareAndSetReference(o, offset, expected, x);
}
if programmers are expected (if they ever do) to use the former only, it might not be necessary to go behind it and deal with the new names.
Only these ones exist in sun/misc/Unsafe:
compareAndSwapObject
compareAndSwapInt
compareAndSwapLong
These ones require going through jdk/internal/misc/Unsafe.java:
CompareAndExchangeObject
CompareAndExchangeReference
CompareAndExchangeInt
CompareAndExchangeLong
we need to recognize the internal newer ones as well? @tajila @vijaysun-omr what is your take?
Most new applications and the JDK itself have switched to using jdk.internal.misc.Unsafe. sun.misc.Unsafe is still available but will likely go at some point.
One thing to remember is that in JDK8 there is only sun.misc.Unsafe.
Most new applications and the JDK itself have switched to using jdk.internal.misc.Unsafe.
Thanks for the clarification.
I added a few more fixes mostly related to problems on x that showed up after more rigorous testing.
I missed adding jdk_internal_misc_Unsafe_compareAndExchangeObject to one location on x which stopped the method from being converted into inlined assembly so I added that in.
Setting TR_UseOldCompareAndSwapObject would cause compareAndExchange[Object|Reference] to break on x since the older path was not updated to work with these methods. I updated them now.
Setting TR_UseOldCompareAndSwapObject and TR_DisableCASInlining would cause problems, also on x, even without my changes. This is related to hiding object compression from the codegen. I added a fix here so the compression is not hidden if TR_DisableCASInlining is set.
I found another location I missed adding disableCAEIntrinsic to so I added that in.
I got one of the native checks backwards inside alwaysWorthInlining. This check was for natives that already failed the inlinable JNI check so they aren't supposed to return true from alwaysWorthInlining.
I discovered a problem on x where the accelerated version of CompareAndExchangeObject would making an unexpectedly high number of call outs to jitWriteBarrierStoreGenerationalAndConcurrentMark which hurt performance by around 60% compared to CompareAndSetObject. The two cases were expected to have similar performance.
I eventually tracked down the problem to this line: https://github.com/eclipse-openj9/openj9/blob/91b5703ea59e75ae6c72e4c65123ba13bc06488a/runtime/compiler/x/codegen/J9TreeEvaluator.cpp#L10529
compareAndSetObject is an icall while compareAndExchangeObject is an acall. This means unsafeCallBarrier is true for compareAndSetObject and false for compareAndExchangeObject. This lead to different code being generated and different performance.
VMwrtbarWithoutStoreEvaluator is used for several different opcodes which include arraycopy, ArrayStoreCHK, write barriers and Unsafe CAS calls (that CAS an object).
To differentiate Unsafe CAS calls from the other cases, there is a check for the node being an icall since the other cases are not icalls. This check is no longer valid with the introduction of support for compareAndExchangeReference. The node in the CAE case is an acall.
The commit I just added changes the check to a generic call check. There is also a check for the call being to a recognized method. This check now checks for specific recognized methods which are:
sun_misc_Unsafe_compareAndSwapObject_jlObjectJjlObjectjlObject_Z //covers compareAndSwapObject, compareAndSetObject, compareAndSetReference
jdk_internal_misc_Unsafe_compareAndExchangeObject
jdk_internal_misc_Unsafe_compareAndExchangeReference
With this new change, the performance has returned to the expected area.
With the latest change, all the tests that I wrote to try out different edges cases are working as expected. @hzongaro, I know it has only been a few days since my last x related change but have you had a chance to continue your code review?
have you had a chance to continue your code review?
I hadn't had a chance to come back to it. I'll take another look at it now.
I have some more tests that look like they are failing now. So I'll need to figure out the problem before this can be merged.
I found the problem.
https://github.com/eclipse/omr/blob/8c233cdf9151d5b097733a426537d4fd6c37051f/compiler/optimizer/VPHandlers.cpp#L5410
isUnsafeCAS() does not pass in a parameter.
https://github.com/eclipse/omr/blob/8c233cdf9151d5b097733a426537d4fd6c37051f/compiler/compile/OMRMethod.hpp#L153
So NULL is passed in as the TR::Compilation *.
https://github.com/IBMJimmyk/openj9/blob/8f7d8c2ce7914c5ae4a80413002fa8221077f7cb/runtime/compiler/env/j9method.cpp#L5545-L5548
This is the temporary platform check that is here until support is added for every platform. It assumed TR::Compilation * c is not NULL which is incorrect.
I will double check these platform checks do not cause any other problems.
I retested after I added the fix to isUnsafeCAS so it doesn't use a NULL comp object and the tests pass now.
@hzongaro Can you look over the latest changes? If they are good I think this is finally ready to be merged in.
May I ask you to remove "WIP:" from the titles of this pull request and eclipse/omr#7438?
Sorry for the late comment. May I ask you to reformat your commit comment so that lines do not exceed 72 characters in length, per https://github.com/eclipse-openj9/openj9/blob/master/CONTRIBUTING.md#commit-guidelines?
There is also a typo in the commit comment: "differtiate" -> "differentiate"
Jenkins test sanity,sanity.openjdk all jdk8,jdk11,jdk17,jdk21
Sorry - the testing I initiated will fail because I forgot to include the depends eclipse/omr#7438 clause.
I fixed the commit message.
Jenkins test sanity,sanity.openjdk all jdk8,jdk11,jdk17,jdk21 depends eclipse/omr#7438
The Windows failure looks to be this:
[WS-CLEANUP] Deleting project workspace...
16:39:01 [WS-CLEANUP] Deferred wipeout is disabled by the job configuration...
16:39:19 ERROR: Cannot delete workspace :Unable to delete 'F:\Users\jenkins\workspace\Build_JDK21_x86-64_windows_Personal\openssl\NUL'. Tried 3 times (of a maximum of 3) waiting 0.1 sec between attempts.
I don't think that is related to my change.
- JDK 21 Windows build failure appears to be infrastructure-related
- JDK 11 x86-64 macOS sanity.functional failure appears to be infrastructure-related
- JDK 17 AIX sanity.openjdk failure appears to be infrastructure-related
- JDK 21 x86-64 macOS sanity.functional failure appears to be infrastructure-related
- JDK 21 s390x Linux sanity.openjdk test failure appears to be due to known issue #18034
- JDK 21 x86-64 macOS sanity.openjdk test failure appears to be due to known issue #18910
-
JDK 8 x86-32 Windows sanity.functional test failure for
SCURLClassLoaderTests_SE80_0andSCURLClassLoaderNPTests_SE80_0. I didn't find a known problem for this failure. - JDK 8 x86-64 Linux sanity.functional test failures appears to be due to known issue #18599
-
JDK 8 x86-64 Windows sanity.functional test failure. This also saw a failure for
SCURLClassLoaderTests_SE80_1, as well as forJ9vmTest_5,testSCCMLTests1_openj9_0,cmdLineTester_ShareClassesSimpleSanity_0. I wasn't able to find known problems for these failures in a quick search.
I will rerun testing for the infrastructure-related failures. @IBMJimmyk, may I ask you to take a closer look at the JDK 8 sanity.functional failures for Windows to verify that they are unrelated to your changes?
Jenkins test sanity,sanity.openjdk windows jdk21 depends eclipse/omr#7438
I'll take a look at the sanity.functional failures for Windows.
Jenkins test sanity,sanity.openjdk win jdk21 depends eclipse/omr#7438
Jenkins test sanity.functional xmac jdk11,jdk21 depends eclipse/omr#7438
Jenkins test sanity.openjdk aix jdk17 depends eclipse/omr#7438
Jenkins test sanity.openjdk aix jdk17 depends eclipse/omr#7438
All of the JDK8 Windows failures fall into the same pattern. The test workload itself is successful and prints the expected result for the test and shortly afterwards there is a segmentation fault in j9shr29.dll.
Here is an example from SCURLClassLoaderTests_SE80_0
[2024-09-24T02:17:44.442Z] Testing: FindStore.FindExplicit
[2024-09-24T02:17:44.442Z] Test start time: 2024/09/23 22:17:43 Eastern Standard Time
[2024-09-24T02:17:44.442Z] Running command: C:/Users/jenkins/workspace/Test_openjdk8_j9_sanity.functional_x86-32_windows_Personal_testList_0/jdkbinary/j2sdk-image\jre\bin\java.exe -Xjit -Xgcpolicy:gencon -Xnocompressedrefs -Xshareclasses:persistent,controlDir=.,name=FindStore,noTimestampChecks -cp .\FindStore\Nothing.jar;.\FindStore\A.jar;.\FindStore\B.jar;.\FindStore\C.jar;.\FindStore\D.jar jnurlcldr.shared.findstore.A_Main
[2024-09-24T02:17:44.442Z] Time spent starting: 0 milliseconds
[2024-09-24T02:17:44.442Z] Time spent executing: 328 milliseconds
[2024-09-24T02:17:44.442Z] Test result: FAILED
[2024-09-24T02:17:44.442Z] [OUT] CACHED A
[2024-09-24T02:17:44.442Z] [OUT] CACHED B
[2024-09-24T02:17:44.442Z] [OUT] CACHED C
[2024-09-24T02:17:44.442Z] [OUT] CACHED D
[2024-09-24T02:17:44.442Z] [OUT] Result=3
[2024-09-24T02:17:44.442Z] [ERR] Unhandled exception
[2024-09-24T02:17:44.442Z] [ERR] Type=Segmentation error vmState=0x00000000
[2024-09-24T02:17:44.442Z] [ERR] J9Generic_Signal_Number=00000004 ExceptionCode=c0000005 ExceptionAddress=706141DA ContextFlags=0001007f
[2024-09-24T02:17:44.442Z] [ERR] Handler1=71625EA0 Handler2=715454D0 InaccessibleWriteAddress=22A56118
[2024-09-24T02:17:44.442Z] [ERR] EDI=013C2FC8 ESI=227E56D8 EAX=22A55FE8 EBX=2298F2B8
[2024-09-24T02:17:44.442Z] [ERR] ECX=FFFFFFFF EDX=010D2000
[2024-09-24T02:17:44.442Z] [ERR] EIP=706141DA ESP=01B4F41C EBP=22FE5538 EFLAGS=00210246
[2024-09-24T02:17:44.442Z] [ERR] GS=002B FS=0053 ES=002B DS=002B
[2024-09-24T02:17:44.442Z] [ERR] Module=C:\Users\jenkins\workspace\Test_openjdk8_j9_sanity.functional_x86-32_windows_Personal_testList_0\jdkbinary\j2sdk-image\jre\bin\default\j9shr29.dll
[2024-09-24T02:17:44.442Z] [ERR] Module_base_address=70610000 Offset_in_DLL=000041da
[2024-09-24T02:17:44.442Z] [ERR] Target=2_90_20240923_809 (Windows Server 2019 10.0 build 17763)
[2024-09-24T02:17:44.442Z] [ERR] CPU=x86 (4 logical CPUs) (0x3fff77000 RAM)
[2024-09-24T02:17:44.442Z] [ERR] ----------- Stack Backtrace -----------
[2024-09-24T02:17:44.442Z] [ERR] J9VMDllMain+0x31da (0x706141DA [j9shr29+0x41da])
[2024-09-24T02:17:44.442Z] [ERR] J9VMDllMain+0x9aa (0x706119AA [j9shr29+0x19aa])
[2024-09-24T02:17:44.442Z] [ERR] ---------------------------------------
[2024-09-24T02:17:44.442Z] [ERR] JVMDUMP039I Processing dump event "gpf", detail "" at 2024/09/23 22:17:43 - please wait.
[2024-09-24T02:17:44.442Z] [ERR] JVMDUMP013I Processed dump event "gpf", detail "".
[2024-09-24T02:17:44.442Z] >> Success condition was found: [Output match: Result=3]
[2024-09-24T02:17:44.442Z] >> Failure condition was not found: [Output match: LOCAL]
[2024-09-24T02:17:44.442Z] >> Failure condition was found: [Output match: Unhandled Exception]
[2024-09-24T02:17:44.442Z] >> Failure condition was not found: [Output match: Exception:]
[2024-09-24T02:17:44.442Z] >> Failure condition was not found: [Output match: Error:]
The test successfully prints out Result=3 which is the expected result of a successful test. Shortly afterwards, there is a segfault in j9shr29.dll.
I have tried to reproduce the problem but have not been successful. When the tests were run as a part of this PR, this error occurred about 12 times total. I tried rerunning the tests 5 times each with a build with and without my changes and they all passed.
I stumbled upon this similar looking problem: https://github.com/eclipse-openj9/openj9/issues/20227
In that issue, there is also a test with a successful result followed by a crash in j9shr29.dll. But, the backtrace isn't quite the same. The problem in that issue was fixed and the change merged in after this build was kicked off. So the testing of my changes does not include the fix.
I think by builds go up to here in OpenJ9: https://github.com/eclipse-openj9/openj9/commits/16ab797349b9bda5a5de2a9ff27ef2d932a4716a/ (Sept 23) So it didn't include: https://github.com/eclipse-openj9/openj9/pull/20196/commits (merged sept 24)
I was able to reproduce the JDK8 Windows failures on a build that excluded both my Unsafe CAS work and the j9shr29.dll crash fix. So I can say those failures are unrelated to my change and also already fixed in newer builds.
@hzongaro Are there any other concerns that need to be addressed?