openj9
jdk_lang_0 java/lang/Thread/virtual/MiscMonitorTests.java#Xcomp-noTieredCompilation j9codertvm(j9ji.110 ASSERTION FAILED at runtime/codert_vm/jswalk.c:538: ((0 ))
Failure link
Copied from https://github.com/eclipse-openj9/openj9/issues/21665#issuecomment-2966595535
From an internal build(svlxtor3):
20:28:30 Eclipse OpenJ9 VM 24.0.1+9-202506100001 (build master-89ce909c27, JRE 24 Linux s390x-64-Bit Compressed References 20250610_79 (JIT enabled, AOT enabled)
20:28:30 OpenJ9 - 89ce909c27
20:28:30 OMR - a9d5f8806
20:28:30 JCL - 78e9ff37d based on jdk-24.0.1+9)
Rerun in Grinder - Change TARGET to run only the failed test targets.
Optional info
Failure output (captured from console output)
20:28:30 Running test jdk_lang_0 ...
20:28:30 variation: -Xdump:system:none -Xdump:heap:none -Xdump:system:events=gpf+abort+traceassert+corruptcache -XX:-JITServerTechPreviewMessage Mode150
20:28:30 JVM_OPTIONS: -Xdump:system:none -Xdump:heap:none -Xdump:system:events=gpf+abort+traceassert+corruptcache -XX:-JITServerTechPreviewMessage -XX:+UseCompressedOops -Xverbosegclog
20:35:01 TEST: java/lang/Thread/virtual/MiscMonitorTests.java#Xcomp-noTieredCompilation
20:35:01 00:34:57.011 0x1dca800j9codertvm(j9ji.110 * ** ASSERTION FAILED ** at /home/jenkins/workspace/build-scripts/jobs/jdk24/jdk24-linux-s390x-openj9/workspace/build/src/openj9/runtime/codert_vm/jswalk.c:538: ((0 ))
@r30shah As this one occurs on zLinux, can you assign to someone on your team for investigation? @hzongaro @0xdaryl FYI
@mpirvu FYI
@ehsankianifar Can you take a look at the failure?
Thanks @r30shah, taking a look.
During the GC root scan, when it starts to walk the JIT stack, it gets a null stack map, which triggers this assertion. I will provide more updates as I check the code path.
It seems we get a garbage stack frame when walking the JIT stack, which might be a similar issue to https://github.com/eclipse-openj9/openj9/issues/21829. I am getting a lot of segfaults when running this test and was not able to reproduce this exact error; however, they might all be related. I also tried to pinpoint which JIT-compiled method causes the issue by using a limit file. The segmentation faults begin once recursive2 is compiled. I will update my JDK to the latest build to make sure I am not missing the latest fixes (e.g. this) and continue testing.
When running the whole jdk_lang_0 target, or even just MiscMonitorTests.java#Xcomp-noTieredCompilation, it often fails with a segfault before reaching the testContentionMultipleMonitors2 test. I was not able to get the exact failure described in this issue; however, with an updated driver and running only the testContentionMultipleMonitors2 test, I consistently get this error:
STARTED MiscMonitorTests::testContentionMultipleMonitors2 'testContentionMultipleMonitors2()'
Exception in thread "VThread-35" Exception in thread "VThread-15" java.lang.NullPointerException: Cannot enter synchronized block because "this.lockArray[<local2>]" is nulljava.lang.NullPointerException: Cannot enter synchronized block because "this.lockArray[<local2>]" is null
at MiscMonitorTests$TestContentionMultipleMonitors2.foo(MiscMonitorTests.java:336)
at MiscMonitorTests$TestContentionMultipleMonitors2.lambda$runTest$0(MiscMonitorTests.java:319)
at MiscMonitorTests$TestContentionMultipleMonitors2.foo(MiscMonitorTests.java:336)
at MiscMonitorTests$TestContentionMultipleMonitors2.lambda$runTest$0(MiscMonitorTests.java:319)
at java.base/java.lang.VirtualThread.run(VirtualThread.java:472)
at java.base/java.lang.VirtualThread.run(VirtualThread.java:472)
Exception in thread "VThread-34" java.lang.NullPointerException: Cannot enter synchronized block because "this.lockArray[<local2>]" is null
at MiscMonitorTests$TestContentionMultipleMonitors2.foo(MiscMonitorTests.java:336)
at MiscMonitorTests$TestContentionMultipleMonitors2.lambda$runTest$0(MiscMonitorTests.java:319)
at java.base/java.lang.VirtualThread.run(VirtualThread.java:472)
In a gdb run I got this error:
Thread 31 "pool-1-thread-3" received signal SIGSEGV, Segmentation fault.
[Switching to Thread 0x3ffd627f840 (LWP 551290)]
0x000003fffd3cc724 in detachMonitorInfo (currentThread=0x4c7500, lockObject=0xffe00030, alreadyDetached=0x3ffd627d62c) at /home/jenkins/workspace/Build_JDK24_s390x_linux_Personal/openj9/runtime/vm/ContinuationHelpers.cpp:766
warning: Source file is more recent than executable.
766 if (!LN_HAS_LOCKWORD(currentThread, lockObject)) {
(gdb) bt
#0 0x000003fffd3cc724 in detachMonitorInfo (currentThread=0x4c7500, lockObject=0xffe00030, alreadyDetached=0x3ffd627d62c)
at /home/jenkins/workspace/Build_JDK24_s390x_linux_Personal/openj9/runtime/vm/ContinuationHelpers.cpp:766
#1 0x000003fff7c5df46 in walkLiveMonitorSlotsForYield (walkState=walkState@entry=0x3ffd627da60, gcStackAtlas=gcStackAtlas@entry=0x3ffd64061a0,
liveMonitorMap=liveMonitorMap@entry=0x3ffd640626a "\030", monitorMask=0x3ffd64061c0 "\030", numberOfMapBits=numberOfMapBits@entry=16)
at /home/jenkins/workspace/Build_JDK24_s390x_linux_Personal/openj9/runtime/codert_vm/jswalk.c:1885
#2 0x000003fff7c5fb7c in jitGetOwnedObjectMonitors (walkState=0x3ffd627da60)
at /home/jenkins/workspace/Build_JDK24_s390x_linux_Personal/openj9/runtime/codert_vm/jswalk.c:1745
#3 0x000003fffd3962da in walkFrame (walkState=0x3ffd627da60) at /home/jenkins/workspace/Build_JDK24_s390x_linux_Personal/openj9/runtime/vm/swalk.c:555
#4 0x000003fff7c5e2f8 in jitWalkStackFrames (walkState=0x3ffd627da60)
The lockObject=0xffe00030 is not a valid object: p/x (j9object_t)*lockObject => 0x6970746f72244361.
The segfault occurs when it tries to read the lock word from that invalid object.
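One quick sanity check on a suspicious pointer target is to read the word as text. A minimal sketch, assuming the value gdb printed reflects memory byte order (s390x is big-endian); the variable names are only for illustration:

```python
# Decode the 64-bit value that gdb printed for *lockObject as ASCII text.
value = 0x6970746F72244361

# s390x is big-endian, so the most significant byte sits at the lowest address.
raw = value.to_bytes(8, byteorder="big")
text = raw.decode("ascii")

print(text)  # prints: iptor$Ca
```

Every byte is printable ASCII, which suggests the slot points into character data (possibly part of a name string) rather than at an object header. That would be consistent with the stack walker reading a garbage slot.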
The address of this object slot is calculated from J9StackWalkState->pb + gcStackAtlas->parmBaseOffset.
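To make the base-plus-offset calculation concrete, here is a simplified model of walking a live-monitor bitmap. It is a sketch only, not the OpenJ9 implementation: the function name, the 8-byte slot size, the LSB-first bit order, and the base and offset values are all assumptions. Only the map byte 0x18 ("\030") and the 16 map bits come from the backtrace; gdb prints the maps as C strings, so the byte after 0x18 is zero.

```python
SLOT_SIZE = 8  # assumption: 8-byte stack slots on a 64-bit platform


def live_monitor_slots(base, parm_base_offset, live_map, monitor_mask, number_of_map_bits):
    """Return hypothetical slot addresses for bits set in both maps."""
    addresses = []
    for bit in range(number_of_map_bits):
        byte_index, bit_index = divmod(bit, 8)
        live = (live_map[byte_index] >> bit_index) & 1        # assumption: LSB-first
        is_monitor = (monitor_mask[byte_index] >> bit_index) & 1
        if live and is_monitor:
            addresses.append(base + parm_base_offset + bit * SLOT_SIZE)
    return addresses


# base and parm_base_offset are made-up illustration values.
slots = live_monitor_slots(
    base=0x1000,
    parm_base_offset=0x20,
    live_map=bytes([0x18, 0x00]),
    monitor_mask=bytes([0x18, 0x00]),
    number_of_map_bits=16,
)
print([hex(a) for a in slots])  # prints: ['0x1038', '0x1040']
```

The point of the model is that the slot addresses depend entirely on the base value in the walk state. If that base is wrong (for example, because registers were not restored), every slot read through the map yields garbage even though the map itself is intact.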
@ehsankianifar If you get a chance, can you try the test with the changes from https://github.com/r30shah/openj9/commits/proto21717? This is a prototype fix for an issue where the GPRs were not restored upon returning from the interpreter after a monitor entry. It may resolve the issue you are seeing. Just a check; I am not sure whether it is the same problem.
I am in the middle of testing the changes and will open a PR on Monday.
@r30shah The test is passing with your changes. Thanks for sharing it.
Thanks @ehsankianifar for the confirmation. I have started testing the changes and will open a PR soon.
#22128 has now been merged
Has this been resolved now that #22128 has been merged?
I ran the tests a few times and can confirm that the Z-specific issue is resolved. There are other failures that seem to be cross-platform.
Any news on this one?
I think the other failures that @ehsankianifar mentioned in https://github.com/eclipse-openj9/openj9/issues/22089#issuecomment-3049022391 are related to [1]. The problem reported in this issue is resolved, so we can close it now.
[1]. https://github.com/eclipse-openj9/openj9/issues/22079#issuecomment-3084136429
@JasonFengJ9 Can we close this one?
Closing as per https://github.com/eclipse-openj9/openj9/issues/22089#issuecomment-3084463783