
jdk_lang_0 java/lang/Thread/virtual/MiscMonitorTests.java#Xcomp-noTieredCompilation j9codertvm(j9ji.110 ASSERTION FAILED at runtime/codert_vm/jswalk.c:538: ((0 ))

JasonFengJ9 opened this issue.

Failure link

Copied from https://github.com/eclipse-openj9/openj9/issues/21665#issuecomment-2966595535

From an internal build (svlxtor3):

20:28:30  Eclipse OpenJ9 VM 24.0.1+9-202506100001 (build master-89ce909c27, JRE 24 Linux s390x-64-Bit Compressed References 20250610_79 (JIT enabled, AOT enabled)
20:28:30  OpenJ9   - 89ce909c27
20:28:30  OMR      - a9d5f8806
20:28:30  JCL      - 78e9ff37d based on jdk-24.0.1+9)

Rerun in Grinder - Change TARGET to run only the failed test targets.

Optional info

Failure output (captured from console output)

20:28:30  Running test jdk_lang_0 ...

20:28:30  variation: -Xdump:system:none -Xdump:heap:none -Xdump:system:events=gpf+abort+traceassert+corruptcache -XX:-JITServerTechPreviewMessage Mode150
20:28:30  JVM_OPTIONS:  -Xdump:system:none -Xdump:heap:none -Xdump:system:events=gpf+abort+traceassert+corruptcache -XX:-JITServerTechPreviewMessage -XX:+UseCompressedOops -Xverbosegclog 


20:35:01  TEST: java/lang/Thread/virtual/MiscMonitorTests.java#Xcomp-noTieredCompilation

20:35:01  00:34:57.011 0x1dca800j9codertvm(j9ji.110    *   ** ASSERTION FAILED ** at /home/jenkins/workspace/build-scripts/jobs/jdk24/jdk24-linux-s390x-openj9/workspace/build/src/openj9/runtime/codert_vm/jswalk.c:538: ((0 ))

50x internal grinder

JasonFengJ9, Jun 12 '25

@r30shah As this one occurs on zLinux, can you assign it to someone on your team for investigation? @hzongaro @0xdaryl FYI

vij-singh, Jun 16 '25

@mpirvu FYI

vij-singh, Jun 16 '25

@ehsankianifar Can you take a look at the failure?

r30shah, Jun 16 '25

Thanks @r30shah, taking a look.

ehsankianifar, Jun 16 '25

During the GC root scan, when the collector starts walking the JIT stack, it gets a null stack map, which triggers this assertion failure. I will post more updates as I check the code path.

ehsankianifar, Jun 19 '25

It seems we get a garbage stack frame when walking the JIT stack, which might be a similar issue to https://github.com/eclipse-openj9/openj9/issues/21829. I am getting a lot of segfaults when running this test and have not been able to reproduce this exact error, but they might all be related. I also tried to pinpoint which JIT-compiled method causes the problem by using a limit file: the segmentation faults begin once recursive2 is compiled. I am updating my JDK to the latest build to make sure I haven't missed the latest fixes (e.g. this) and will continue testing.

ehsankianifar, Jun 21 '25

When running the whole jdk_lang_0 target, or even just MiscMonitorTests.java#Xcomp-noTieredCompilation, it often fails with a segfault before reaching the testContentionMultipleMonitors2 test. I was not able to get the exact failure described in this issue; however, with an updated driver and running only the testContentionMultipleMonitors2 test, I consistently get this error:

STARTED    MiscMonitorTests::testContentionMultipleMonitors2 'testContentionMultipleMonitors2()'
Exception in thread "VThread-35" Exception in thread "VThread-15" java.lang.NullPointerException: Cannot enter synchronized block because "this.lockArray[<local2>]" is nulljava.lang.NullPointerException: Cannot enter synchronized block because "this.lockArray[<local2>]" is null

	at MiscMonitorTests$TestContentionMultipleMonitors2.foo(MiscMonitorTests.java:336)
	at MiscMonitorTests$TestContentionMultipleMonitors2.lambda$runTest$0(MiscMonitorTests.java:319)
	at MiscMonitorTests$TestContentionMultipleMonitors2.foo(MiscMonitorTests.java:336)
	at MiscMonitorTests$TestContentionMultipleMonitors2.lambda$runTest$0(MiscMonitorTests.java:319)
	at java.base/java.lang.VirtualThread.run(VirtualThread.java:472)
	at java.base/java.lang.VirtualThread.run(VirtualThread.java:472)
Exception in thread "VThread-34" java.lang.NullPointerException: Cannot enter synchronized block because "this.lockArray[<local2>]" is null
	at MiscMonitorTests$TestContentionMultipleMonitors2.foo(MiscMonitorTests.java:336)
	at MiscMonitorTests$TestContentionMultipleMonitors2.lambda$runTest$0(MiscMonitorTests.java:319)
	at java.base/java.lang.VirtualThread.run(VirtualThread.java:472)

ehsankianifar, Jun 21 '25

In a gdb run I got this error:

Thread 31 "pool-1-thread-3" received signal SIGSEGV, Segmentation fault.
[Switching to Thread 0x3ffd627f840 (LWP 551290)]
0x000003fffd3cc724 in detachMonitorInfo (currentThread=0x4c7500, lockObject=0xffe00030, alreadyDetached=0x3ffd627d62c) at /home/jenkins/workspace/Build_JDK24_s390x_linux_Personal/openj9/runtime/vm/ContinuationHelpers.cpp:766
warning: Source file is more recent than executable.
766		if (!LN_HAS_LOCKWORD(currentThread, lockObject)) {
(gdb) bt
#0  0x000003fffd3cc724 in detachMonitorInfo (currentThread=0x4c7500, lockObject=0xffe00030, alreadyDetached=0x3ffd627d62c)
    at /home/jenkins/workspace/Build_JDK24_s390x_linux_Personal/openj9/runtime/vm/ContinuationHelpers.cpp:766
#1  0x000003fff7c5df46 in walkLiveMonitorSlotsForYield (walkState=walkState@entry=0x3ffd627da60, gcStackAtlas=gcStackAtlas@entry=0x3ffd64061a0,
    liveMonitorMap=liveMonitorMap@entry=0x3ffd640626a "\030", monitorMask=0x3ffd64061c0 "\030", numberOfMapBits=numberOfMapBits@entry=16)
    at /home/jenkins/workspace/Build_JDK24_s390x_linux_Personal/openj9/runtime/codert_vm/jswalk.c:1885
#2  0x000003fff7c5fb7c in jitGetOwnedObjectMonitors (walkState=0x3ffd627da60)
    at /home/jenkins/workspace/Build_JDK24_s390x_linux_Personal/openj9/runtime/codert_vm/jswalk.c:1745
#3  0x000003fffd3962da in walkFrame (walkState=0x3ffd627da60) at /home/jenkins/workspace/Build_JDK24_s390x_linux_Personal/openj9/runtime/vm/swalk.c:555
#4  0x000003fff7c5e2f8 in jitWalkStackFrames (walkState=0x3ffd627da60)

The lockObject=0xffe00030 is not a valid object: p/x (j9object_t)*lockObject gives 0x6970746f72244361. The segfault happens when the VM tries to read the lock word from that bogus object. This object pointer is calculated from J9StackWalkState->pb + gcStackAtlas->parmBaseOffset.

ehsankianifar, Jun 21 '25

@ehsankianifar If you get a chance, can you try the test with the changes from https://github.com/r30shah/openj9/commits/proto21717? This is a prototype fix for an issue where, upon returning from the interpreter after a monitor entry, the GPRs were not being restored. Maybe the issue you are seeing is resolved by this? Just a check; I am not sure whether it is the same issue.

I am in the middle of testing the changes and will open a PR on Monday.

r30shah, Jun 22 '25

@r30shah The test is passing with your changes. Thanks for sharing it.

ehsankianifar, Jun 23 '25

Thanks @ehsankianifar for the confirmation. I have started testing the changes and will open a PR soon.

r30shah, Jun 23 '25

#22128 has now been merged

vij-singh, Jul 03 '25

Has this been resolved now that #22128 has been merged?

vij-singh, Jul 08 '25

I ran the tests a few times and can confirm that the Z-specific issue is resolved. There are other failures that seem to be cross-platform.

ehsankianifar, Jul 08 '25

Any new news on this one?

vij-singh, Jul 17 '25

I think the other failures that @ehsankianifar mentioned in https://github.com/eclipse-openj9/openj9/issues/22089#issuecomment-3049022391 are related to [1]. The problem reported in this issue is resolved, so we can close this now.

[1]. https://github.com/eclipse-openj9/openj9/issues/22079#issuecomment-3084136429

r30shah, Jul 17 '25

@JasonFengJ9 Can we close this one?

vij-singh, Jul 17 '25

Closing as per https://github.com/eclipse-openj9/openj9/issues/22089#issuecomment-3084463783

JasonFengJ9, Jul 17 '25