openj9
openj9 copied to clipboard
Timedout at Continuation.yield() - java/lang/Thread/virtual/ThreadAPI.java
Copied from https://github.com/eclipse-openj9/openj9/issues/21683#issuecomment-2916412694
https://openj9-jenkins.osuosl.org/job/Test_openjdk24_j9_sanity.openjdk_s390x_linux_Nightly_testList_2/86/consoleFull
00:50:20 Running test jdk_lang_j9_1 ...
00:50:21 variation: -Xdump:system:none -Xdump:heap:none -Xdump:system:events=gpf+abort+traceassert+corruptcache -XX:-JITServerTechPreviewMessage Mode501 -XXgc:fvtest_forceCopyForwardHybridMarkCompactRatio=10
00:50:21 JVM_OPTIONS: -Xdump:system:none -Xdump:heap:none -Xdump:system:events=gpf+abort+traceassert+corruptcache -XX:-JITServerTechPreviewMessage -Xjit -Xgcpolicy:balanced -Xnocompressedrefs -XXgc:fvtest_forceCopyForwardHybridMarkCompactRatio=10 -Xverbosegclog
01:38:42 TEST: java/lang/Thread/virtual/Parking.java#Xint
01:38:42 TEST RESULT: Error. Program `/home/jenkins/workspace/Test_openjdk24_j9_sanity.openjdk_s390x_linux_Nightly_testList_2/jdkbinary/j2sdk-image/bin/java' timed out (timeout set to 960000ms, elapsed time including timeout handling was 986157ms).
01:38:42 --------------------------------------------------
01:38:42 Test results: passed: 925; error: 1
01:38:49 jdk_lang_j9_1_FAILED
50x internal grinder jdk_lang_j9 s390x_linux - still in progress, the timeout wasn't reproduced yet
https://openj9-jenkins.osuosl.org/job/Test_openjdk24_j9_sanity.openjdk_ppc64_aix_Nightly_testList_0/92/consoleFull
03:25:09 Running test jdk_lang_0 ...
03:25:09 variation: -Xdump:system:none -Xdump:heap:none -Xdump:system:events=gpf+abort+traceassert+corruptcache -XX:-JITServerTechPreviewMessage Mode150
03:25:09 JVM_OPTIONS: -Xdump:system:none -Xdump:heap:none -Xdump:system:events=gpf+abort+traceassert+corruptcache -XX:-JITServerTechPreviewMessage -XX:+UseCompressedOops -Xverbosegclog
04:34:03 TEST: java/lang/Thread/virtual/ThreadAPI.java#default
04:34:03 TEST RESULT: Error. Program `/home/jenkins/workspace/Test_openjdk24_j9_sanity.openjdk_ppc64_aix_Nightly_testList_0/jdkbinary/j2sdk-image/bin/java' timed out (timeout set to 960000ms, elapsed time including timeout handling was 1023791ms).
04:34:03 --------------------------------------------------
04:46:56 Test results: passed: 919; error: 1
04:47:00 jdk_lang_0_FAILED
50x internal grinder jdk_lang ppc64_aix - still in progress, multiple timeouts have been reproduced
Copied from https://github.com/eclipse-openj9/openj9/issues/21683#issuecomment-2959494567
I don't think the Parking test is related, the test is stuck yielding, not waiting on a monitor.
!continuationstack 0x000003fe80005b60 <3fe80005b60> !j9method 0x000003FF98322220 jdk/internal/vm/Continuation.yieldImpl(Z)Z <3fe80005b60> !j9method 0x000003FF983220C0 jdk/internal/vm/Continuation.yield0()Z <3fe80005b60> !j9method 0x000003FF983220A0 jdk/internal/vm/Continuation.yield(Ljdk/internal/vm/ContinuationScope;)Z <3fe80005b60> !j9method 0x000003FF98311B90 java/lang/VirtualThread.yieldContinuation()Z <3fe80005b60> !j9method 0x000003FF98311C70 java/lang/VirtualThread.park()V <3fe80005b60> !j9method 0x000003FF98337E68 java/lang/Access.parkVirtualThread()V <3fe80005b60> !j9method 0x000003FF9830B7C8 java/util/concurrent/locks/LockSupport.park()V <3fe80005b60> !j9method 0x000003FF985DA668 Parking.lambda$testPark7$0()V <3fe80005b60> !j9method 0x000003FE84470138 Parking$$Lambda/0x000000008443ebb8.run()V <3fe80005b60> !j9method 0x000003FF982F38F0 java/lang/Thread.runWith(Ljava/lang/Object;Ljava/lang/Runnable;)V <3fe80005b60> !j9method 0x000003FF98311B30 java/lang/VirtualThread.run(Ljava/lang/Runnable;)V <3fe80005b60> !j9method 0x000003FE8441A050 java/lang/VirtualThread$VThreadContinuation$1.run()V <3fe80005b60> !j9method 0x000003FF98322060 jdk/internal/vm/Continuation.enter(Ljdk/internal/vm/Continuation;)V <3fe80005b60> JNI call-in frame <3fe80005b60> Native method frameThe ThreadAPI Timeout looks quite similar and may be related. The only difference I see is that the last waiting vthread is in a different state than CancelTimerWithContention:
!vthreads !continuationstack 0x0000010023bf4330 !j9vmcontinuation 0x0000010023bf4330 !j9object 0x00000000FFEE7960 (Continuation) !j9object 0x00000000FFEE7898 (VThread) - !j9object 0x00000000FFEE7898 ... I state = 0x0000000D (offset = 112) (java/lang/VirtualThread) Z parkPermit = 0x00000001 (offset = 160) (java/lang/VirtualThread) Z blockPermit = 0x00000000 (offset = 164) (java/lang/VirtualThread)
java/lang/Thread/virtual/Parking.java#Xint timeout
The failing job was too old and had already been deleted.
I ran a 200x grinder for Parking.java#Xint on zLinux: https://openj9-jenkins.osuosl.org/job/Grinder/4395
The timeout in Parking.java#Xint was not reproducible with the latest nightly build.
The timeout in Parking.java#Xint was not reproducible with the latest nightly build.
java/lang/Thread/virtual/Parking.java#Xint timeout wasn't seen in recent builds. It wasn't excluded.
Updated the issue title.
java/lang/Thread/virtual/ThreadAPI.java#default timeout
There are two issues.
=======
Issue 1
We likely missed a code path in VirtualThread.afterYield() due to a race between the thread transitioning to a WAIT state and lock.notify() being called:
if (notified && compareAndSetState(newState, BLOCKED)) {
if (blockPermit && compareAndSetState(BLOCKED, UNBLOCKED)) {
submitRunContinuation();
}
return;
}
The issue happens when notifyVirtualThread() is called while the virtual thread is still in TIMED_WAITING. In this case:
- The thread is removed from
waitingContinuationsand added toblockedContinuations, - But its state is never set to
BLOCKED.
Later calls to notifyVirtualThread() won’t find the thread in waitingContinuations, so:
- The thread’s state remains unchanged,
omrthread_monitor_notify(vm->blockedVirtualThreadsMutex)isn’t called.
Solution for Issue 1
In takeVirtualThreadListToUnblock(), when processing WAIT or TIMED_WAIT threads that were notified, we should set their state to BLOCKED and add them to the unblockedList.
In the test case,
synchronized (lock) {
lock.notify(); // <-- Will do nothing since the vthread has been removed from the waitingContinuations list
} <-- But, objectMonitorExit will notify blockedVirtualThreadsMutex since virtualThreadWaitCount > 0
=======
Issue 2
There’s a potential race condition with VirtualThread.afterYield():
afterYield()may update the state through the sequence:TIMED_WAITING → TIMED_WAIT → BLOCKED → UNBLOCKED- If
notifyVirtualThread()reads a stale state (TIMED_WAIT) whileafterYield()has already moved the thread toUNBLOCKED, - Then
notifyVirtualThread()may incorrectly overwrite the state:UNBLOCKED → BLOCKED
The virtual thread would never resume execution, and this would lead to a hang.
Solution for Issue 2
We should be using a compare-and-swap (CAS) when updating the virtual thread’s state to BLOCKED in notifyVirtualThread() to avoid races:
U_32 state = J9VMJAVALANGVIRTUALTHREAD_STATE(vmThread, vthread);
if ((JAVA_LANG_VIRTUALTHREAD_WAIT == state) || (JAVA_LANG_VIRTUALTHREAD_TIMED_WAIT == state)) {
// Use CAS to update state to BLOCKED
J9VMJAVALANGVIRTUALTHREAD_SET_STATE(vmThread, vthread, JAVA_LANG_VIRTUALTHREAD_BLOCKED);
// CAS ensures state is only changed if still in WAIT or TIMED_WAIT
}
=======
@theresa-m Since you're trying the solution for issue 1 to resolve #21683, could you also try the solution for issue 2?
I tested the proposed solution for the second issue and the timeout still occurs. @fengxue-IS can you double check if this looks like it would solve the problem? https://github.com/theresa-m/openj9/tree/threadapi
The code change looks correct to me, do you have the log output to see what is still causing the hang?
It looks very similar to the failure seen in https://github.com/eclipse-openj9/openj9/issues/21683 except that the state is in BLOCKED. This core failed in testGetState6, the failure is sometimes seen in testGetState7 as well. The running virtual thread is in Object.wait and the MainThread is in Unsafe.park from Thread.join. The virtual thread is in vm->blockedContinuations.
> !vthreads
!continuationstack 0x000001002505b2d0 !j9vmcontinuation 0x000001002505b2d0 !j9object 0x00000000FFE52B60 (Continuation) !j9object 0x00000000FFE52A98 (VThread) -
!continuationstack 0x0000000000000000 !j9vmcontinuation 0x0000000000000000 !j9object 0x00000000FFE37038 (Continuation) !j9object 0x0000000000000000 (VThread) - <N/A>
!continuationstack 0x0000000000000000 !j9vmcontinuation 0x0000000000000000 !j9object 0x00000000FFE13770 (Continuation) !j9object 0x0000000000000000 (VThread) - <N/A>
!continuationstack 0x0000000000000000 !j9vmcontinuation 0x0000000000000000 !j9object 0x00000000FFDEAF70 (Continuation) !j9object 0x0000000000000000 (VThread) - <N/A>
!continuationstack 0x0000000000000000 !j9vmcontinuation 0x0000000000000000 !j9object 0x00000000FFDF44B0 (Continuation) !j9object 0x00000000FFDF43D0 (VThread) -
> !j9object 0x00000000FFE52A98 (VThread)
!J9Object 0x00000000FFE52A98 {
...
I state = 0x0000000D (offset = 112) (java/lang/VirtualThread) <-- state BLOCKED
Z parkPermit = 0x00000001 (offset = 160) (java/lang/VirtualThread)
Z blockPermit = 0x00000000 (offset = 164) (java/lang/VirtualThread)
Z onWaitingList = 0x00000000 (offset = 168) (java/lang/VirtualThread)
Ljava/lang/VirtualThread; next = !fj9object 0x0 (offset = 144) (java/lang/VirtualThread)
Z notified = 0x00000001 (offset = 172) (java/lang/VirtualThread) <-- notifyVirtualThread was invoked
B timedWaitSeqNo = 0x00000000 (offset = 176) (java/lang/VirtualThread)
J timeout = 0x0000000000000000 (offset = 116) (java/lang/VirtualThread)
...
}
> !j9object 0x00000000FFE52B60 (Continuation)
!J9Object 0x00000000FFE52B60 {
struct J9Class* clazz = !j9class 0x304E1E00 // java/lang/VirtualThread$VThreadContinuation
Object flags = 0x00000000;
I lockword = 0x00000008 (offset = 0) (java/lang/Object) <hidden>
J vmRef = 0x000001002505B2D0 (offset = 4) (jdk/internal/vm/Continuation)
Ljava/lang/Thread; vthread = !fj9object 0xffe52a98 (offset = 20) (jdk/internal/vm/Continuation)
Ljava/lang/Object; blocker = !fj9object 0xe02ceb68 (offset = 24) (jdk/internal/vm/Continuation)
[Ljava/lang/Object; scopedValueCache = !fj9object 0x0 (offset = 28) (jdk/internal/vm/Continuation)
Ljdk/internal/vm/ContinuationScope; scope = !fj9object 0xe0237430 (offset = 32) (jdk/internal/vm/Continuation)
Ljava/lang/Runnable; runnable = !fj9object 0xffe52b98 (offset = 36) (jdk/internal/vm/Continuation)
Ljdk/internal/vm/Continuation; parent = !fj9object 0x0 (offset = 40) (jdk/internal/vm/Continuation)
J state = 0x0000000000000001 (offset = 12) (jdk/internal/vm/Continuation)
Z isAccessible = 0x00000001 (offset = 44) (jdk/internal/vm/Continuation)
I continuationLink = 0xFFE37038 (offset = 48) (jdk/internal/vm/Continuation) <hidden>
}
> !fj9object 0xe02ceb68
!J9Object 0x00000000E02CEB68 {
struct J9Class* clazz = !j9class 0x30047700 // java/lang/Object
Object flags = 0x00000008;
I lockword = 0x30019D71 (offset = 0) (java/lang/Object) <hidden>
}
> !j9vmcontinuation 0x000001002505b2d0
J9VMContinuation at 0x1002505b2d0 {
Fields for J9VMContinuation:
0x0: UDATA* arg0EA = !j9x 0x0000000030606960
0x8: UDATA* bytecodes = !j9x 0x0000000000000000
0x10: UDATA* sp = !j9x 0x0000000030606920
0x18: U8* pc = !j9x 0x0000000000000003
0x20: struct J9Method* literals = !j9method 0x0000000000000000
0x28: UDATA* stackOverflowMark = !j9x 0x00000000306022D8
0x30: UDATA* stackOverflowMark2 = !j9x 0x00000000306022D8
0x38: struct J9JavaStack* stackObject = !j9javastack 0x00000000306012D0
0x40: struct J9JITDecompilationInfo* decompilationStack = !j9jitdecompilationinfo 0x0000000000000000
0x48: UDATA* j2iFrame = !j9x 0x0000000000000000
0x50: struct J9JITGPRSpillArea jitGPRs = !j9jitgprspillarea 0x000001002505B320
0x160: struct J9I2JState i2jState = !j9i2jstate 0x000001002505B430
0x180: struct J9VMEntryLocalStorage* oldEntryLocalStorage = !j9vmentrylocalstorage 0x0000000000000000
0x188: UDATA dropFlags = 0x0000000000000000 (0)
0x190: UDATA returnState = 0x0000000000000002 (2)
0x198: IDATA waitingMonitorEnterCount = 0x0000000000000001 (1)
0x1a0: UDATA ownedMonitorCount = 0x0000000000000000 (0)
0x1a8: struct J9Pool* monitorEnterRecordPool = !j9pool 0x00000100252B6870
0x1b0: struct J9MonitorEnterRecord* monitorEnterRecords = !j9monitorenterrecord 0x00000100252C1460
0x1b8: struct J9MonitorEnterRecord* jniMonitorEnterRecords = !j9monitorenterrecord 0x0000000000000000
0x1c0: struct J9Object* vthread = !j9object 0x00000000FFE52A98 // java/lang/VirtualThread
0x1c8: struct J9VMContinuation* nextWaitingContinuation = !j9vmcontinuation 0x0000000000000000
0x1d0: struct J9ObjectMonitor* objectWaitMonitor = !j9objectmonitor 0x0000000030019D70
0x1d8: struct J9ObjectMonitor* enteredMonitors = !j9objectmonitor 0x0000000000000000
0x1e0: UDATA runtimeFlags = 0x0000000000000000 (0)
0x1e8: I64 startTicks = 0x002583A11EFAB4C7 (10559302171014343)
0x1f0: struct J9VMThread* previousOwner = !j9vmthread 0x0000000000000000
}
> !j9objectmonitor 0x0000000030019D70
J9ObjectMonitor at 0x30019d70 {
Fields for J9ObjectMonitor:
0x0: struct J9ThreadMonitor* monitor = !j9threadmonitor 0x0000010023DB5FC8
0x8: UDATA proDeflationCount = 0x0000000000000015 (21)
0x10: UDATA antiDeflationCount = 0x0000000000000001 (1)
0x18: j9objectmonitor_t alternateLockword = !j9x 0x30019d88
0x20: U32 hash = 0xA9AA5132 (2846511410)
0x24: volatile U32 virtualThreadWaitCount = 0x00000001 (1)
0x28: volatile U32 platformThreadWaitCount = 0x00000000 (0)
0x30: struct J9VMContinuation* ownerContinuation = !j9vmcontinuation 0x0000000000000000
0x38: struct J9VMContinuation* waitingContinuations = !j9vmcontinuation 0x0000000000000000
0x40: struct J9ObjectMonitor* next = !j9objectmonitor 0x0000000000000000
}
Ran a 200x grinder on AIX: https://openj9-jenkins.osuosl.org/job/Grinder/4424. Also verified the test locally 150x on my xLinux machine. The timeout is no longer reproducible.
Opened https://github.com/adoptium/aqa-tests/pull/6387 to re-enable the test.
Closing as https://github.com/adoptium/aqa-tests/pull/6387 has been merged to re-enable the test.