openj9 icon indicating copy to clipboard operation
openj9 copied to clipboard

jdk24 CancelTimerWithContention timeout

Open pshipton opened this issue 7 months ago • 22 comments

This issue currently tracks the timeout observed in the CancelTimerWithContention test. None of the other tests listed below exhibited a timeout. Additional test failures may be linked here if they demonstrate similar timeout symptoms.

https://openj9-jenkins.osuosl.org/job/Test_openjdk24_j9_sanity.openjdk_aarch64_linux_Nightly_testList_1/60 timeouts 00:37:16 jdk_lang_1 - Test results: passed: 910; error: 6 00:37:16 Failed test cases: 00:37:16 TEST: java/lang/Thread/virtual/CancelTimerWithContention.java ~00:37:16 TEST: java/lang/Thread/ThreadStateTest.java~ ~00:37:16 TEST: java/lang/Thread/ThreadSleepEventThrows.java~ ~00:37:16 TEST: java/lang/ThreadGroup/BasicTests.java~ ~00:37:16 TEST: java/lang/Thread/SleepSanity.java~ ~00:37:16 TEST: jdk/internal/loader/URLClassPath/JarLoaderTest.java~

Also: java/lang/Thread/virtual/CustomScheduler.java.CustomScheduler https://openj9-jenkins.osuosl.org/job/Test_openjdk24_j9_sanity.openjdk_s390x_linux_Nightly_testList_1/73/

pshipton avatar Apr 21 '25 18:04 pshipton

@theresa-m Can you find out if these timeouts are identical to https://github.com/eclipse-openj9/openj9/issues/21826?

babsingh avatar May 08 '25 15:05 babsingh

I ran jdk_lang_1 on aarch64_linux 50x and did not see any timeouts. https://hyc-runtimes-jenkins.swg-devops.com/view/Test_grinder/job/Grinder/50230/

The failures are:

  • LotsOfContendedMonitorEnter related to https://github.com/eclipse-openj9/openj9/issues/21827
  • IllegalArgumentsTest https://github.com/eclipse-openj9/openj9/issues/20683

theresa-m avatar May 09 '25 14:05 theresa-m

Closing this issue based on the above grinder results, https://github.com/eclipse-openj9/openj9/issues/21683#issuecomment-2866763901.

babsingh avatar May 09 '25 17:05 babsingh

@babsingh @theresa-m Are there tests we can re-enable?

tajila avatar May 12 '25 18:05 tajila

@theresa-m confirmed last week that all tests are enabled except CancelTimerWithContention which is also being tracked by #21831

babsingh avatar May 12 '25 18:05 babsingh

Seems we have a reproduction of this issue, https://github.com/eclipse-openj9/openj9/issues/21831#issuecomment-2874126242

tajila avatar May 12 '25 21:05 tajila

I think that this test was just slow since the main thread stopped on a goto bytecode. I will try again to reproduce the failure and increase the timeout.

theresa-m avatar May 13 '25 14:05 theresa-m

This issue looks different than https://github.com/eclipse-openj9/openj9/issues/21826.

The test stalls because there is one waiting virtual thread that is not being notified to wake up and is stuck on the blockedContinuations list. I think the main thread is continuously looping through each of the available locks and calling notify which is why it looks like its stopped on a goto.

theresa-m avatar May 13 '25 19:05 theresa-m

aarch64_linux jdk_lang_0,jdk_lang_1,jdk_util_0,jdk_util_1

23:34:27  openjdk version "24.0.1-internal" 2025-04-15
23:34:27  OpenJDK Runtime Environment (build 24.0.1-internal-adhoc.jenkins.BuildJDK24aarch64linuxPersonal)
23:34:27  Eclipse OpenJ9 VM (build master-cf09cea8094, JRE 24 Linux aarch64-64-Bit Compressed References 20250513_50 (JIT enabled, AOT enabled)
23:34:27  OpenJ9   - cf09cea8094
23:34:27  OMR      - b4802c56caf
23:34:27  JCL      - 2361b1060b2 based on jdk-24.0.1+9)

23:34:28  Running test jdk_lang_0 ...

23:34:28  variation: -Xdump:system:none -Xdump:heap:none -Xdump:system:events=gpf+abort+traceassert+corruptcache -XX:-JITServerTechPreviewMessage Mode150
23:34:28  JVM_OPTIONS:  -Xdump:system:none -Xdump:heap:none -Xdump:system:events=gpf+abort+traceassert+corruptcache -XX:-JITServerTechPreviewMessage -XX:+UseCompressedOops -Xverbosegclog 


00:21:20  TEST: java/lang/Thread/virtual/CancelTimerWithContention.java

00:21:20  TEST RESULT: Error. Program `/home/jenkins/workspace/Grinder/jdkbinary/j2sdk-image/bin/java' timed out (timeout set to 1440000ms, elapsed time including timeout handling was 1453278ms).


JasonFengJ9 avatar May 14 '25 21:05 JasonFengJ9

This issue also has a platformThreadWaitCount that looks invalid like in https://github.com/eclipse-openj9/openj9/issues/21890 fyi @fengxue-IS

Otherwise the failure looks a bit different there is only one virtual thread still waiting to be woken up when the test times out.

I tried running this with Xint and the test still fails in the same way.

> !continuationstack 0x0000fffdce876370
<fffdce876370> 	!j9method 0x000000000001E250   java/lang/Object.waitImpl(JI)V
<fffdce876370> 	!j9method 0x000000000001E230   java/lang/Object.wait(JI)V
<fffdce876370> 	!j9method 0x000000000001E210   java/lang/Object.wait(J)V
<fffdce876370> 	!j9method 0x00000000001F5888   CancelTimerWithContention.lambda$test$0(Ljava/lang/Object;Ljava/util/concurrent/atomic/AtomicInteger;)Ljava/lang/Object;
<fffdce876370> 	!j9method 0x00000000001F5AE0   CancelTimerWithContention$$Lambda/0x0000000094ef1938.call()Ljava/lang/Object;
<fffdce876370> 	!j9method 0x0000000000294D68   java/util/concurrent/FutureTask.run()V
<fffdce876370> 	!j9method 0x0000000000052BF0   java/lang/Thread.runWith(Ljava/lang/Object;Ljava/lang/Runnable;)V
<fffdce876370> 	!j9method 0x000000000006D330   java/lang/VirtualThread.run(Ljava/lang/Runnable;)V
<fffdce876370> 	!j9method 0x00000000002A7050   java/lang/VirtualThread$VThreadContinuation$1.run()V
<fffdce876370> 	!j9method 0x000000000007D660   jdk/internal/vm/Continuation.enter(Ljdk/internal/vm/Continuation;)V
<fffdce876370> 	                        JNI call-in frame
<fffdce876370> 	                        Native method frame
> 
> !j9object 0x00000000FFE52CC8
!J9Object 0x00000000FFE52CC8 {
	struct J9Class* clazz = !j9class 0x2A6E00 // java/lang/VirtualThread$VThreadContinuation
	Object flags = 0x00000010;
	I lockword = 0x00000000 (offset = 0) (java/lang/Object) <hidden>
	J vmRef = 0x0000FFFDCE876370 (offset = 4) (jdk/internal/vm/Continuation)
	Ljava/lang/Thread; vthread = !fj9object 0xffe52bf8 (offset = 20) (jdk/internal/vm/Continuation)
	Ljava/lang/Object; blocker = !fj9object 0xffe5b058 (offset = 24) (jdk/internal/vm/Continuation)
	[Ljava/lang/Object; scopedValueCache = !fj9object 0x0 (offset = 28) (jdk/internal/vm/Continuation)
	Ljdk/internal/vm/ContinuationScope; scope = !fj9object 0xe016a078 (offset = 32) (jdk/internal/vm/Continuation)
	Ljava/lang/Runnable; runnable = !fj9object 0xffe52d38 (offset = 36) (jdk/internal/vm/Continuation)
	Ljdk/internal/vm/Continuation; parent = !fj9object 0x0 (offset = 40) (jdk/internal/vm/Continuation)
	J state = 0x0000000000000001 (offset = 12) (jdk/internal/vm/Continuation)
	Z isAccessible = 0x00000001 (offset = 44) (jdk/internal/vm/Continuation)
	I continuationLink = 0x00000000 (offset = 48) (jdk/internal/vm/Continuation) <hidden>
}
> !fj9object 0xffe5b058
!J9Object 0x00000000FFE5B058 {
	struct J9Class* clazz = !j9class 0x1E300 // java/lang/Object
	Object flags = 0x0000002A;
	I lockword = 0x00013739 (offset = 0) (java/lang/Object) <hidden>
}
> !j9objectmonitor 0x00013739
J9ObjectMonitor at 0x13739 {
  Fields for J9ObjectMonitor:
	0x0: struct J9ThreadMonitor* monitor = !j9threadmonitor 0x5C0000FFFF2C11A4
	0x8: UDATA proDeflationCount = 0x00000000002D253C (2958652)
	0x10: UDATA antiDeflationCount = 0x0000000000000000 (0)
	0x18: j9objectmonitor_t alternateLockword = !j9x 0x13751
	0x20: U32 hash = 0x017C63E1 (24929249)
	0x24: volatile U32 virtualThreadWaitCount = 0x00000000 (0)
	0x28: volatile U32 platformThreadWaitCount = 0xFF9706F0 (4288087792)
	0x30: struct J9VMContinuation* ownerContinuation = !j9vmcontinuation 0x0000000000000000
	0x38: struct J9VMContinuation* waitingContinuations = !j9vmcontinuation 0x0000000000000000
	0x40: struct J9ObjectMonitor* next = !j9objectmonitor 0x0000000000000000

theresa-m avatar May 22 '25 15:05 theresa-m

is !j9threadmonitor 0x5C0000FFFF2C11A4 valid, and does it have an owner?

babsingh avatar May 22 '25 18:05 babsingh

It looks like its invalid.

> !j9objectmonitor 0x00013739
J9ObjectMonitor at 0x13739 {
  Fields for J9ObjectMonitor:
	0x0: struct J9ThreadMonitor* monitor = !j9threadmonitor 0x5C0000FFFF2C11A4
	0x8: UDATA proDeflationCount = 0x00000000002D253C (2958652)
	0x10: UDATA antiDeflationCount = 0x0000000000000000 (0)
	0x18: j9objectmonitor_t alternateLockword = !j9x 0x13751
	0x20: U32 hash = 0x017C63E1 (24929249)
	0x24: volatile U32 virtualThreadWaitCount = 0x00000000 (0)
	0x28: volatile U32 platformThreadWaitCount = 0xFF9706F0 (4288087792)
	0x30: struct J9VMContinuation* ownerContinuation = !j9vmcontinuation 0x0000000000000000
	0x38: struct J9VMContinuation* waitingContinuations = !j9vmcontinuation 0x0000000000000000
	0x40: struct J9ObjectMonitor* next = !j9objectmonitor 0x0000000000000000
}
> !j9threadmonitor 0x5C0000FFFF2C11A4
J9ThreadMonitor at 0x5c0000ffff2c11a4 {
  Fields for J9ThreadMonitor:
	0x0: U64 count = <FAULT>
	0x8: volatile struct J9Thread* owner = <FAULT>
	0x10: struct J9Thread* waiting = <FAULT>
	0x18: U64 flags = <FAULT>
	0x20: U64 userData = <FAULT>
	0x28: struct J9ThreadMonitorTracing* tracing = <FAULT>
	0x30: U8* name = !j9x <FAULT>
	0x38: U64 pinCount = <FAULT>
	0x40: U64 spinlockState = <FAULT>
	0x48: U64 spinCount1 = <FAULT>
	0x50: U64 spinCount2 = <FAULT>
	0x58: U64 spinCount3 = <FAULT>
	0x60: struct J9Thread* blocking = <FAULT>
	0x68: U64 sampleCounter = <FAULT>
	0x70: const struct J9ThreadCustomSpinOptions* customSpinOptions = <FAULT>
	0x78: volatile U64 spinThreads = <FAULT>
	0x80: pthread_mutex_t mutex = !pthread_mutex_t 0x5C0000FFFF2C1224
	0xb0: struct J9Thread* notifyAllWaiting = <FAULT>
}

theresa-m avatar May 22 '25 18:05 theresa-m

https://openj9-jenkins.osuosl.org/job/Test_openjdk24_j9_sanity.openjdk_s390x_linux_Nightly_testList_2/86/consoleFull

00:50:20  Running test jdk_lang_j9_1 ...

00:50:21  variation: -Xdump:system:none -Xdump:heap:none -Xdump:system:events=gpf+abort+traceassert+corruptcache -XX:-JITServerTechPreviewMessage Mode501 -XXgc:fvtest_forceCopyForwardHybridMarkCompactRatio=10
00:50:21  JVM_OPTIONS:  -Xdump:system:none -Xdump:heap:none -Xdump:system:events=gpf+abort+traceassert+corruptcache -XX:-JITServerTechPreviewMessage -Xjit -Xgcpolicy:balanced -Xnocompressedrefs -XXgc:fvtest_forceCopyForwardHybridMarkCompactRatio=10 -Xverbosegclog 

01:38:42  TEST: java/lang/Thread/virtual/Parking.java#Xint

01:38:42  TEST RESULT: Error. Program `/home/jenkins/workspace/Test_openjdk24_j9_sanity.openjdk_s390x_linux_Nightly_testList_2/jdkbinary/j2sdk-image/bin/java' timed out (timeout set to 960000ms, elapsed time including timeout handling was 986157ms).
01:38:42  --------------------------------------------------
01:38:42  Test results: passed: 925; error: 1
01:38:49  jdk_lang_j9_1_FAILED

java/lang/Thread/virtual/Parking.java tests were re-enabled after closing

  • https://github.com/eclipse-openj9/openj9/issues/21446

Put the failure in this timeout issue unless the investigation shows it has different underneath cause.

JasonFengJ9 avatar May 28 '25 13:05 JasonFengJ9

@theresa-m Have you tried the below debugging steps?

In CancelTimerWithContention,

  • Add debug prints around the lock.wait() call to display the lock object and confirm whether it is being resumed as expected.
  • Add similar debug prints around the lock.notify() call to verify that a notification is issued for each wait().

This will help confirm whether every thread that calls wait() is actually being notified.

        try (var executor = Executors.newVirtualThreadPerTaskExecutor()) {
            var finished = new AtomicInteger();

            for (int i = 0; i < threadCount; i++) {
                Object lock = locks[RAND.nextInt(monitorCount)];    // random lock

                executor.submit(() -> {
                    synchronized (lock) {
                        // Add debug print here to trace wait activity
                        lock.wait(Long.MAX_VALUE);
                    }
                    finished.incrementAndGet();
                    return null;
                });

                synchronized (lock) {
                    lock.notify();
                }
            }

            // notify at most one thread until all threads are finished
            while (finished.get() < threadCount) {
                for (Object lock : locks) {
                    synchronized (lock) {
                        // Add debug print here to trace notify activity
                        lock.notify();
                    }
                }
            }
        }

If notifications appear to be issued correctly, the next step is to introspect the virtual thread unblocking logic in the JVM:

  • Add debug output in notifyVirtualThread (called from lock.notify()) to ensure the waiting virtual thread is correctly registered and omrthread_monitor_notify(vm->blockedVirtualThreadsMutex) is triggered.

https://github.com/eclipse-openj9/openj9/blob/35ecdd76704e0909d18298cf0edb445bca37b575/runtime/oti/ContinuationHelpers.hpp#L345-L355

  • Confirm that the loop in takeVirtualThreadListToUnblock is re-entered and that the notified virtual thread is selected for remounting:

https://github.com/eclipse-openj9/openj9/blob/35ecdd76704e0909d18298cf0edb445bca37b575/runtime/vm/ContinuationHelpers.cpp#L1162

  • Finally, ensure that the waiting virtual thread is remounted by checking that the J9VM_CONTINUATION_RETURN_FROM_OBJECT_WAIT case in enterContinuationImpl is executed as a result of the notify:

https://github.com/eclipse-openj9/openj9/blob/35ecdd76704e0909d18298cf0edb445bca37b575/runtime/vm/BytecodeInterpreter.hpp#L5826

babsingh avatar May 30 '25 20:05 babsingh

CancelTimerWithContention is a stress test of the virtual thread timeout cancellation mechanism, particularly under conditions of contention on the timer queue. It is intended to expose potential race conditions or performance issues in how timeouts associated with Object.wait(millis) are managed and cancelled.

Using the latest system core file openjdk_test_output.tar.gz, I have included annotated jdmpview output below to assist with debugging.

Based on the jdmpview output below, notifyVirtualThread was invoked, but the virtual thread was not processed in takeVirtualThreadListToUnblock, preventing it from being mounted again.

@fengxue-IS Could you please review this comment along with https://github.com/eclipse-openj9/openj9/issues/21683#issuecomment-2923385313 to check if any critical points were missed?

// There is only 1 virtual thread
> !vthreads
!continuationstack 0x0000fffefcd7a4d0 !j9vmcontinuation 0x0000fffefcd7a4d0 !j9object 0x00000000FFDC2A70 (Continuation) !j9object 0x00000000FFDC28B0 (VThread) -

> !j9object 0x00000000FFDC28B0
!J9Object 0x00000000FFDC28B0 {
	...
	I state = 0x00000012 (offset = 112) (java/lang/VirtualThread) <--- State is BLOCKING ... might be in a transient state during mount or unmount
	Z parkPermit = 0x00000000 (offset = 160) (java/lang/VirtualThread)
	Z blockPermit = 0x00000000 (offset = 164) (java/lang/VirtualThread) <--- Expected to be set in takeVirtualThreadListToUnblock, but isn’t
	...
	Z notified = 0x00000001 (offset = 172) (java/lang/VirtualThread) <---  Indicates notifyVirtualThread() was invoked
	B timedWaitSeqNo = 0x00000001 (offset = 176) (java/lang/VirtualThread)
	J timeout = 0x7FFFFFFFFFFFFFFF (offset = 116) (java/lang/VirtualThread) <--- Indicates wait(Long.MAX_VALUE) was used
	...
	Ljava/lang/Thread; carrierThread = !fj9object 0x0 (offset = 152) (java/lang/VirtualThread)

> !j9object 0x00000000FFDC2A70 (Continuation)
!J9Object 0x00000000FFDC2A70 {
	struct J9Class* clazz = !j9class 0x2F4F00 // java/lang/VirtualThread$VThreadContinuation
	Object flags = 0x00000010;
	I lockword = 0x00000000 (offset = 0) (java/lang/Object) <hidden>
	J vmRef = 0x0000FFFEFCD7A4D0 (offset = 4) (jdk/internal/vm/Continuation)
	Ljava/lang/Thread; vthread = !fj9object 0xffdc28b0 (offset = 20) (jdk/internal/vm/Continuation)
	Ljava/lang/Object; blocker = !fj9object 0xffdc2be0 (offset = 24) (jdk/internal/vm/Continuation) <--- The object the virtual thread is waiting on
	[Ljava/lang/Object; scopedValueCache = !fj9object 0x0 (offset = 28) (jdk/internal/vm/Continuation)
	Ljdk/internal/vm/ContinuationScope; scope = !fj9object 0xe00e1488 (offset = 32) (jdk/internal/vm/Continuation)
	Ljava/lang/Runnable; runnable = !fj9object 0xffdc2c28 (offset = 36) (jdk/internal/vm/Continuation)
	Ljdk/internal/vm/Continuation; parent = !fj9object 0x0 (offset = 40) (jdk/internal/vm/Continuation)
	J state = 0x0000000000000001 (offset = 12) (jdk/internal/vm/Continuation)
	Z isAccessible = 0x00000001 (offset = 44) (jdk/internal/vm/Continuation)
	I continuationLink = 0x00000000 (offset = 48) (jdk/internal/vm/Continuation) <hidden>
}

> !fj9object 0xffdc2be0
!J9Object 0x00000000FFDC2BE0 { <--- The object the virtual thread is waiting on
	struct J9Class* clazz = !j9class 0x35700 // java/lang/Object
	Object flags = 0x00000012;
	I lockword = 0x00011C21 (offset = 0) (java/lang/Object) <hidden> <--- Object monitor
}

> !j9vmcontinuation 0x0000fffefcd7a4d0
J9VMContinuation at 0xfffefcd7a4d0 {
  Fields for J9VMContinuation:
	...
	0x180: UDATA returnState = 0x0000000000000002 (2)
	0x188: IDATA waitingMonitorEnterCount = 0x0000000000000001 (1) <--- Updated in enterContinuationImpl and preparePinnedVirtualThreadForUnmount
	0x190: UDATA ownedMonitorCount = 0x0000000000000000 (0)
	0x198: struct J9Pool* monitorEnterRecordPool = !j9pool 0x0000FFFEFCCC3C00
	0x1a0: struct J9MonitorEnterRecord* monitorEnterRecords = !j9monitorenterrecord 0x0000FFFEFCD3A8F0
	0x1a8: struct J9MonitorEnterRecord* jniMonitorEnterRecords = !j9monitorenterrecord 0x0000000000000000
	0x1b0: struct J9Object* vthread = !j9object 0x00000000FFDC28B0 // java/lang/VirtualThread
	0x1b8: struct J9VMContinuation* nextWaitingContinuation = !j9vmcontinuation 0x0000000000000000 <--- No virtual threads on this list
	0x1c0: struct J9ObjectMonitor* objectWaitMonitor = !j9objectmonitor 0x0000000000011C20 <--- Matches the object monitor in the above object
	0x1c8: struct J9ObjectMonitor* enteredMonitors = !j9objectmonitor 0x0000000000000000
	0x1d0: UDATA runtimeFlags = 0x0000000000000000 (0)
	0x1d8: I64 startTicks = 0x0000905B6BB6BD31 (158722323561777)
	0x1e0: struct J9VMThread* previousOwner = !j9vmthread 0x0000000000000000
}

> !j9objectmonitor 0x0000000000011C20
J9ObjectMonitor at 0x11c20 {
  Fields for J9ObjectMonitor:
	0x0: struct J9ThreadMonitor* monitor = !j9threadmonitor 0x0000FFFF98175128
	0x8: UDATA proDeflationCount = 0x0000000030FBBFA8 (821804968)
	0x10: UDATA antiDeflationCount = 0x0000000000000002 (2)
	0x18: j9objectmonitor_t alternateLockword = !j9x 0x11c38
	0x20: U32 hash = 0x154B62E1 (357262049)
	0x24: volatile U32 virtualThreadWaitCount = 0x00000001 (1) <--- Indicates a waiting virtual thread
	0x28: volatile U32 platformThreadWaitCount = 0x00301600 (3151360)
	0x30: struct J9VMContinuation* ownerContinuation = !j9vmcontinuation 0x0000000000000000
	0x38: struct J9VMContinuation* waitingContinuations = !j9vmcontinuation 0x0000000000000000
	0x40: struct J9ObjectMonitor* next = !j9objectmonitor 0x0000000000000000
}

> !j9threadmonitor 0x0000FFFF98175128
J9ThreadMonitor at 0xffff98175128 {
  Fields for J9ThreadMonitor:
	0x0: U64 count = 0x0000000000000000 (0)
	0x8: volatile struct J9Thread* owner = !j9thread 0x0000000000000000 <--- No one owns the OMR thread monitor
	0x10: struct J9Thread* waiting = !j9thread 0x0000000000000000
	...
}

babsingh avatar May 30 '25 20:05 babsingh

I didn't have much luck adding print statements in CancelTimerWithContention, the test crashed early I guess due to the extra output. I tried reproducing the test by running only the failing configuration (900 threads and 8 monitors on my last run) but the test passed.

On Friday I tried adding print statements to the vm and the output shows that a thread was only notified in notifyVirtualThread once. I will add additional statements to ensure the thread was registered correctly.

theresa-m avatar Jun 02 '25 17:06 theresa-m

https://openj9-jenkins.osuosl.org/job/Test_openjdk24_j9_sanity.openjdk_ppc64_aix_Nightly_testList_0/92/consoleFull

03:25:09  Running test jdk_lang_0 ...

03:25:09  variation: -Xdump:system:none -Xdump:heap:none -Xdump:system:events=gpf+abort+traceassert+corruptcache -XX:-JITServerTechPreviewMessage Mode150
03:25:09  JVM_OPTIONS:  -Xdump:system:none -Xdump:heap:none -Xdump:system:events=gpf+abort+traceassert+corruptcache -XX:-JITServerTechPreviewMessage -XX:+UseCompressedOops -Xverbosegclog 

04:34:03  TEST: java/lang/Thread/virtual/ThreadAPI.java#default

04:34:03  TEST RESULT: Error. Program `/home/jenkins/workspace/Test_openjdk24_j9_sanity.openjdk_ppc64_aix_Nightly_testList_0/jdkbinary/j2sdk-image/bin/java' timed out (timeout set to 960000ms, elapsed time including timeout handling was 1023791ms).
04:34:03  --------------------------------------------------
04:46:56  Test results: passed: 919; error: 1

04:47:00  jdk_lang_0_FAILED

The test was re-enabled yesterday, put this timeout here unless the investigation shows it has different underneath cause.

JasonFengJ9 avatar Jun 04 '25 13:06 JasonFengJ9

When adding print statements to the vm I can see that notifyVirtualThread is repeatedly called for the virtual thread 0xfffbd52fcae0 waiting on object monitor 0x13318 and blockedVirtualThreadsMutex is notified. Perhaps there is an issue with the state of the thread because the unblocker thread adds 0xfffbd52fcae0 back to the waitingContinuations list of the object monitor. I'm adding more print statements to find out what the state is there.

One thing that looks a little strange to me is that the virtual thread's blocker field shows a different object monitor address 0x13319.

> !vthreads
!continuationstack 0x0000fffbd52fcae0 !j9vmcontinuation 0x0000fffbd52fcae0 !j9object 0x00000000FFF74478 (Continuation) !j9object 0x00000000FFF73F60 (VThread) - 
> !j9vmcontinuation 0x0000fffbd52fcae0
J9VMContinuation at 0xfffbd52fcae0 {
  Fields for J9VMContinuation:
	0x0: UDATA* arg0EA = !j9x 0x00000000006E8BE8
	0x8: UDATA* bytecodes = !j9x 0x0000000000000000
	0x10: UDATA* sp = !j9x 0x00000000006E8BA8
	0x18: U8* pc = !j9x 0x0000000000000003
	0x20: struct J9Method* literals = !j9method 0x0000000000000000
	0x28: UDATA* stackOverflowMark = !j9x 0x00000000006E8630
	0x30: UDATA* stackOverflowMark2 = !j9x 0x00000000006E8630
	0x38: struct J9JavaStack* stackObject = !j9javastack 0x00000000006E7628
	0x40: struct J9JITDecompilationInfo* decompilationStack = !j9jitdecompilationinfo 0x0000000000000000
	0x48: UDATA* j2iFrame = !j9x 0x0000000000000000
	0x50: struct J9JITGPRSpillArea jitGPRs = !j9jitgprspillarea 0x0000FFFBD52FCB30
	0x150: struct J9I2JState i2jState = !j9i2jstate 0x0000FFFBD52FCC30
	0x170: struct J9VMEntryLocalStorage* oldEntryLocalStorage = !j9vmentrylocalstorage 0x0000000000000000
	0x178: UDATA dropFlags = 0x0000000000000000 (0)
	0x180: UDATA returnState = 0x0000000000000002 (2)
	0x188: IDATA waitingMonitorEnterCount = 0x0000000000000001 (1)
	0x190: UDATA ownedMonitorCount = 0x0000000000000000 (0)
	0x198: struct J9Pool* monitorEnterRecordPool = !j9pool 0x0000FFFBD52FDE50
	0x1a0: struct J9MonitorEnterRecord* monitorEnterRecords = !j9monitorenterrecord 0x0000FFFBD52FDF90
	0x1a8: struct J9MonitorEnterRecord* jniMonitorEnterRecords = !j9monitorenterrecord 0x0000000000000000
	0x1b0: struct J9Object* vthread = !j9object 0x00000000FFF73F60 // java/lang/VirtualThread
	0x1b8: struct J9VMContinuation* nextWaitingContinuation = !j9vmcontinuation 0x0000000000000000
	0x1c0: struct J9ObjectMonitor* objectWaitMonitor = !j9objectmonitor 0x0000000000013318
	0x1c8: struct J9ObjectMonitor* enteredMonitors = !j9objectmonitor 0x0000000000000000
	0x1d0: UDATA runtimeFlags = 0x0000000000000000 (0)
	0x1d8: I64 startTicks = 0x0000B52C517BA8C1 (199201950247105)
	0x1e0: struct J9VMThread* previousOwner = !j9vmthread 0x0000000000000000
}
> !j9object 0x00000000FFF74478
!J9Object 0x00000000FFF74478 {
	struct J9Class* clazz = !j9class 0x2F6000 // java/lang/VirtualThread$VThreadContinuation
	Object flags = 0x00000010;
	I lockword = 0x00000000 (offset = 0) (java/lang/Object) <hidden>
	J vmRef = 0x0000FFFBD52FCAE0 (offset = 4) (jdk/internal/vm/Continuation)
	Ljava/lang/Thread; vthread = !fj9object 0xfff73f60 (offset = 20) (jdk/internal/vm/Continuation)
	Ljava/lang/Object; blocker = !fj9object 0xfff7ba00 (offset = 24) (jdk/internal/vm/Continuation)
	[Ljava/lang/Object; scopedValueCache = !fj9object 0x0 (offset = 28) (jdk/internal/vm/Continuation)
	Ljdk/internal/vm/ContinuationScope; scope = !fj9object 0xe01bd580 (offset = 32) (jdk/internal/vm/Continuation)
	Ljava/lang/Runnable; runnable = !fj9object 0xfff74750 (offset = 36) (jdk/internal/vm/Continuation)
	Ljdk/internal/vm/Continuation; parent = !fj9object 0x0 (offset = 40) (jdk/internal/vm/Continuation)
	J state = 0x0000000000000001 (offset = 12) (jdk/internal/vm/Continuation)
	Z isAccessible = 0x00000001 (offset = 44) (jdk/internal/vm/Continuation)
	I continuationLink = 0x00000000 (offset = 48) (jdk/internal/vm/Continuation) <hidden>
}
> !fj9object 0xfff7ba00
!J9Object 0x00000000FFF7BA00 {
	struct J9Class* clazz = !j9class 0x35700 // java/lang/Object
	Object flags = 0x00000012;
	I lockword = 0x00013319 (offset = 0) (java/lang/Object) <hidden>
}
> 

theresa-m avatar Jun 04 '25 19:06 theresa-m

@JasonFengJ9 can you open seperate issues for https://github.com/eclipse-openj9/openj9/issues/21683#issuecomment-2916412694 and https://github.com/eclipse-openj9/openj9/issues/21683#issuecomment-2939999526 so they don't get missed?

theresa-m avatar Jun 09 '25 19:06 theresa-m

can you open seperate issues for https://github.com/eclipse-openj9/openj9/issues/21683#issuecomment-2916412694 java/lang/Thread/virtual/Parking.java#Xint https://github.com/eclipse-openj9/openj9/issues/21683#issuecomment-2939999526 java/lang/Thread/virtual/ThreadAPI.java#default

Sure, was there any investigation showing these two timeout with different underneath causes?

JasonFengJ9 avatar Jun 09 '25 20:06 JasonFengJ9

I don't think the Parking test is related, the test is stuck yielding, not waiting on a monitor.

> !continuationstack 0x000003fe80005b60
<3fe80005b60> 	!j9method 0x000003FF98322220   jdk/internal/vm/Continuation.yieldImpl(Z)Z
<3fe80005b60> 	!j9method 0x000003FF983220C0   jdk/internal/vm/Continuation.yield0()Z
<3fe80005b60> 	!j9method 0x000003FF983220A0   jdk/internal/vm/Continuation.yield(Ljdk/internal/vm/ContinuationScope;)Z
<3fe80005b60> 	!j9method 0x000003FF98311B90   java/lang/VirtualThread.yieldContinuation()Z
<3fe80005b60> 	!j9method 0x000003FF98311C70   java/lang/VirtualThread.park()V
<3fe80005b60> 	!j9method 0x000003FF98337E68   java/lang/Access.parkVirtualThread()V
<3fe80005b60> 	!j9method 0x000003FF9830B7C8   java/util/concurrent/locks/LockSupport.park()V
<3fe80005b60> 	!j9method 0x000003FF985DA668   Parking.lambda$testPark7$0()V
<3fe80005b60> 	!j9method 0x000003FE84470138   Parking$$Lambda/0x000000008443ebb8.run()V
<3fe80005b60> 	!j9method 0x000003FF982F38F0   java/lang/Thread.runWith(Ljava/lang/Object;Ljava/lang/Runnable;)V
<3fe80005b60> 	!j9method 0x000003FF98311B30   java/lang/VirtualThread.run(Ljava/lang/Runnable;)V
<3fe80005b60> 	!j9method 0x000003FE8441A050   java/lang/VirtualThread$VThreadContinuation$1.run()V
<3fe80005b60> 	!j9method 0x000003FF98322060   jdk/internal/vm/Continuation.enter(Ljdk/internal/vm/Continuation;)V
<3fe80005b60> 	                        JNI call-in frame
<3fe80005b60> 	                        Native method frame
> 

The ThreadAPI Timeout looks quite similar and may be related. The only difference I see is that the last waiting vthread is in a different state than CancelTimerWithContention:

> !vthreads
!continuationstack 0x0000010023bf4330 !j9vmcontinuation 0x0000010023bf4330 !j9object 0x00000000FFEE7960 (Continuation) !j9object 0x00000000FFEE7898 (VThread) -
> !j9object 0x00000000FFEE7898
...
I state = 0x0000000D (offset = 112) (java/lang/VirtualThread)
Z parkPermit = 0x00000001 (offset = 160) (java/lang/VirtualThread)
Z blockPermit = 0x00000000 (offset = 164) (java/lang/VirtualThread)

theresa-m avatar Jun 10 '25 14:06 theresa-m

Thanks @theresa-m , those two timeouts are tracked at

  • https://github.com/eclipse-openj9/openj9/issues/22070

JasonFengJ9 avatar Jun 10 '25 15:06 JasonFengJ9

https://openj9-jenkins.osuosl.org/job/Test_openjdk24_j9_sanity.openjdk_ppc64le_linux_Nightly_testList_0/104/consoleFull

20:59:15  Running test jdk_lang_0 ...

20:59:15  variation: -Xdump:system:none -Xdump:heap:none -Xdump:system:events=gpf+abort+traceassert+corruptcache -XX:-JITServerTechPreviewMessage Mode150
20:59:15  JVM_OPTIONS:  -Xdump:system:none -Xdump:heap:none -Xdump:system:events=gpf+abort+traceassert+corruptcache -XX:-JITServerTechPreviewMessage -XX:+UseCompressedOops -Xverbosegclog 

21:21:21  TEST: java/lang/Thread/virtual/CancelTimerWithContention.java

21:21:21  TEST RESULT: Error. Program `/home/jenkins/workspace/Test_openjdk24_j9_sanity.openjdk_ppc64le_linux_Nightly_testList_0/jdkbinary/j2sdk-image/bin/java' timed out (timeout set to 960000ms, elapsed time including timeout handling was 964035ms).

50x grinder - https://openj9-jenkins.osuosl.org/job/Grinder/4421/

https://openj9-jenkins.osuosl.org/job/Test_openjdk24_j9_sanity.openjdk_s390x_linux_Nightly_testList_1/103/consoleFull

03:47:52  Running test jdk_lang_1 ...

03:47:52  variation: -Xdump:system:none -Xdump:heap:none -Xdump:system:events=gpf+abort+traceassert+corruptcache -XX:-JITServerTechPreviewMessage Mode650

04:24:57  TEST: java/lang/Thread/virtual/CancelTimerWithContention.java

04:24:57  TEST RESULT: Error. Program `/home/jenkins/workspace/Test_openjdk24_j9_sanity.openjdk_s390x_linux_Nightly_testList_1/jdkbinary/j2sdk-image/bin/java' timed out (timeout set to 960000ms, elapsed time including timeout handling was 989441ms).

@theresa-m please determine if this is a new issue and should be tracked separately.

JasonFengJ9 avatar Jun 20 '25 12:06 JasonFengJ9

I think it could be related. From the core there are no virtual threads running however finished counter only updates to 599 when 600 is expected.

theresa-m avatar Jun 20 '25 15:06 theresa-m

openjdk24_j9_sanity.openjdk_x86-64_linux

[2025-06-21T17:07:00.480Z] variation: -Xdump:system:none -Xdump:heap:none -Xdump:system:events=gpf+abort+traceassert+corruptcache -XX:-JITServerTechPreviewMessage Mode650
[2025-06-21T17:07:00.480Z] JVM_OPTIONS:  -Xdump:system:none -Xdump:heap:none -Xdump:system:events=gpf+abort+traceassert+corruptcache -XX:-JITServerTechPreviewMessage -XX:-

[2025-06-21T17:35:29.286Z] TEST: java/lang/Thread/virtual/CancelTimerWithContention.java

[2025-06-21T17:35:29.287Z] TEST RESULT: Error. Program `/home/jenkins/workspace/Test_openjdk24_j9_sanity.openjdk_x86-64_linux_testList_0/jdkbinary/j2sdk-image/bin/java' timed out (timeout set to 960000ms, elapsed time including timeout handling was 966134ms).
[2025-06-21T17:35:29.287Z] --------------------------------------------------
[2025-06-21T17:38:34.273Z] Test results: passed: 931; error: 1

[2025-06-21T17:38:43.208Z] jdk_lang_1_FAILED

JasonFengJ9 avatar Jun 23 '25 17:06 JasonFengJ9

Disabled test java/lang/Thread/virtual/CancelTimerWithContention.java

21:17:52  openjdk version "24.0.1-beta" 2025-04-15
21:17:52  IBM Semeru Runtime Open Edition 24.0.1+9-202507051603 (build 24.0.1-beta+9-202507051603)
21:17:52  Eclipse OpenJ9 VM 24.0.1+9-202507051603 (build master-f5aaff4096, JRE 24 Linux amd64-64-Bit Compressed References 20250705_103 (JIT enabled, AOT enabled)
21:17:52  OpenJ9   - f5aaff4096
21:17:52  OMR      - 2cadecaf5
21:17:52  JCL      - 052d7bb94 based on jdk-24.0.1+9)

21:17:52  Running test serviceability_jvmti_j9_0 ...

21:17:52  variation: Mode150
21:17:52  JVM_OPTIONS:  -XX:+UseCompressedOops -Xverbosegclog 

21:58:20  TEST: java/lang/Thread/virtual/CancelTimerWithContention.java

21:58:20  TEST RESULT: Error. Program `/home/jenkins/workspace/Grinder/jdkbinary/j2sdk-image/bin/java' timed out (timeout set to 960000ms, elapsed time including timeout handling was 974357ms).

JasonFengJ9 avatar Jul 08 '25 18:07 JasonFengJ9

@theresa-m Is this issue reproduceable with -Xjit:disableDynamicLoopTransfer ?

tajila avatar Jul 16 '25 21:07 tajila

-Xjit:disableDynamicLoopTransfer w/ disabled java/lang/Thread/virtual/CancelTimerWithContention.java

06:50:18  openjdk version "24.0.2-beta" 2025-07-15
06:50:18  IBM Semeru Runtime Open Edition 24.0.2+12-202507170003 (build 24.0.2-beta+12-202507170003)
06:50:18  Eclipse OpenJ9 VM 24.0.2+12-202507170003 (build master-5d5ec616c4, JRE 24 Linux ppc64le-64-Bit Compressed References 20250717_105 (JIT enabled, AOT enabled)
06:50:18  OpenJ9   - 5d5ec616c4
06:50:18  OMR      - 541548d77
06:50:18  JCL      - 0e9c99f2d based on jdk-24.0.2+12)

08:12:16  Running test jdk_lang_j9_1 ...
08:12:16  ===============================================
08:12:16  jdk_lang_j9_1 Start Time: Thu Jul 17 05:12:06 2025 Epoch Time (ms): 1752754326938
08:12:16  variation: -Xdump:system:none -Xdump:heap:none -Xdump:system:events=gpf+abort+traceassert+corruptcache -XX:-JITServerTechPreviewMessage Mode501 -XXgc:fvtest_forceCopyForwardHybridMarkCompactRatio=10
08:12:16  JVM_OPTIONS:  -Xdump:system:none -Xdump:heap:none -Xdump:system:events=gpf+abort+traceassert+corruptcache -XX:-JITServerTechPreviewMessage -Xjit -Xgcpolicy:balanced -Xnocompressedrefs -XXgc:fvtest_forceCopyForwardHybridMarkCompactRatio=10 -Xverbosegclog -Xjit:disableDynamicLoopTransfer

09:07:21  TEST: java/lang/Thread/virtual/CancelTimerWithContention.java

09:07:21  TEST RESULT: Error. Program `/home/jenkins/workspace/Grinder/jdkbinary/j2sdk-image/bin/java' timed out (timeout set to 960000ms, elapsed time including timeout handling was 966768ms).

JasonFengJ9 avatar Jul 17 '25 13:07 JasonFengJ9

I have one run that completed successfully with -Xjit:disableDynamicLoopTransfer and a longer than default timeout. Waiting for the others to complete. https://hyc-runtimes-jenkins.swg-devops.com/view/Test_grinder/job/Grinder/52855

update: the other two runs failed

theresa-m avatar Jul 17 '25 16:07 theresa-m

The core file for this issue is in https://na.artifactory.swg-devops.com/artifactory/sys-rt-generic-local/hyc-runtimes-jenkins.swg-devops.com/Grinder_iteration_1/5898/openjdk_test_output.tar.gz. This was run without the -Xjit:disableDynamicLoopTransfer option. Fyi @fengxue-IS

theresa-m avatar Jul 17 '25 18:07 theresa-m