omr-agentcore
Intermittent z/OS crash in libhealthenter.so when stopping while invoking JNI calls
Stack trace of the abort:
27 JNIEnv_::GetStaticMethodID(_jclass*,const char*,const char*)
+00000054 *PATHNAM
28 ibmras::monitoring::plugins::j9::getMXBean(JNIEnv_*,_jclass*
+00000250 *PATHNAM
29 ibmras::monitoring::plugins::j9::cpu::CpuPlugin::pullInt()
+0000080C *PATHNAM
30 ibmras::monitoring::plugins::j9::cpu::pullWrapper()
+00000054 *PATHNAM
31 ibmras::monitoring::agent::threads::WorkerThread::processLoop
+000001EE *PATHNAM
32 ibmras::monitoring::agent::threads::WorkerThread::threadEntry
Assembly decoding:
0x2f971cf0 {}{} +0 eb6947100024 stmg %r6, %r9, 0x710(%r4)
0x2f971cf6 {}{} +6 a74bff00 aghi %r4, -0x100
0x2f971cfa {}{} +10 e39004b80017 llgt %r9, 0x4b8
0x2f971d00 {}{} +16 e39090580004 lg %r9, 0x58(%r9)
0x2f971d06 {}{} +22 e39090100017 llgt %r9, 0x10(%r9)
0x2f971d0c {}{} +28 e31049800024 stg %r1, 0x980(%r4) 0x521c1ff4c0: 0x0000000000000000
0x2f971d12 {}{} +34 e32049880024 stg %r2, 0x988(%r4) 0x521c1ff4c8: 0x000000525ebc9530 : 00000000800E0318 : // java/lang/Class : Class name: java/lang/management/ManagementFactory
0x2f971d18 {}{} +40 e33049900024 stg %r3, 0x990(%r4) 0x521c1ff4d0: 0x00000050332c5130 : 6765744F70657261 74696E6753797374 656D4D584265616E 00DDDDDDDDDDDDDD [ getOperatingSystemMXBean........]
0x2f971d1e {}{} +46 44009010 ex %r0, 0x10(%r9)
0x2f971d22 {}{} +50 4400900c ex %r0, 0xc(%r9)
0x2f971d26 {}{} +54 e32049880004 lg %r2, 0x988(%r4) 0x521c1ff4c8: 0x000000525ebc9530 : 00000000800E0318 : // java/lang/Class : Class name: java/lang/management/ManagementFactory
0x2f971d2c {}{} +60 e33049900004 lg %r3, 0x990(%r4) 0x521c1ff4d0: 0x00000050332c5130 : getOperatingSystemMXBean
0x2f971d32 {}{} +66 e30049980004 lg %r0, 0x998(%r4) 0x521c1ff4d8: 0x000000501aca6430 : ()Ljava/lang/management/OperatingSystemMXBean;
0x2f971d38 {}{} +72 e31049800004 lg %r1, 0x980(%r4) 0x521c1ff4c0: 0x0000000000000000
0x2f971d3e {}{} +78 e36010000004 lg %r6, 0(%r1) <- This instruction ran
0x2f971d44 {} +84 e36063880004 lg %r6, 0x388(%r6) <- This instruction did not complete (failing instruction)
The failing source line is:
jmethodID method = env->GetStaticMethodID(*mgtBean, mxb, sig);
At the time of the crash:
r1 was env
r2 was *mgtBean (the class object for java/lang/management/ManagementFactory)
r3 was mxb (the method name, getOperatingSystemMXBean)
0x998(%r4) held sig (()Ljava/lang/management/OperatingSystemMXBean;)
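So the problem is that env (the J9VMThread) is NULL. r1 was loaded from 0x980(%r4), which contains 0x0000000000000000. The instruction at +78 loads env->functions from offset 0 of that NULL pointer, and the instruction at +84, which loads the GetStaticMethodID entry (slot 113, byte offset 0x388) from the JNI function table, is the one that fails.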
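To make the two dependent loads concrete, here is a minimal C++ model of how `env->GetStaticMethodID(...)` is dispatched: `JNIEnv_` holds a pointer to the JNI function table at offset 0, and GetStaticMethodID is slot 113 of that table, which is byte offset 0x388 with 8-byte pointers. `FakeEnv`, `FakeFunctionTable`, `FakeClass`, and `guardedGetStaticMethodID` are hypothetical stand-ins, not the real `jni.h` types. The NULL check only marks where a NULL env gets dereferenced; it does not explain why env was NULL.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical stand-ins for the jni.h types; only the memory layout matters here.
struct FakeClass;
struct FakeEnv;
using FakeMethodID = void*;
using GetStaticMethodIDFn = FakeMethodID (*)(FakeEnv*, FakeClass*, const char*, const char*);

// Models the JNI function table: 113 entries precede GetStaticMethodID.
struct FakeFunctionTable {
    void* precedingSlots[113];
    GetStaticMethodIDFn GetStaticMethodID;
};

// Models JNIEnv_: its first member (offset 0) points at the function table.
struct FakeEnv {
    const FakeFunctionTable* functions;
};

// Byte offset of the GetStaticMethodID slot; 113 * 8 == 904 == 0x388 with 8-byte pointers.
constexpr std::size_t kGetStaticMethodIDOffset = offsetof(FakeFunctionTable, GetStaticMethodID);

// Without the check, this is the dispatch the crashing code performs:
// load env->functions (like `lg %r6,0(%r1)`), load the slot at 0x388
// (like `lg %r6,0x388(%r6)`), then call through it.
FakeMethodID guardedGetStaticMethodID(FakeEnv* env, FakeClass* cls,
                                      const char* name, const char* sig) {
    if (env == nullptr || env->functions == nullptr) {
        return nullptr;  // a NULL env would otherwise fault on the second load
    }
    return env->functions->GetStaticMethodID(env, cls, name, sig);
}
```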
The problem was caused by ThreadPool::stopAll destroying the WorkerThread while it was still running in processLoop. This implicitly detached the thread from the VM, causing aborts during shutdown.
The solution is to call source->complete(NULL) for the other platforms, except Windows and z/OS. This function already sets running=false, which causes WorkerThread::processLoop to break out of its loop the next time it comes around and then call source->complete(NULL) at the end of WorkerThread::processLoop.
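The shutdown ordering described here can be sketched as follows, assuming a simplified pool built on `std::thread` and `std::atomic<bool>`. `Source`, `pull`, `complete`, `start`, `stop`, and `join` are hypothetical simplifications of the agent's classes, not their real interfaces. Joining each worker before destroying it is this sketch's own assumption; the issue itself only describes clearing `running` and letting `processLoop` call `complete` on its way out.

```cpp
#include <atomic>
#include <cassert>
#include <chrono>
#include <memory>
#include <thread>
#include <vector>

// Hypothetical stand-in for the plugin source a worker pulls from.
struct Source {
    std::atomic<int> pulls{0};
    std::atomic<int> completions{0};
    void pull() { ++pulls; }            // e.g. CpuPlugin::pullInt making JNI calls
    void complete() { ++completions; }  // e.g. source->complete(NULL)
};

class WorkerThread {
public:
    explicit WorkerThread(Source& source) : source_(source) {}
    ~WorkerThread() {
        // Destroying a worker whose processLoop is still running is the failure mode in this issue.
        assert(!thread_.joinable());
    }
    void start() {
        running_ = true;
        thread_ = std::thread(&WorkerThread::processLoop, this);
    }
    void stop() { running_ = false; }  // only signals; tears nothing down
    void join() {
        if (thread_.joinable()) thread_.join();
    }

private:
    void processLoop() {
        while (running_) {
            source_.pull();
            std::this_thread::sleep_for(std::chrono::milliseconds(1));
        }
        source_.complete();  // runs on the worker thread, after the loop exits
    }
    Source& source_;
    std::atomic<bool> running_{false};
    std::thread thread_;
};

class ThreadPool {
public:
    void add(Source& s) { workers_.push_back(std::make_unique<WorkerThread>(s)); }
    void startAll() {
        for (auto& w : workers_) w->start();
    }
    void stopAll() {
        for (auto& w : workers_) w->stop();  // 1. signal every worker
        for (auto& w : workers_) w->join();  // 2. wait for each processLoop to return
        workers_.clear();                    // 3. only now destroy the WorkerThread objects
    }

private:
    std::vector<std::unique_ptr<WorkerThread>> workers_;
};
```

Separating "signal" from "destroy" means no WorkerThread object disappears while its thread can still be inside a JNI call.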