
Intermittent z/OS crash in libhealthcenter.so when stopping while invoking JNI calls

Open yathamravali opened this issue 1 year ago • 1 comment

Stack trace of the abort:

27       JNIEnv_::GetStaticMethodID(_jclass*,const char*,const char*) 
                      +00000054              *PATHNAM                  
 28       ibmras::monitoring::plugins::j9::getMXBean(JNIEnv_*,_jclass* 
                      +00000250              *PATHNAM                  
 29       ibmras::monitoring::plugins::j9::cpu::CpuPlugin::pullInt()   
                      +0000080C              *PATHNAM                  
 30       ibmras::monitoring::plugins::j9::cpu::pullWrapper()          
                      +00000054              *PATHNAM                  
 31       ibmras::monitoring::agent::threads::WorkerThread::processLoop 
                      +000001EE              *PATHNAM                  
 32       ibmras::monitoring::agent::threads::WorkerThread::threadEntry 

Assembly decoding:

0x2f971cf0 {}{} +0               eb6947100024 stmg      %r6, %r9, 0x710(%r4)
0x2f971cf6 {}{} +6               a74bff00     aghi      %r4, -0x100
0x2f971cfa {}{} +10              e39004b80017 llgt      %r9, 0x4b8
0x2f971d00 {}{} +16              e39090580004 lg        %r9, 0x58(%r9)
0x2f971d06 {}{} +22              e39090100017 llgt      %r9, 0x10(%r9)
0x2f971d0c {}{} +28              e31049800024 stg       %r1, 0x980(%r4)  0x521c1ff4c0: 0x0000000000000000  
0x2f971d12 {}{} +34              e32049880024 stg       %r2, 0x988(%r4)  0x521c1ff4c8: 0x000000525ebc9530 :  00000000800E0318 : // java/lang/Class : Class name: java/lang/management/ManagementFactory
0x2f971d18 {}{} +40              e33049900024 stg       %r3, 0x990(%r4)  0x521c1ff4d0: 0x00000050332c5130 :  6765744F70657261 74696E6753797374 656D4D584265616E 00DDDDDDDDDDDDDD [ getOperatingSystemMXBean........]
0x2f971d1e {}{} +46              44009010     ex        %r0, 0x10(%r9)
0x2f971d22 {}{} +50              4400900c     ex        %r0, 0xc(%r9)
0x2f971d26 {}{} +54              e32049880004 lg        %r2, 0x988(%r4)  0x521c1ff4c8: 0x000000525ebc9530 : 00000000800E0318 : // java/lang/Class : Class name: java/lang/management/ManagementFactory
0x2f971d2c {}{} +60              e33049900004 lg        %r3, 0x990(%r4)  0x521c1ff4d0: 0x00000050332c5130  :  getOperatingSystemMXBean
0x2f971d32 {}{} +66              e30049980004 lg        %r0, 0x998(%r4)  0x521c1ff4d8: 0x000000501aca6430 :  ()Ljava/lang/management/OperatingSystemMXBean;
0x2f971d38 {}{} +72              e31049800004 lg        %r1, 0x980(%r4)  0x521c1ff4c0: 0x0000000000000000
0x2f971d3e {}{} +78              e36010000004 lg        %r6, 0(%r1)  <- This instruction ran
0x2f971d44 {} +84      e36063880004 lg   %r6, 0x388(%r6) <- This instruction did not (failing instruction)

The call being executed was:

jmethodID method = env->GetStaticMethodID(*mgtBean, mxb, sig);

r1 was env
r2 was *mgtBean (the class object java/lang/management/ManagementFactory)
r3 was mxb (the method name, getOperatingSystemMXBean)
0x998(%r4) was sig (()Ljava/lang/management/OperatingSystemMXBean;)

So the problem is that env (the J9VMThread) is NULL: the slot at 0x980(%r4), which is saved from and reloaded into r1, holds 0x0000000000000000.
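
To make the decoding above concrete, here is a minimal C++ sketch of how a call such as env->GetStaticMethodID(...) dispatches through the env pointer. FakeEnv, FakeFunctionTable, stubGetStaticMethodID and safeGetStaticMethodID are hypothetical stand-ins written for illustration; the real types come from <jni.h>, and the guard shown is not the fix applied in this issue. The one outside fact used is that the JNI specification places GetStaticMethodID at index 113 of the function table, which with 8-byte pointers is byte offset 0x388, the offset in the failing load.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical stand-ins for JNIEnv_ and JNINativeInterface_ from <jni.h>.
struct FakeEnv;
typedef const void* FakeMethodID;

struct FakeFunctionTable {
    void* slots[113];  // entries 0..112, unused in this sketch
    // Entry 113: the slot the wrapper loads with "lg %r6, 0x388(%r6)".
    FakeMethodID (*GetStaticMethodID)(FakeEnv*, const void* clazz,
                                      const char* name, const char* sig);
};

struct FakeEnv {
    // First word of the env: the wrapper loads it with "lg %r6, 0(%r1)".
    const FakeFunctionTable* functions;
};

static_assert(offsetof(FakeFunctionTable, GetStaticMethodID) == 113 * sizeof(void*),
              "GetStaticMethodID is expected at table index 113");

// Stub lookup for the sketch: returns the name pointer as the "method ID".
inline FakeMethodID stubGetStaticMethodID(FakeEnv*, const void*,
                                          const char* name, const char*) {
    return name;
}

// Illustrative guard: refuse to dispatch through a null env or null table.
inline FakeMethodID safeGetStaticMethodID(FakeEnv* env, const void* clazz,
                                          const char* name, const char* sig) {
    if (env == nullptr || env->functions == nullptr) {
        return nullptr;  // the thread has no usable env
    }
    return env->functions->GetStaticMethodID(env, clazz, name, sig);
}
```

A guard like this only hides the symptom. If env is NULL because the thread has already been detached from the VM, the shutdown ordering is what needs fixing.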

yathamravali, Feb 28 '24 05:02

The problem was caused by ThreadPool::stopAll destroying the WorkerThread while it was still running in processLoop. This implicitly detached the thread from the VM, causing aborts during shutdown.

The solution is to call source->complete(NULL) directly only on platforms other than Windows and z/OS. Stopping already sets running=false, so WorkerThread::processLoop breaks out the next time it checks the flag and then calls source->complete(NULL) itself at the end of WorkerThread::processLoop.
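
The shutdown ordering described above can be sketched in standard C++ as follows. This is a minimal illustration, not the omr-agentcore code: SketchWorker, onComplete_ and the 1 ms sleep are hypothetical, with onComplete_ standing in for source->complete(NULL) and the sleep for the real pull work. The stopper only clears the flag and waits; the worker thread runs the completion step itself after leaving its loop, so the object is never torn down while the loop is still running.

```cpp
#include <atomic>
#include <cassert>
#include <chrono>
#include <functional>
#include <thread>
#include <utility>

class SketchWorker {
public:
    explicit SketchWorker(std::function<void()> onComplete)
        : running_(true),
          onComplete_(std::move(onComplete)),
          thread_(&SketchWorker::processLoop, this) {}

    // Destruction waits for the loop to finish instead of pulling the
    // object out from under a running thread.
    ~SketchWorker() { stop(); }

    // Ask the loop to exit, then wait for it. Safe to call repeatedly from
    // one thread; concurrent callers would need extra locking.
    void stop() {
        running_ = false;
        if (thread_.joinable()) {
            thread_.join();
        }
    }

private:
    void processLoop() {
        while (running_) {
            // Stand-in for the real work (for example a JNI pull).
            std::this_thread::sleep_for(std::chrono::milliseconds(1));
        }
        // Completion runs on the worker thread, after the last iteration.
        onComplete_();
    }

    std::atomic<bool> running_;
    std::function<void()> onComplete_;
    std::thread thread_;  // declared last so it starts after the other members
};
```

With this shape, the completion callback runs exactly once per worker, and only after the loop has made its last call.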

yathamravali, Feb 28 '24 05:02