openj9 icon indicating copy to clipboard operation
openj9 copied to clipboard

OpenJDK java/lang/Thread/jni/AttachCurrentThread/AttachTest crash vmState=0x00000000

Open pshipton opened this issue 7 months ago • 14 comments

https://openj9-jenkins.osuosl.org/job/Test_openjdk21_j9_sanity.openjdk_s390x_linux_OMR_testList_0/236/ - rh7-390-1 jdk_lang_0 java/lang/Thread/jni/AttachCurrentThread/AttachTest.java#id1

https://openj9-artifactory.osuosl.org/artifactory/ci-openj9/Test/Test_openjdk21_j9_sanity.openjdk_s390x_linux_OMR_testList_0/236/openjdk_test_output.tar.gz

07:31:46  Unhandled exception
07:31:46  Type=Segmentation error vmState=0x00000000
07:31:46  J9Generic_Signal_Number=00000018 Signal_Number=0000000b Error_Value=000000a8 Signal_Code=00000001
07:31:46  Handler1=000003FFBC0C6C18 Handler2=000003FFB7F30960 InaccessibleAddress=0000000000000000
07:31:46  gpr0=000003FF00000050 gpr1=0000000000170230 gpr2=0000000000000000 gpr3=0000000000000000
07:31:46  gpr4=00000000A42C0000 gpr5=0000000000069800 gpr6=0000000000170210 gpr7=00000000A42C0000
07:31:46  gpr8=000003FFBC31D548 gpr9=0000000000000006 gpr10=0000000000069800 gpr11=0000000000069B00
07:31:46  gpr12=00000000001C1500 gpr13=000003FFBC2E0390 gpr14=000003FFBC115C58 gpr15=000003FFBCC73A78
07:31:46  psw=000003FFBC1168B8 mask=0705100180000000 fpc=0008fe00 bea=000003FFBC11645E
07:31:46  fpr0=000003ffbcc72ba8 (f: 3167169536.000000, d: 2.172366e-311)
07:31:46  fpr1=48898ac000000000 (f: 0.000000, d: 2.781277e+41)
07:31:46  fpr2=00000000000d2200 (f: 860672.000000, d: 4.252285e-318)
07:31:46  fpr3=3c29792700120000 (f: 1179648.000000, d: 6.904539e-19)
07:31:46  fpr4=00000000000c2800 (f: 796672.000000, d: 3.936083e-318)
07:31:46  fpr5=3e924bb200000000 (f: 0.000000, d: 2.726269e-07)
07:31:46  fpr6=000000000000002d (f: 45.000000, d: 2.223295e-322)
07:31:46  fpr7=3e3a374c00000000 (f: 0.000000, d: 6.103889e-09)
07:31:46  fpr8=00000000333859e0 (f: 859331072.000000, d: 4.245659e-315)
07:31:46  fpr9=000000000057d468 (f: 5756008.000000, d: 2.843846e-317)
07:31:46  fpr10=0000000033385b30 (f: 859331392.000000, d: 4.245661e-315)
07:31:46  fpr11=000003ffa477e3d8 (f: 2759320576.000000, d: 2.172165e-311)
07:31:46  fpr12=0006326aed2fea6c (f: 3979340288.000000, d: 8.617911e-309)
07:31:46  fpr13=000003ff0c138a38 (f: 202607168.000000, d: 2.170902e-311)
07:31:46  fpr14=000003ffd03fae44 (f: 3493834240.000000, d: 2.172528e-311)
07:31:46  fpr15=000003ff0c176318 (f: 202859296.000000, d: 2.170902e-311)
07:31:46  Module=/home/jenkins/workspace/Test_openjdk21_j9_sanity.openjdk_s390x_linux_OMR_testList_0/jdkbinary/j2sdk-image/lib/default/libj9vm29.so
07:31:46  Module_base_address=000003FFBC080000
07:31:46  Target=2_90_20250410_253 (Linux 3.10.0-1160.118.1.el7.s390x)
07:31:46  CPU=s390x (4 logical CPUs) (0x1ec1b1000 RAM)
07:31:46  ----------- Stack Backtrace -----------
07:31:46  walkStackFrames+0xd18 (0x000003FFBC1168B8 [libj9vm29.so+0x968b8])
07:31:46  walkStackForExceptionThrow+0x98 (0x000003FFBC0B96C8 [libj9vm29.so+0x396c8])
07:31:46  _ZN32VM_BytecodeInterpreterCompressed3runEP10J9VMThread+0x1d3e (0x000003FFBC14B87E [libj9vm29.so+0xcb87e])
07:31:46  bytecodeLoopCompressed+0xee (0x000003FFBC149A7E [libj9vm29.so+0xc9a7e])
07:31:46  c_cInterpreter+0x64 (0x000003FFBC23814C [libj9vm29.so+0x1b814c])
07:31:46  ---------------------------------------

pshipton avatar Apr 10 '25 18:04 pshipton

https://openj9-jenkins.osuosl.org/job/Test_openjdk21_j9_sanity.openjdk_s390x_linux_OpenJDK21_testList_1/50 - rh7-390-3 jdk_lang_1 java/lang/Thread/jni/AttachCurrentThread/AttachTest.java#id1

https://openj9-artifactory.osuosl.org/artifactory/ci-openj9/Test/Test_openjdk21_j9_sanity.openjdk_s390x_linux_OpenJDK21_testList_1/50/openjdk_test_output.tar.gz

07:36:05  Type=Segmentation error vmState=0x00000000
07:36:05  J9Generic_Signal_Number=00000018 Signal_Number=0000000b Error_Value=000000a9 Signal_Code=00000001
07:36:05  Handler1=000003FFB3A46DA0 Handler2=000003FFB3930978 InaccessibleAddress=0000000000000000
07:36:05  gpr0=000003FF8B3E7690 gpr1=000003FFB42BDA00 gpr2=000003FFAB12BE00 gpr3=0000000000000008
07:36:05  gpr4=0000000000000000 gpr5=000003FFB42BDA00 gpr6=0000000000000000 gpr7=000003FF394E7FE1
07:36:05  gpr8=000003FFB436CB68 gpr9=00000000020E0001 gpr10=000003FFAB12BE00 gpr11=000003FFB8673318
07:36:05  gpr12=000003FFAB37E7B8 gpr13=000003FFB3C634D0 gpr14=000003FFB3B1BF84 gpr15=000003FFB8672CC0
07:36:05  psw=000003FFB3B104FE mask=0705100180000000 fpc=0008fe00 bea=000003FFB3B10E8C
07:36:05  fpr0=0000000000000000 (f: 0.000000, d: 0.000000e+00)
07:36:05  fpr1=48c7bcc000000000 (f: 0.000000, d: 4.135622e+42)
07:36:05  fpr2=0000000000000000 (f: 0.000000, d: 0.000000e+00)
07:36:05  fpr3=3c51ffc400140000 (f: 1310720.000000, d: 3.902929e-18)
07:36:05  fpr4=48c7bcc080000000 (f: 2147483648.000000, d: 4.135623e+42)
07:36:05  fpr5=3e924d0f00000000 (f: 0.000000, d: 2.727063e-07)
07:36:05  fpr6=0000000000000000 (f: 0.000000, d: 0.000000e+00)
07:36:05  fpr7=3e3a398400000000 (f: 0.000000, d: 6.105907e-09)
07:36:05  fpr8=0000000000000000 (f: 0.000000, d: 0.000000e+00)
07:36:05  fpr9=000003feec309f58 (f: 3962609408.000000, d: 2.170637e-311)
07:36:05  fpr10=000000003c8f4a10 (f: 1016023552.000000, d: 5.019823e-315)
07:36:05  fpr11=000003ff998fe3d8 (f: 2576344064.000000, d: 2.172075e-311)
07:36:05  fpr12=000000003c8f4a40 (f: 1016023616.000000, d: 5.019824e-315)
07:36:05  fpr13=000003feec330ca8 (f: 3962768640.000000, d: 2.170638e-311)
07:36:05  fpr14=000003fffeffb3f4 (f: 4278170624.000000, d: 2.172915e-311)
07:36:05  fpr15=000003feec3afd58 (f: 3963288832.000000, d: 2.170638e-311)
07:36:05  Module=/home/jenkins/workspace/Test_openjdk21_j9_sanity.openjdk_s390x_linux_OpenJDK21_testList_1/jdkbinary/j2sdk-image/lib/default/libj9vm29.so
07:36:05  Module_base_address=000003FFB3A00000
07:36:05  Target=2_90_20250529_54 (Linux 3.10.0-1160.118.1.el7.s390x)
07:36:05  CPU=s390x (4 logical CPUs) (0x1ec1b1000 RAM)
07:36:05  ----------- Stack Backtrace -----------
07:36:05  _ZN26VM_BytecodeInterpreterFull3runEP10J9VMThread+0x1b716 (0x000003FFB3B104FE [libj9vm29.so+0x1104fe])
07:36:05  bytecodeLoopFull+0xee (0x000003FFB3AF4D36 [libj9vm29.so+0xf4d36])
07:36:05  c_cInterpreter+0x64 (0x000003FFB3BB9D64 [libj9vm29.so+0x1b9d64])

pshipton avatar May 29 '25 15:05 pshipton

@tajila fyi

pshipton avatar May 29 '25 15:05 pshipton

https://openj9-jenkins.osuosl.org/job/Test_openjdk21_j9_sanity.openjdk_x86-64_mac_Nightly_testList_0/424 - mac13-x86-1 jdk_lang_0 java/lang/Thread/jni/AttachCurrentThread/AttachTest.java#id1

https://openj9-artifactory.osuosl.org/artifactory/ci-openj9/Test/Test_openjdk21_j9_sanity.openjdk_x86-64_mac_Nightly_testList_0/424/openjdk_test_output.tar.gz

18:58:01  Unhandled exception
18:58:01  Type=Segmentation error vmState=0x00000000
18:58:01  J9Generic_Signal_Number=00000018 Signal_Number=0000000b Error_Value=00000000 Signal_Code=00000001
18:58:01  Handler1=0000000006AEE570 Handler2=00000000068345C0 InaccessibleAddress=0000000000000008
18:58:01  RDI=00006000020A4160 RSI=0000000000002000 RAX=0000000000000000 RBX=000000000A042500
18:58:01  RCX=00000000A42C0000 RDX=0000000000000000 R8=0000700005DCD170 R9=0000000000000004
18:58:01  R10=0000000006D48DA4 R11=00007F97832C302E R12=000000000A158950 R13=0000000000000006
18:58:01  R14=000000000A042800 R15=000000000A042860
18:58:01  RIP=0000000006B31183 GS=0000 FS=0000 RSP=0000700005DCD3E0
18:58:01  RFlags=0000000000010202 CS=002B RBP=0000700005DCD460 ERR=0000000800000004
18:58:01  TRAPNO=000000040000000E CPU=0008000000040000 FAULTVADDR=0000000000000008
18:58:01  XMM0=0000000000000000 (f: 0.000000, d: 0.000000e+00)
18:58:01  XMM1=000000000a158950 (f: 169183568.000000, d: 8.358779e-316)
18:58:01  XMM2=0000000000000000 (f: 0.000000, d: 0.000000e+00)
18:58:01  XMM3=0000000000000000 (f: 0.000000, d: 0.000000e+00)
18:58:01  XMM4=412e848000000000 (f: 0.000000, d: 1.000000e+06)
18:58:01  XMM5=0000000000000000 (f: 0.000000, d: 0.000000e+00)
18:58:01  XMM6=3fd9c86b02dc0863 (f: 47974500.000000, d: 4.028575e-01)
18:58:01  XMM7=4023687a9f1af2b1 (f: 2669343488.000000, d: 9.704061e+00)
18:58:01  XMM8=0000000000000000 (f: 0.000000, d: 0.000000e+00)
18:58:01  XMM9=0000000000000000 (f: 0.000000, d: 0.000000e+00)
18:58:01  XMM10=0000000000000000 (f: 0.000000, d: 0.000000e+00)
18:58:01  XMM11=0000000000000000 (f: 0.000000, d: 0.000000e+00)
18:58:01  XMM12=0000000000000000 (f: 0.000000, d: 0.000000e+00)
18:58:01  XMM13=0000000000000000 (f: 0.000000, d: 0.000000e+00)
18:58:01  XMM14=0000000000000000 (f: 0.000000, d: 0.000000e+00)
18:58:01  XMM15=0000000000000000 (f: 0.000000, d: 0.000000e+00)
18:58:01  Module=/Users/jenkins/workspace/Test_openjdk21_j9_sanity.openjdk_x86-64_mac_Nightly_testList_0/jdkbinary/j2sdk-image/lib/default/libj9vm29.dylib
18:58:01  Module_base_address=0000000006AB7000 Symbol=walkStackFrames
18:58:01  Symbol_address=0000000006B30860
18:58:01  Target=2_90_20250605_512 (Mac OS X 13.7.3)
18:58:01  CPU=amd64 (12 logical CPUs) (0x400000000 RAM)
18:58:01  ----------- Stack Backtrace -----------
18:58:01  walkStackFrames+0x924 (0x0000000006B31184 [libj9vm29.dylib+0x7a184])
18:58:01  walkStackForExceptionThrow+0x96 (0x0000000006AE3B26 [libj9vm29.dylib+0x2cb26])
18:58:01  _ZN32VM_BytecodeInterpreterCompressed3runEP10J9VMThread+0x20529 (0x0000000006B7B119 [libj9vm29.dylib+0xc4119])
18:58:01  bytecodeLoopCompressed+0xc8 (0x0000000006B5ABD8 [libj9vm29.dylib+0xa3bd8])

pshipton avatar Jun 06 '25 12:06 pshipton

@tajila tentatively setting as a blocker, since this crash is appearing semi frequently lately.

pshipton avatar Jun 06 '25 12:06 pshipton

@pshipton Have you seen this on a linux platform recently? I dont have access to an xmac atm.

tajila avatar Jun 11 '25 17:06 tajila

50x grinder on xlinux https://openj9-jenkins.osuosl.org/job/Grinder/4388/

tajila avatar Jun 11 '25 17:06 tajila

All the failure are documented in this issue. https://github.com/eclipse-openj9/openj9/issues/21619#issuecomment-2919713913 is on zlinux and the diagnostics are still there. Also you can get access to the xmac machines if you like.

pshipton avatar Jun 11 '25 22:06 pshipton

These grinders stopped on the first failure, there were a couple of unrelated failures. 50x grinder on xlinux and zlinux latest build - didn't reproduce 100x grinder zlinux 0.53 - didn't reproduce

pshipton avatar Jun 12 '25 20:06 pshipton

Looking at the core in https://github.com/eclipse-openj9/openj9/issues/21619#issuecomment-2949184926. DDR shows the initial stackwalker values are:

> !stackslots 0x0a042500
<a042500> *** BEGIN STACK WALK, flags = 00400001 walkThread = 168,043,776 ***
<a042500> 	ITERATE_O_SLOTS
<a042500> 	RECORD_BYTECODE_PC_OFFSET
<a042500> Initial values: walkSP = 0x000000000A158950, PC = 0x0000000000000006, literals = 0x0000000000000000, A0 = 0x000000000A158980, j2iFrame = 0x0000000000000000, decomp = 0x0000000000000000
Jun 16, 2025 11:12:28 A.M. com.ibm.j9ddr.vm29.events.DefaultEventListener corruptData
WARNING: CorruptDataException thrown walking stack. walkThread = 0x000000000A042500
com.ibm.j9ddr.NullPointerDereference: Memory Fault reading 0x00000000 

It is likely failing in the same way as the JVM stackwalker in the crash

18:58:01  ----------- Stack Backtrace -----------
18:58:01  walkStackFrames+0x924 (0x0000000006B31184 [libj9vm29.dylib+0x7a184])
18:58:01  walkStackForExceptionThrow+0x96 (0x0000000006AE3B26 [libj9vm29.dylib+0x2cb26])
18:58:01  _ZN32VM_BytecodeInterpreterCompressed3runEP10J9VMThread+0x20529 (0x0000000006B7B119 [libj9vm29.dylib+0xc4119])
18:58:01  bytecodeLoopCompressed+0xc8 (0x0000000006B5ABD8 [libj9vm29.dylib+0xa3bd8])

However the stack is walkable, but the initial values are wrong. I looked at the stack and manually filled in the initial values and DDR seems to walk it fine.

> !stackslots 0x0a042500,0x0A158948,0x0A158968,7,0
<a042500> *** BEGIN STACK WALK, flags = 00400001 walkThread = 168,043,776 ***
<a042500> 	ITERATE_O_SLOTS
<a042500> 	RECORD_BYTECODE_PC_OFFSET
<a042500> Initial values: walkSP = 0x000000000A158948, PC = 0x0000000000000007, literals = 0x0000000000000000, A0 = 0x000000000A158968, j2iFrame = 0x0000000000000000, decomp = 0x0000000000000000
<a042500> JNI native method frame: bp = 0x000000000A158968, sp = 0x000000000A158948, pc = 0x0000000000000007, cp = 0x0000000000000000, arg0EA = 0x000000000A158968, flags = 0x0000000000000000
<a042500> 	Method: openj9/internal/foreign/abi/InternalUpcallHandler.allocateUpcallStub(Lopenj9/internal/foreign/abi/UpcallMHMetaData;[Ljava/lang/String;)J !j9method 0x000000000A2AEBB0
<a042500> 	Using signature mapper
<a042500> 	Arguments starting at 0x000000000A158968 for UDATA(0x0000000000000003) slots
<a042500> 		O-Slot: a0[0x000000000A158968] = 0x000000000A1589D8
<a042500> 		O-Slot: a1[0x000000000A158960] = 0x0000000031E16F4E
<a042500> 		O-Slot: a2[0x000000000A158958] = 0x000000000A2AEB50
<a042500> Bytecode frame: bp = 0x000000000A158998, sp = 0x000000000A158970, pc = 0x0000000031E16F4E, cp = 0x000000000A2AE3C0, arg0EA = 0x000000000A1589D8, flags = 0x0000000000000000
<a042500> 	Method: openj9/internal/foreign/abi/InternalUpcallHandler.getUpcallThunkAddr(Ljava/lang/invoke/MethodHandle;Ljava/lang/foreign/Arena;Ljdk/internal/foreign/abi/LinkerOptions;)J !j9method 0x000000000A2AEB50
<a042500> 	Bytecode index = 110
<a042500> 	Using local mapper
<a042500> 	Locals starting at 0x000000000A1589D8 for 0x0000000000000008 slots
<a042500> 		I-Slot: a0[0x000000000A1589D8] = 0x00000000FFEF5C88
<a042500> 		I-Slot: a1[0x000000000A1589D0] = 0x00000000FFEFD720
<a042500> 		I-Slot: a2[0x000000000A1589C8] = 0x00000000FFE0FA30
<a042500> 		I-Slot: a3[0x000000000A1589C0] = 0x00000000FFE53FE8
<a042500> 		I-Slot: t4[0x000000000A1589B8] = 0x0000000000000000
<a042500> 		I-Slot: t5[0x000000000A1589B0] = 0x00000000FFEFFE90
<a042500> 		I-Slot: t6[0x000000000A1589A8] = 0x0000000000000000
<a042500> 		I-Slot: t7[0x000000000A1589A0] = 0x0000000000000000
<a042500> 	Pending stack starting at 0x000000000A158980 for UDATA(0x0000000000000003) slots
<a042500> 		O-Slot: p0[0x000000000A158980] = 0x00000000FFF09218
<a042500> 		O-Slot: p1[0x000000000A158978] = 0x00000000FFF09238
<a042500> 		O-Slot: p2[0x000000000A158970] = 0x00000000FFF09250
<a042500> Bytecode frame: bp = 0x000000000A1589F8, sp = 0x000000000A1589E0, pc = 0x0000000031E16E9E, cp = 0x000000000A2AE3C0, arg0EA = 0x000000000A158A38, flags = 0x0000000000000000
<a042500> 	Method: openj9/internal/foreign/abi/InternalUpcallHandler.<init>(Ljava/lang/invoke/MethodHandle;Ljava/lang/invoke/MethodType;Ljava/lang/foreign/FunctionDescriptor;Ljava/lang/foreign/Arena;Ljdk/internal/foreign/abi/LinkerOptions;)V !j9method 0x000000000A2AEB10
<a042500> 	Bytecode index = 74
<a042500> 	Using local mapper
<a042500> 	Locals starting at 0x000000000A158A38 for 0x0000000000000008 slots
<a042500> 		I-Slot: a0[0x000000000A158A38] = 0x00000000FFF09218
<a042500> 		I-Slot: a1[0x000000000A158A30] = 0x00000000FFE11048
<a042500> 		I-Slot: a2[0x000000000A158A28] = 0x00000000FFE19D10
<a042500> 		I-Slot: a3[0x000000000A158A20] = 0x00000000FFE53FD8
<a042500> 		I-Slot: a4[0x000000000A158A18] = 0x00000000FFE0FA30
<a042500> 		I-Slot: a5[0x000000000A158A10] = 0x00000000FFE53FE8
<a042500> 		I-Slot: t6[0x000000000A158A08] = 0x00000000FFE068C0
<a042500> 		I-Slot: t7[0x000000000A158A00] = 0x00000000FFEFD720
<a042500> 	Pending stack starting at 0x000000000A1589E0 for UDATA(0x0000000000000001) slots
<a042500> 		O-Slot: p0[0x000000000A1589E0] = 0x00000000FFF09218
...

This is the vmThread:

> !j9vmthread 0x0a042500
J9VMThread at 0xa042500 {
  Fields for J9VMThread:
	0x0: struct JNINativeInterface_* functions = !jninativeinterface_ 0x0000000006D61E00
	0x8: struct J9JavaVM* javaVM = !j9javavm 0x00007F9787815420
	0x10: UDATA* arg0EA = !j9x 0x000000000A158980 //should be 0x0A158968
	0x18: UDATA* bytecodes = !j9x 0x0000000000000000
	0x20: UDATA* sp = !j9x 0x000000000A158950// should be 0x0A158948
	0x28: U8* pc = !j9x 0x0000000000000006 // should be 0x7 JNI callout instead of JIT JNI callout

@gacholio have you seen this type of problem before?

tajila avatar Jun 16 '25 15:06 tajila

Just confirming from tracepoints that the initial values are:

22:55:42.554883993 *0xA042500 j9vm.682             Entry      >WalkStackFrames - walkThread=0xa042500 flags=0xa42c0000 sp=0xa158950 a0=0xa158980 pc=0x6 literals=0x0 els=0x700005dcdb88 j2i=0x0

tajila avatar Jun 16 '25 15:06 tajila

At a glance, no, but based on your analysis it looks like someone may have forgotten to update the root values before starting a stack walk.

Note that the PC value of 6 in the roots is:

#define J9SF_FRAME_TYPE_JIT_JNI_CALLOUT  6

gacholio avatar Jun 16 '25 15:06 gacholio

It may be worth auditing the FFI code for missing VM struct updates.

gacholio avatar Jun 16 '25 15:06 gacholio

It's also worth noting that your "correct" values may be stale (a valid stack from a little bit in the past).

Does this fail -Xint?

gacholio avatar Jun 16 '25 16:06 gacholio

Does this fail -Xint?

Not that I know of, and its not easily reproduceable.

tajila avatar Jun 16 '25 18:06 tajila

https://openj9-jenkins.osuosl.org/job/Test_openjdk21_j9_sanity.openjdk_x86-64_mac_OMR_testList_0/227/

pshipton avatar Jul 04 '25 21:07 pshipton

Reopening until the fix it delivered to 0.56.

pshipton avatar Sep 02 '25 11:09 pshipton