jdk_lang_0_FAILED Segmentation error vmState=0x00000000 at detachMonitorInfo

Open JasonFengJ9 opened this issue 6 months ago • 9 comments

Failure link

From running disabled tests (rtj-ubu24s390x-svl-test-49wim-1):

16:30:39  openjdk version "24.0.1-beta" 2025-04-15
16:30:39  IBM Semeru Runtime Open Edition 24.0.1+9-202506040003 (build 24.0.1-beta+9-202506040003)
16:30:39  Eclipse OpenJ9 VM 24.0.1+9-202506040003 (build master-ee99618777, JRE 24 Linux s390x-64-Bit Compressed References 20250604_75 (JIT enabled, AOT enabled)
16:30:39  OpenJ9   - ee99618777
16:30:39  OMR      - 556e0fe4b
16:30:39  JCL      - 801da8362 based on jdk-24.0.1+9)

Rerun in Grinder - Change TARGET to run only the failed test targets.

Optional info

Failure output (captured from console output)

16:36:02  Running test jdk_lang_0 ...

16:36:02  variation: -Xdump:system:none -Xdump:heap:none -Xdump:system:events=gpf+abort+traceassert+corruptcache -XX:-JITServerTechPreviewMessage Mode150
16:36:02  JVM_OPTIONS:  -Xdump:system:none -Xdump:heap:none -Xdump:system:events=gpf+abort+traceassert+corruptcache -XX:-JITServerTechPreviewMessage -XX:+UseCompressedOops -Xverbosegclog 

16:55:19  TEST: java/lang/Thread/virtual/MiscMonitorTests.java#Xcomp-TieredStopAtLevel3

16:55:19  STARTED    MiscMonitorTests::testContentionMultipleMonitors2 'testContentionMultipleMonitors2()'
16:55:19  Unhandled exception
16:55:19  Type=Segmentation error vmState=0x00000000

16:55:19  Module=/home/jenkins/workspace/Grinder/jdkbinary/j2sdk-image/lib/default/libj9vm29.so
16:55:19  Module_base_address=000003FF83D00000
16:55:19  Target=2_90_20250604_75 (Linux 6.8.0-59-generic)
16:55:19  CPU=s390x (4 logical CPUs) (0x1f5785000 RAM)
16:55:19  ----------- Stack Backtrace -----------
16:55:19  detachMonitorInfo+0x32 (0x000003FF83DCBD42 [libj9vm29.so+0xcbd42])
16:55:19  walkLiveMonitorSlotsForYield+0x120 (0x000003FF8285D990 [libj9jit29.so+0xc5d990])
16:55:19  jitGetOwnedObjectMonitors+0x254 (0x000003FF8285F5B4 [libj9jit29.so+0xc5f5b4])
16:55:19  walkFrame+0x222 (0x000003FF83D95EEA [libj9vm29.so+0x95eea])
16:55:19  jitWalkStackFrames+0x198 (0x000003FF8285DD30 [libj9jit29.so+0xc5dd30])
16:55:19  walkStackFrames+0xf8 (0x000003FF83D96820 [libj9vm29.so+0x96820])
16:55:19  preparePinnedVirtualThreadForUnmount+0xe8 (0x000003FF83DCCE08 [libj9vm29.so+0xcce08])
16:55:19  _ZN32VM_BytecodeInterpreterCompressed3runEP10J9VMThread+0x28d86 (0x000003FF83DF6C86 [libj9vm29.so+0xf6c86])
16:55:19  bytecodeLoopCompressed+0xee (0x000003FF83DCDE3E [libj9vm29.so+0xcde3e])
16:55:19  c_cInterpreter+0x64 (0x000003FF83EE633C [libj9vm29.so+0x1e633c])
16:55:19  ---------------------------------------

17:27:53  jdk_lang_0_FAILED

Another segmentation error with a similar native stack trace was reported at https://github.com/eclipse-openj9/openj9/issues/21826#issuecomment-2902644505, in which java/lang/Thread/virtual/stress/LotsOfContendedMonitorEnter.java also failed with a timeout.

JasonFengJ9 avatar Jun 05 '25 18:06 JasonFengJ9

@JasonFengJ9 Any more recent failures?

tajila avatar Jun 09 '25 14:06 tajila

Launched disabled test jdk_lang_j9_0,jdk_lang_0 on s390x_linux

The failure was reproduced - https://openj9-jenkins.osuosl.org/job/Grinder_iteration_0/665/consoleFull

10:23:09  openjdk version "24.0.1-internal" 2025-04-15
10:23:09  OpenJDK Runtime Environment (build 24.0.1-internal-adhoc.****.BuildJDK24s390xlinuxNightly)
10:23:09  Eclipse OpenJ9 VM (build master-06451debda1, JRE 24 Linux s390x-64-Bit Compressed References 20250606_98 (JIT enabled, AOT enabled)
10:23:09  OpenJ9   - 06451debda1
10:23:09  OMR      - 7a153b417a2
10:23:09  JCL      - bb8798e5c37 based on jdk-24.0.1+9)

11:01:09  TEST: java/lang/Thread/virtual/MiscMonitorTests.java#Xcomp

11:01:09  STARTED    MiscMonitorTests::testContentionMultipleMonitors2 'testContentionMultipleMonitors2()'
11:01:09  Unhandled exception
11:01:09  Type=Segmentation error vmState=0x00000000
11:01:09  J9Generic_Signal_Number=00000018 Signal_Number=0000000b Error_Value=00000000 Signal_Code=00000001
11:01:09  Handler1=000003FFB85C70F8 Handler2=000003FFB84B0978 InaccessibleAddress=01F0A101E0056000
11:01:09  gpr0=0000000000000002 gpr1=01F0A100E0144BF0 gpr2=000000000230DD00 gpr3=00000000FFF11880
11:01:09  gpr4=000003FFB35FD6FC gpr5=000003FF94BC5EF4 gpr6=0000000000000010 gpr7=000003FF94BC5EF4
11:01:09  gpr8=00000000FFF11880 gpr9=000003FFB35FD6FC gpr10=000000000230DD00 gpr11=0000000000000003
11:01:09  gpr12=000003FF981294C8 gpr13=000003FF00000001 gpr14=000003FFB325D126 gpr15=000003FFB35FD550
11:01:09  psw=000003FFB864BD46 mask=0705200180000000 fpc=00080000 bea=000003FFB325D124
11:01:09  fpr0=3f9ec949a59d38a0 (f: 2778544384.000000, d: 3.006473e-02)
11:01:09  fpr1=48364300b35fdad0 (f: 3009403648.000000, d: 7.575274e+39)
11:01:09  fpr2=000003ffb401eac4 (f: 3020024576.000000, d: 2.172294e-311)
11:01:09  fpr3=4005555555555555 (f: 1431655808.000000, d: 2.666667e+00)
11:01:09  fpr4=0000000000000007 (f: 7.000000, d: 3.458460e-323)
11:01:09  fpr5=0000000000000000 (f: 0.000000, d: 0.000000e+00)
11:01:09  fpr6=000003ff20000928 (f: 536873280.000000, d: 2.171067e-311)
11:01:09  fpr7=0000000000000000 (f: 0.000000, d: 0.000000e+00)
11:01:09  fpr8=0000000000000000 (f: 0.000000, d: 0.000000e+00)
11:01:09  fpr9=0000000000000000 (f: 0.000000, d: 0.000000e+00)
11:01:09  fpr10=000003ffb35bf000 (f: 3009146880.000000, d: 2.172288e-311)
11:01:09  fpr11=000003ffa807e3b8 (f: 2819089408.000000, d: 2.172195e-311)
11:01:09  fpr12=0000000001ebd140 (f: 32231744.000000, d: 1.592460e-316)
11:01:09  fpr13=000003ff080390c8 (f: 134451392.000000, d: 2.170868e-311)
11:01:09  fpr14=0000000000000000 (f: 0.000000, d: 0.000000e+00)
11:01:09  fpr15=000003ff080b47d8 (f: 134957024.000000, d: 2.170868e-311)
11:01:09  Module=/home/jenkins/workspace/Grinder_iteration_0/jdkbinary/j2sdk-image/lib/default/libj9vm29.so
11:01:09  Module_base_address=000003FFB8580000
11:01:09  Target=2_90_20250606_98 (Linux 5.4.0-181-generic)
11:01:09  CPU=s390x (4 logical CPUs) (0x1f58f2000 RAM)
11:01:09  ----------- Stack Backtrace -----------
11:01:09  detachMonitorInfo+0x36 (0x000003FFB864BD46 [libj9vm29.so+0xcbd46])
11:01:09  walkLiveMonitorSlotsForYield+0x13e (0x000003FFB325D126 [libj9jit29.so+0xc5d126])
11:01:09  jitGetOwnedObjectMonitors+0x254 (0x000003FFB325ED5C [libj9jit29.so+0xc5ed5c])
11:01:09  walkFrame+0x222 (0x000003FFB8615EEA [libj9vm29.so+0x95eea])
11:01:09  jitWalkStackFrames+0x198 (0x000003FFB325D4D8 [libj9jit29.so+0xc5d4d8])
11:01:09  walkStackFrames+0xf8 (0x000003FFB8616820 [libj9vm29.so+0x96820])
11:01:09  preparePinnedVirtualThreadForUnmount+0xe8 (0x000003FFB864D1D0 [libj9vm29.so+0xcd1d0])
11:01:09  _ZN32VM_BytecodeInterpreterCompressed3runEP10J9VMThread+0x28d86 (0x000003FFB867704E [libj9vm29.so+0xf704e])
11:01:09  bytecodeLoopCompressed+0xee (0x000003FFB864E206 [libj9vm29.so+0xce206])
11:01:09  c_cInterpreter+0x64 (0x000003FFB8766704 [libj9vm29.so+0x1e6704])
11:01:09  ---------------------------------------

JasonFengJ9 avatar Jun 09 '25 15:06 JasonFengJ9

disassembly of crash

based on !gpinfo

J9Generic_Signal_Number=00000018 Signal_Number=0000000b Error_Value=00000000 Signal_Code=00000001
Handler1=000003FF83D470F8 Handler2=000003FF88230978 InaccessibleAddress=0188C400FFF3E000
gpr0=0000000000000002 gpr1=0188C40000000000 gpr2=00000000018BBD00 gpr3=00000000FFF3E2A8
gpr4=000003FF2D3AE0D2 gpr5=000003FF2D3AE034 gpr6=0000000000000010 gpr7=000003FF2D3AE034
gpr8=000003FF2D3AE0D2 gpr9=00000000018BBD00 gpr10=00000000FFF3E2A8 gpr11=0000000000000003
gpr12=000003FF675067C8 gpr13=000003FF830FDA90 gpr14=000003FF8285D990 gpr15=000003FF830FD4D0
psw=000003FF83DCBD42 mask=0705200180000000 fpc=00080000 bea=000003FF8285D98E
fpr0=3fd181f669f26cb2 (f: 1777495168.000000, d: 2.735573e-01)
fpr1=481ad30000000000 (f: 0.000000, d: 2.281952e+39)
fpr2=000003ff081cab58 (f: 136096608.000000, d: 2.170869e-311)
fpr3=bfe02b867365978f (f: 1936037760.000000, d: -5.053131e-01)
fpr4=481ad30000000000 (f: 0.000000, d: 2.281952e+39)
fpr5=3f304091223d9990 (f: 574462336.000000, d: 2.479891e-04)
fpr6=000003fef8000090 (f: 4160749824.000000, d: 2.170735e-311)
fpr7=4029b8d788a3b173 (f: 2292429056.000000, d: 1.286102e+01)
fpr8=0000000000000000 (f: 0.000000, d: 0.000000e+00)
fpr9=000003ffa0ce6d90 (f: 2697883136.000000, d: 2.172135e-311)
fpr10=000003ff081ca6e8 (f: 136095456.000000, d: 2.170869e-311)
fpr11=000003ff9b57e318 (f: 2606228224.000000, d: 2.172089e-311)
fpr12=0000000000041000 (f: 266240.000000, d: 1.315400e-318)
fpr13=000003ff0402afa8 (f: 67284904.000000, d: 2.170835e-311)
fpr14=000003ff830ff8c0 (f: 2198862080.000000, d: 2.171888e-311)
fpr15=000003ff04054e08 (f: 67456520.000000, d: 2.170835e-311)

and disassembly

   0x000003ff83dcbd10 <+0>:	stmg	%r6,%r15,48(%r15)
   0x000003ff83dcbd16 <+6>:	ltg	%r1,160(%r2)
   0x000003ff83dcbd1c <+12>:	lay	%r15,-200(%r15)
   0x000003ff83dcbd22 <+18>:	lgr	%r9,%r2
   0x000003ff83dcbd26 <+22>:	lgr	%r10,%r3
   0x000003ff83dcbd2a <+26>:	je	0x3ff83dcbe7e <detachMonitorInfo(J9VMThread*, j9object_t)+366>
   0x000003ff83dcbd2e <+30>:	llgf	%r1,0(%r3)
   0x000003ff83dcbd34 <+36>:	nill	%r1,65280
   0x000003ff83dcbd38 <+40>:	ltg	%r1,216(%r1)
   0x000003ff83dcbd3e <+46>:	jl	0x3ff83dcbe18 <detachMonitorInfo(J9VMThread*, j9object_t)+264>
   0x000003ff83dcbd42 <+50>:	llgf	%r11,0(%r1,%r3)

vmthread is 00000000018BBD00 and object is 00000000FFF3E2A8.

> !j9object 0x00000000FFF3E2A8
Unable to read object clazz at 0x00000000FFF3E2A8 (clazz = 0x00000000FFF3A500)

The object pointer is corrupt; it might be a forwarded pointer. So jitGetOwnedObjectMonitors is returning a corrupt object slot.
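
One way to cross-check the register dump against the faulting instruction: `llgf %r11,0(%r1,%r3)` forms its address as gpr1 + gpr3 + 0, and every InaccessibleAddress reported in this thread ends in 000, which is consistent with the address being reported at page granularity. The sketch below assumes 4 KiB pages (an assumption, not something the logs state) and uses only register values quoted in this thread; the helper name `faulting_page` is made up for illustration.

```python
MASK64 = (1 << 64) - 1
PAGE_SIZE = 4096  # assumption: 4 KiB pages


def faulting_page(base, index, displacement=0):
    """Return the start of the page holding base + index + displacement."""
    effective = (base + index + displacement) & MASK64
    return effective & ~(PAGE_SIZE - 1)


# Registers from the !gpinfo output above (psw at detachMonitorInfo+0x32).
gpinfo_page = faulting_page(0x0188C40000000000, 0x00000000FFF3E2A8)
gpinfo_reported = 0x0188C400FFF3E000  # InaccessibleAddress in the same dump

# Registers from the 20250606_98 reproduction earlier in this thread.
repro_page = faulting_page(0x01F0A100E0144BF0, 0x00000000FFF11880)
repro_reported = 0x01F0A101E0056000  # InaccessibleAddress in that dump

print(gpinfo_page == gpinfo_reported)
print(repro_page == repro_reported)
```

Both sums land on the reported page. That suggests, but does not prove, that the same indexed load faulted in the 20250606_98 run, with gpr1 holding a garbage value at the time of the load.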

tajila avatar Jun 09 '25 22:06 tajila

@Spencer-Comin, in an off-line discussion @tajila mentioned that it's possible that this is another instance of the problem reported in #21829, in case that helps to shed any more light on either problem.

hzongaro avatar Jun 11 '25 19:06 hzongaro

@r30shah, may I ask you to have someone on your team take a look at this one?

hzongaro avatar Jun 13 '25 18:06 hzongaro

@VermaSh assigning this one to you.

r30shah avatar Jun 13 '25 18:06 r30shah

Not sure why I cannot assign this one to @VermaSh; maybe he needs to comment before it can be assigned.

r30shah avatar Jun 13 '25 18:06 r30shah

Looking into the issue

VermaSh avatar Jun 16 '25 13:06 VermaSh

https://github.com/eclipse-openj9/openj9/issues/22082 looks like the same issue. Same test and almost the same stack trace. Note there were some assertions added into detachMonitorInfo() about 10 days ago, so https://github.com/eclipse-openj9/openj9/issues/22082 might just be a different symptom of this issue.

hangshao0 avatar Jun 16 '25 18:06 hangshao0

disabled test - java/lang/Thread/virtual/MiscMonitorTests.java

17:35:19  openjdk version "24.0.1-beta" 2025-04-15
17:35:19  IBM Semeru Runtime Open Edition 24.0.1+9-202506170132 (build 24.0.1-beta+9-202506170132)
17:35:19  Eclipse OpenJ9 VM 24.0.1+9-202506170132 (build master-cf3179c911, JRE 24 Linux s390x-64-Bit Compressed References 20250617_84 (JIT enabled, AOT enabled)
17:35:19  OpenJ9   - cf3179c911
17:35:19  OMR      - 6873ddb55
17:35:19  JCL      - 9e7e1c82d based on jdk-24.0.1+9)

17:46:28  Running test jdk_lang_0 ...

17:46:28  variation: -Xdump:system:none -Xdump:heap:none -Xdump:system:events=gpf+abort+traceassert+corruptcache -XX:-JITServerTechPreviewMessage Mode150
17:46:28  JVM_OPTIONS:  -Xdump:system:none -Xdump:heap:none -Xdump:system:events=gpf+abort+traceassert+corruptcache -XX:-JITServerTechPreviewMessage -XX:+UseCompressedOops -Xverbosegclog 

18:33:32  TEST: java/lang/Thread/virtual/MiscMonitorTests.java#Xcomp

18:33:32  STARTED    MiscMonitorTests::testContentionMultipleMonitors2 'testContentionMultipleMonitors2()'
18:33:32  Unhandled exception
18:33:32  Type=Segmentation error vmState=0x00000000
18:33:32  J9Generic_Signal_Number=00000018 Signal_Number=0000000b Error_Value=00000000 Signal_Code=00000001
18:33:32  Handler1=000003FF970C7110 Handler2=000003FF96FB0978 InaccessibleAddress=E000B25000000000
18:33:32  gpr0=000003FFFFFFFFFE gpr1=000003FF0C00F578 gpr2=0000000000000000 gpr3=E000B25000000038
18:33:32  gpr4=E000B25000000000 gpr5=000003FF00000000 gpr6=000000000037D040 gpr7=000003FF97352550
18:33:32  gpr8=000003FF900875F0 gpr9=000003FF90028700 gpr10=00000000FFE18008 gpr11=00000000007ADA00
18:33:32  gpr12=000003FF76A7D5C8 gpr13=000003FF900844E8 gpr14=000003FF970FA446 gpr15=000003FF7617C228
18:33:32  psw=000003FF970FA46E mask=0705100180000000 fpc=00080000 bea=000003FF96F0A8BA
18:33:32  fpr0=3fef421f44b15f74 (f: 1152475008.000000, d: 9.768216e-01)
18:33:32  fpr1=0000000000000007 (f: 7.000000, d: 3.458460e-323)
18:33:32  fpr2=000003ff040000d0 (f: 67109072.000000, d: 2.170835e-311)
18:33:32  fpr3=000003ff040010f8 (f: 67113208.000000, d: 2.170835e-311)
18:33:32  fpr4=000003ff04000ae6 (f: 67111656.000000, d: 2.170835e-311)
18:33:32  fpr5=8629686848f68bc9 (f: 1224117248.000000, d: -5.598888e-279)
18:33:32  fpr6=0000000000000000 (f: 0.000000, d: 0.000000e+00)
18:33:32  fpr7=0000000000000000 (f: 0.000000, d: 0.000000e+00)
18:33:32  fpr8=000000000037d040 (f: 3657792.000000, d: 1.807189e-317)
18:33:32  fpr9=0000000000041000 (f: 266240.000000, d: 1.315400e-318)
18:33:32  fpr10=000003ff1c19d0c8 (f: 471453888.000000, d: 2.171035e-311)
18:33:32  fpr11=000003ffb807d2e8 (f: 3087520512.000000, d: 2.172327e-311)
18:33:32  fpr12=000003ff7617e900 (f: 1981278464.000000, d: 2.171781e-311)
18:33:32  fpr13=000003ff20056c48 (f: 537226304.000000, d: 2.171067e-311)
18:33:32  fpr14=000003ff7613e000 (f: 1981014016.000000, d: 2.171780e-311)
18:33:32  fpr15=000003ff200472f8 (f: 537162496.000000, d: 2.171067e-311)
18:33:32  Module=/home/jenkins/workspace/Grinder/jdkbinary/j2sdk-image/lib/default/libj9vm29.so
18:33:32  Module_base_address=000003FF97080000
18:33:32  Target=2_90_20250617_84 (Linux 5.15.0-140-generic)
18:33:32  CPU=s390x (4 logical CPUs) (0x1f5c03000 RAM)
18:33:32  ----------- Stack Backtrace -----------
18:33:32  monitorTableAt+0x3d6 (0x000003FF970FA46E [libj9vm29.so+0x7a46e])
18:33:32  objectMonitorInflate+0x1e (0x000003FF970F8966 [libj9vm29.so+0x78966])
18:33:32  detachMonitorInfo+0x178 (0x000003FF9714C360 [libj9vm29.so+0xcc360])
18:33:32  walkLiveMonitorSlotsForYield+0x13e (0x000003FF95EDCCF6 [libj9jit29.so+0xc5ccf6])
18:33:32  jitGetOwnedObjectMonitors+0x254 (0x000003FF95EDE92C [libj9jit29.so+0xc5e92c])
18:33:32  walkFrame+0x222 (0x000003FF97115DCA [libj9vm29.so+0x95dca])
18:33:32  jitWalkStackFrames+0x198 (0x000003FF95EDD0A8 [libj9jit29.so+0xc5d0a8])
18:33:32  walkStackFrames+0xf8 (0x000003FF97116700 [libj9vm29.so+0x96700])
18:33:32  preparePinnedVirtualThreadForUnmount+0xe8 (0x000003FF9714D6A8 [libj9vm29.so+0xcd6a8])
18:33:32  _ZN32VM_BytecodeInterpreterCompressed3runEP10J9VMThread+0x28d86 (0x000003FF97177526 [libj9vm29.so+0xf7526])
18:33:32  bytecodeLoopCompressed+0xee (0x000003FF9714E6DE [libj9vm29.so+0xce6de])
18:33:32  c_cInterpreter+0x64 (0x000003FF97266BDC [libj9vm29.so+0x1e6bdc])
18:33:32  ---------------------------------------

18:33:32  TEST RESULT: Failed. Unexpected exit from test [exit code: 255]

JasonFengJ9 avatar Jun 18 '25 13:06 JasonFengJ9

I cannot find a core dump from any of the failures to look at. I launched Grinders to reproduce the failure and get a core dump to inspect. https://openj9-jenkins.osuosl.org/job/Grinder/4422/console

r30shah avatar Jun 25 '25 16:06 r30shah

Sorry about the delayed update. I am able to reproduce the failure locally. I launched a Grinder with the non-volatile GPR changes (build) and am seeing 7 fewer failures with them.

Original set of jdk_lang_0 failures:

jdk_lang_0 - Test results: passed: 948; failed: 14; error: 5 
	Failed test cases: 
       TEST: java/lang/Thread/virtual/stress/GetStackTraceALotWhenBlocking.java#id0
       TEST: java/lang/Thread/virtual/stress/LotsOfContendedMonitorEnter.java#LM_LIGHTWEIGHT
       TEST: java/lang/Thread/virtual/stress/Skynet.java#default
       TEST: java/lang/Thread/virtual/stress/Skynet100kWithMonitors.java#id0
       TEST: java/lang/Thread/virtual/MiscMonitorTests.java#Xcomp
       TEST: java/lang/Thread/virtual/MiscMonitorTests.java#Xcomp-noTieredCompilation
       TEST: java/lang/Thread/virtual/CancelTimerWithContention.java
       TEST: java/lang/Thread/virtual/MiscMonitorTests.java#default
       TEST: java/lang/Thread/virtual/MiscMonitorTests.java#Xint
       TEST: java/lang/Thread/virtual/MonitorEnterExit.java#default
       TEST: java/lang/Thread/virtual/MonitorEnterExit.java#Xcomp-LM_LIGHTWEIGHT
       TEST: java/lang/Thread/virtual/MonitorEnterExit.java#Xcomp-noTieredCompilation-LM_LEGACY
       TEST: java/lang/Thread/virtual/MonitorEnterExit.java#Xcomp-noTieredCompilation-LM_LIGHTWEIGHT
       TEST: java/lang/Thread/virtual/MonitorEnterExit.java#Xcomp-TieredStopAtLevel1-LM_LEGACY
       TEST: java/lang/Thread/virtual/MonitorEnterExit.java#Xcomp-TieredStopAtLevel1-LM_LIGHTWEIGHT
       TEST: java/lang/Thread/virtual/MiscMonitorTests.java#Xcomp-TieredStopAtLevel3
       TEST: java/lang/Thread/virtual/MonitorWaitNotify.java#Xcomp-TieredStopAtLevel1-LM_LIGHTWEIGHT
       TEST: java/lang/Thread/virtual/MonitorWaitNotify.java#LM_LEGACY
       TEST: java/lang/Thread/virtual/MonitorWaitNotify.java#Xcomp-LM_LIGHTWEIGHT

Failures with the build containing the non-volatile GPR changes:

Failed test cases: 
TEST: java/lang/Thread/virtual/stress/Skynet.java#default
TEST: java/lang/Thread/virtual/stress/LotsOfContendedMonitorEnter.java#default
TEST: java/lang/Thread/virtual/stress/LotsOfContendedMonitorEnter.java#LM_LIGHTWEIGHT
TEST: java/lang/Thread/virtual/MiscMonitorTests.java#Xcomp
TEST: java/lang/Thread/virtual/stress/TimedWaitALot.java#timeout-notify
TEST: java/lang/Thread/virtual/stress/TimedWaitALot.java#timeout-notify-interrupt
TEST: java/lang/Thread/virtual/MiscMonitorTests.java#default
TEST: java/lang/Thread/virtual/MonitorEnterExit.java#default
TEST: java/lang/Thread/virtual/MonitorWaitNotify.java#Xcomp-LM_LIGHTWEIGHT
TEST: java/lang/Thread/virtual/MonitorWaitNotify.java#Xcomp-TieredStopAtLevel1-LM_LEGACY
TEST: java/lang/Thread/virtual/MonitorWaitNotify.java#Xcomp-TieredStopAtLevel1-LM_LIGHTWEIGHT
TEST: java/lang/Thread/virtual/ThreadAPI.java#no-vmcontinuations
Test results: passed: 958; failed: 2; error: 10
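
The two lists can also be compared entry by entry rather than only by their totals. The sketch below copies the test names from the two lists above (dropping the common `java/lang/Thread/virtual/` prefix for brevity) and applies plain set arithmetic; the variable names are illustrative only.

```python
original = set("""
stress/GetStackTraceALotWhenBlocking.java#id0
stress/LotsOfContendedMonitorEnter.java#LM_LIGHTWEIGHT
stress/Skynet.java#default
stress/Skynet100kWithMonitors.java#id0
MiscMonitorTests.java#Xcomp
MiscMonitorTests.java#Xcomp-noTieredCompilation
CancelTimerWithContention.java
MiscMonitorTests.java#default
MiscMonitorTests.java#Xint
MonitorEnterExit.java#default
MonitorEnterExit.java#Xcomp-LM_LIGHTWEIGHT
MonitorEnterExit.java#Xcomp-noTieredCompilation-LM_LEGACY
MonitorEnterExit.java#Xcomp-noTieredCompilation-LM_LIGHTWEIGHT
MonitorEnterExit.java#Xcomp-TieredStopAtLevel1-LM_LEGACY
MonitorEnterExit.java#Xcomp-TieredStopAtLevel1-LM_LIGHTWEIGHT
MiscMonitorTests.java#Xcomp-TieredStopAtLevel3
MonitorWaitNotify.java#Xcomp-TieredStopAtLevel1-LM_LIGHTWEIGHT
MonitorWaitNotify.java#LM_LEGACY
MonitorWaitNotify.java#Xcomp-LM_LIGHTWEIGHT
""".split())

with_gpr_changes = set("""
stress/Skynet.java#default
stress/LotsOfContendedMonitorEnter.java#default
stress/LotsOfContendedMonitorEnter.java#LM_LIGHTWEIGHT
MiscMonitorTests.java#Xcomp
stress/TimedWaitALot.java#timeout-notify
stress/TimedWaitALot.java#timeout-notify-interrupt
MiscMonitorTests.java#default
MonitorEnterExit.java#default
MonitorWaitNotify.java#Xcomp-LM_LIGHTWEIGHT
MonitorWaitNotify.java#Xcomp-TieredStopAtLevel1-LM_LEGACY
MonitorWaitNotify.java#Xcomp-TieredStopAtLevel1-LM_LIGHTWEIGHT
ThreadAPI.java#no-vmcontinuations
""".split())

still_failing = original & with_gpr_changes   # fail in both runs
no_longer_failing = original - with_gpr_changes
newly_failing = with_gpr_changes - original

for label, tests in [("still failing", still_failing),
                     ("no longer failing", no_longer_failing),
                     ("newly failing", newly_failing)]:
    print(f"{label}: {len(tests)}")
    for test in sorted(tests):
        print(f"    {test}")
```

On these two lists the net drop of 7 (19 to 12) is made up of 12 tests that no longer fail and 5 that fail only with the non-volatile GPR build, while 7 fail in both. A single run of each build cannot separate intermittent failures from real changes, so the split is only indicative.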

VermaSh avatar Jun 25 '25 17:06 VermaSh

@VermaSh There are multiple issues open for various failures. Are you able to reproduce the one in this issue?

r30shah avatar Jun 25 '25 17:06 r30shah

With the build containing the non-volatile GPR changes, I am able to reproduce java/lang/Thread/virtual/MiscMonitorTests.java#Xcomp, but java/lang/Thread/virtual/MiscMonitorTests.java#Xcomp-TieredStopAtLevel3 doesn't fail anymore.

Originally reported failing build:

openjdk version "24.0.1-beta" 2025-04-15
IBM Semeru Runtime Open Edition 24.0.1+9-202506040003 (build 24.0.1-beta+9-202506040003)
Eclipse OpenJ9 VM 24.0.1+9-202506040003 (build master-ee99618777, JRE 24 Linux s390x-64-Bit Compressed References 20250604_75 (JIT enabled, AOT enabled)
OpenJ9   - ee99618777
OMR      - 556e0fe4b
JCL      - 801da8362 based on jdk-24.0.1+9)

java/lang/Thread/virtual/MiscMonitorTests.java#Xcomp fails differently than originally reported:

Failure with the originally reported failing build:

STARTED    MiscMonitorTests::testReleaseOnYield 'testReleaseOnYield()'
Exiting bar from thread Batch2-0
Exiting bar from thread Batch2-3
Exiting bar from thread Batch2-1
Exiting bar from thread Batch2-2
Exiting foo from thread Batch1-3
Exiting foo from thread Batch1-1
Exiting foo from thread Batch1-2
Exiting foo from thread Batch1-0
SUCCESSFUL MiscMonitorTests::testReleaseOnYield 'testReleaseOnYield()'
STARTED    MiscMonitorTests::testContentionMultipleMonitors2 'testContentionMultipleMonitors2()'
Unhandled exception
Type=Segmentation error vmState=0x00000000
J9Generic_Signal_Number=00000018 Signal_Number=0000000b Error_Value=00000000 Signal_Code=00000001
Handler1=000003FF814470F8 Handler2=000003FF81330978 InaccessibleAddress=0000000000000000
gpr0=0000000000000002 gpr1=0000000000000000 gpr2=0000000000EAA400 gpr3=00000000FFE51000
gpr4=000003FF59988612 gpr5=000003FF59988574 gpr6=0000000000000010 gpr7=000003FF59988574
gpr8=000003FF59988612 gpr9=0000000000EAA400 gpr10=00000000FFE51000 gpr11=0000000000000003
gpr12=000003FF60DEDAC8 gpr13=000003FF809FDAE0 gpr14=000003FF7BB5D990 gpr15=000003FF809FD520
psw=000003FF814CBD38 mask=0705000180000000 fpc=00080000 bea=000003FF7BB5D98E
fpr0=3fd7938aa48ab02c (f: 2760552448.000000, d: 3.683802e-01)
fpr1=48100a80809fda80 (f: 2157959680.000000, d: 1.364619e+39)
fpr2=0000000000000000 (f: 0.000000, d: 0.000000e+00)
fpr3=bfdf81448204135a (f: 2181305088.000000, d: -4.922649e-01)
fpr4=48100a8000000000 (f: 0.000000, d: 1.364619e+39)
fpr5=3f424652c56b0484 (f: 3312125184.000000, d: 5.576996e-04)
fpr6=0000000000000000 (f: 0.000000, d: 0.000000e+00)
fpr7=40d26dad60e60211 (f: 1625686528.000000, d: 1.887071e+04)
fpr8=0000000000000000 (f: 0.000000, d: 0.000000e+00)
fpr9=000003ff9e8ed5e0 (f: 2660161024.000000, d: 2.172116e-311)
fpr10=000003ff7c02c030 (f: 2080555008.000000, d: 2.171830e-311)
fpr11=000003ff9da7df10 (f: 2645024512.000000, d: 2.172109e-311)
fpr12=0000000000a57070 (f: 10842224.000000, d: 5.356770e-317)
fpr13=000003ff00060fb8 (f: 397240.000000, d: 2.170802e-311)
fpr14=0000000000000000 (f: 0.000000, d: 0.000000e+00)
fpr15=000003ff00064f98 (f: 413592.000000, d: 2.170802e-311)
Module=/home/jenkins/workspace/Grinder/jdkbinary/j2sdk-image/lib/default/libj9vm29.so
Module_base_address=000003FF81400000
Target=2_90_20250604_75 (Linux 6.4.0-150600.23.47-default)
CPU=s390x (4 logical CPUs) (0x1e7a7e000 RAM)
----------- Stack Backtrace -----------
detachMonitorInfo+0x28 (0x000003FF814CBD38 [libj9vm29.so+0xcbd38])
walkLiveMonitorSlotsForYield+0x120 (0x000003FF7BB5D990 [libj9jit29.so+0xc5d990])
jitGetOwnedObjectMonitors+0x254 (0x000003FF7BB5F5B4 [libj9jit29.so+0xc5f5b4])
walkFrame+0x222 (0x000003FF81495EEA [libj9vm29.so+0x95eea])
jitWalkStackFrames+0x198 (0x000003FF7BB5DD30 [libj9jit29.so+0xc5dd30])
walkStackFrames+0xf8 (0x000003FF81496820 [libj9vm29.so+0x96820])
preparePinnedVirtualThreadForUnmount+0xe8 (0x000003FF814CCE08 [libj9vm29.so+0xcce08])
_ZN32VM_BytecodeInterpreterCompressed3runEP10J9VMThread+0x28d86 (0x000003FF814F6C86 [libj9vm29.so+0xf6c86])
bytecodeLoopCompressed+0xee (0x000003FF814CDE3E [libj9vm29.so+0xcde3e])
c_cInterpreter+0x64 (0x000003FF815E633C [libj9vm29.so+0x1e633c])
---------------------------------------

Failure with the non-volatile GPR changes:

Exiting foo from thread VThread-50
000003FF802FD440: Object neither in heap nor stack-allocated in thread J9VMContinuation@0x3fe9405bff0
000003FF802FD440:       O-Slot=000003FE9405C070
000003FF802FD440:       O-Slot value=0000000000C2EEC1
000003FF802FD440:       PC=000003FF604C2FF0
000003FF802FD440:       framesWalked=1
000003FF802FD440:       arg0EA=0000000000DF1228
000003FF802FD440:       walkSP=0000000000DF10E8
000003FF802FD440:       literals=0000000000000000
000003FF802FD440:       jitInfo=000003FF5D1A5F88
000003FF802FD440:       method=0000000000AF4C08 (MiscMonitorTests$TestContentionWithSyncMethods.foo()V) (JIT)
000003FF802FD440:       stack=0000000000DEBBC8-0000000000DF13F0
17:01:36.289 0x981000    j9mm.479    *   ** ASSERTION FAILED ** at /home/jenkins/workspace/Build_JDK24_s390x_linux_Personal/openj9/runtime/gc_glue_java/ScavengerDelegate.cpp:362: ((MM_StackSlotValidator(MM_StackSlotValidator::NOT_ON_HEAP, *slotPtr, stackLocation, walkState).validate(env)))

Update: fixed the originally reported stack trace.

VermaSh avatar Jun 25 '25 17:06 VermaSh

From my latest Grinder, I don't see the originally reported failure. I am seeing failures related to object monitors:

[2025-06-25T19:54:01.764Z] Module=/home/jenkins/workspace/Grinder_iteration_3/jdkbinary/j2sdk-image/lib/default/libj9vm29.so
[2025-06-25T19:54:01.764Z] Module_base_address=000003FFB2B00000
[2025-06-25T19:54:01.764Z] Target=2_90_20250610_79 (Linux 6.4.0-150600.23.53-default)
[2025-06-25T19:54:01.764Z] CPU=s390x (4 logical CPUs) (0x1e7a7c000 RAM)
[2025-06-25T19:54:01.764Z] ----------- Stack Backtrace -----------
[2025-06-25T19:54:01.764Z] objectMonitorExit+0x4cc (0x000003FFB2B7908C [libj9vm29.so+0x7908c])
[2025-06-25T19:54:01.764Z] fast_jitMonitorExit+0x3e (0x000003FFB194A436 [libj9jit29.so+0xc4a436])
[2025-06-25T19:54:01.764Z] jitMonitorExit+0x20 (0x000003FFB195E170 [libj9jit29.so+0xc5e170])
[2025-06-25T19:54:01.764Z] ---------------------------------------
[2025-06-25T19:54:59.975Z] Compiled_method=MonitorEnterExit.lambda$testUnblocking$0(Ljava/util/concurrent/CountDownLatch;Ljava/lang/Object;Ljava/util/concurrent/atomic/AtomicBoolean;)V
[2025-06-25T19:54:59.975Z] Target=2_90_20250610_79 (Linux 6.4.0-150600.23.53-default)
[2025-06-25T19:54:59.975Z] CPU=s390x (4 logical CPUs) (0x1e7a7c000 RAM)
[2025-06-25T19:54:59.975Z] ----------- Stack Backtrace -----------
[2025-06-25T19:54:59.975Z]  (0x000003FE98CD6EA2 [<unknown>+0x0])
[2025-06-25T19:54:59.975Z] ---------------------------------------
[2025-06-25T19:05:44.262Z] ----------- Stack Backtrace -----------
[2025-06-25T19:05:44.262Z] getObjectMonitorOwner+0xde (0x000003FF94EF6E1E [libj9vm29.so+0x1f6e1e])
[2025-06-25T19:05:44.262Z] _ZN26VM_BytecodeInterpreterFull3runEP10J9VMThread+0x21362 (0x000003FF94E25BC2 [libj9vm29.so+0x125bc2])
[2025-06-25T19:05:44.262Z] bytecodeLoopFull+0xee (0x000003FF94E047AE [libj9vm29.so+0x1047ae])
[2025-06-25T19:05:44.262Z] c_cInterpreter+0x64 (0x000003FF94EE6704 [libj9vm29.so+0x1e6704])
[2025-06-25T19:05:44.262Z] ---------------------------------------

Are these already being investigated?

Also, the build (20250610_79) used by the Grinder was slightly newer than the one the failure was reported with. I have launched one Grinder with the older build (20250604_75) and another with the latest build (20250702_95).

VermaSh avatar Jul 02 '25 15:07 VermaSh

@VermaSh I remember you were able to reproduce the original failure. Were you able to get hold of the data and inspect it?

r30shah avatar Jul 04 '25 19:07 r30shah

Sorry, I will need a bit longer for an update on this. My fyre machine with the failure artifacts keeps disconnecting every few minutes; I am in the process of moving the artifacts over to our internal dev machine.

VermaSh avatar Jul 04 '25 21:07 VermaSh

Any news on this?

vij-singh avatar Jul 08 '25 12:07 vij-singh

The test fails with symptoms similar to https://github.com/eclipse-openj9/openj9/issues/22089. The lockObject passed into detachMonitorInfo is corrupt/incorrect (src). The original failing job had three failures in detachMonitorInfo:

  1. TEST: java/lang/Thread/virtual/stress/LotsOfContendedMonitorEnter.java#LM_LIGHTWEIGHT
  2. TEST: java/lang/Thread/virtual/stress/GetStackTraceALotWhenBlocking.java#id0
  3. TEST: java/lang/Thread/virtual/MiscMonitorTests.java#Xcomp

Of those, I was only able to reproduce the original failure locally with the second test.

(gdb) bt
#0  0x000003fffd3cbd2e in detachMonitorInfo (currentThread=0x370200, lockObject=0x8)
    at /home/jenkins/workspace/build-scripts/jobs/jdk24/jdk24-linux-s390x-openj9/workspace/build/src/openj9/runtime/vm/ContinuationHelpers.cpp:766
#1  0x000003fff7c5d990 in walkLiveMonitorSlotsForYield (walkState=walkState@entry=0x3ffd7cf9a60, gcStackAtlas=gcStackAtlas@entry=0x3ffd63b24be, liveMonitorMap=liveMonitorMap@entry=0x3ffd63b254a " ",
    monitorMask=0x3ffd63b24de " ", numberOfMapBits=numberOfMapBits@entry=16)
    at /home/jenkins/workspace/build-scripts/jobs/jdk24/jdk24-linux-s390x-openj9/workspace/build/src/openj9/runtime/codert_vm/jswalk.c:1884
#2  0x000003fff7c5f5b4 in jitGetOwnedObjectMonitors (walkState=0x3ffd7cf9a60)
    at /home/jenkins/workspace/build-scripts/jobs/jdk24/jdk24-linux-s390x-openj9/workspace/build/src/openj9/runtime/codert_vm/jswalk.c:1745
#3  0x000003fffd395eea in walkFrame (walkState=0x3ffd7cf9a60) at /home/jenkins/workspace/build-scripts/jobs/jdk24/jdk24-linux-s390x-openj9/workspace/build/src/openj9/runtime/vm/swalk.c:555
#4  0x000003fff7c5dd30 in jitWalkStackFrames (walkState=0x3ffd7cf9a60) at /home/jenkins/workspace/build-scripts/jobs/jdk24/jdk24-linux-s390x-openj9/workspace/build/src/openj9/runtime/codert_vm/jswalk.c:253
#5  0x000003fffd396820 in walkStackFrames (currentThread=0x370200, walkState=0x3ffd7cf9a60)
    at /home/jenkins/workspace/build-scripts/jobs/jdk24/jdk24-linux-s390x-openj9/workspace/build/src/openj9/runtime/vm/swalk.c:388
#6  0x000003fffd3cce08 in preparePinnedVirtualThreadForUnmount (currentThread=0x370200, syncObj=0x0, isObjectWait=<optimized out>)
    at /home/jenkins/workspace/build-scripts/jobs/jdk24/jdk24-linux-s390x-openj9/workspace/build/src/openj9/runtime/vm/ContinuationHelpers.cpp:1061
#7  0x000003fffd3f6c86 in VM_BytecodeInterpreterCompressed::yieldContinuationImpl (_pc=<optimized out>, _sp=<optimized out>, this=<optimized out>)

It crashes trying to dereference an invalid pointer, consistent with what was seen in the core dumps from the originally reported tests.

(gdb) x/20i $pc-0x20
   0x3fffd3cbd0e:	nopr	%r7
   0x3fffd3cbd10 <detachMonitorInfo(J9VMThread*, j9object_t)>:	stmg	%r6,%r15,48(%r15)
   0x3fffd3cbd16 <detachMonitorInfo(J9VMThread*, j9object_t)+6>:	ltg	%r1,160(%r2)
   0x3fffd3cbd1c <detachMonitorInfo(J9VMThread*, j9object_t)+12>:	lay	%r15,-200(%r15)
   0x3fffd3cbd22 <detachMonitorInfo(J9VMThread*, j9object_t)+18>:	lgr	%r9,%r2
   0x3fffd3cbd26 <detachMonitorInfo(J9VMThread*, j9object_t)+22>:	lgr	%r10,%r3
   0x3fffd3cbd2a <detachMonitorInfo(J9VMThread*, j9object_t)+26>:	je	0x3fffd3cbe7e <detachMonitorInfo(J9VMThread*, j9object_t)+366>
=> 0x3fffd3cbd2e <detachMonitorInfo(J9VMThread*, j9object_t)+30>:	llgf	%r1,0(%r3)
   0x3fffd3cbd34 <detachMonitorInfo(J9VMThread*, j9object_t)+36>:	nill	%r1,65280
...
(gdb) info reg r1 r3
r1             0x1                 1
r3             0x8                 8
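
The crash sites reported in this thread at detachMonitorInfo+0x1e, +0x28 and +0x32 all sit in the same short chain of loads, so a small model can show how a bad `lockObject` fails at a different step depending on how it is bad. The sketch below is a simplified model, not OpenJ9 code: memory is a Python dict, only the instructions at +30, +36, +40, +46 and +50 of the disassembly are modelled, and the memory contents are reconstructed from the register dumps. The raw header words are assumptions, since only their masked values are visible in the dumps, and the names `Fault`, `load`, `lockword_load_chain` and `fault_site` are made up for illustration.

```python
MASK64 = (1 << 64) - 1


class Fault(Exception):
    """Raised when the model reads an address that is not mapped."""

    def __init__(self, offset, address):
        super().__init__(f"fault at +{offset:#x} reading {address:#x}")
        self.offset = offset
        self.address = address


def load(memory, address, offset):
    """Read one value from the sparse memory model, or fault."""
    if address not in memory:
        raise Fault(offset, address)
    return memory[address]


def lockword_load_chain(memory, lock_object):
    # +30: llgf %r1,0(%r3)       32-bit word at the start of the object
    r1 = load(memory, lock_object, 30) & 0xFFFFFFFF
    # +36: nill %r1,65280        AND the low 16 bits with 0xFF00
    r1 &= 0xFFFFFF00
    # +40: ltg %r1,216(%r1)      64-bit value at offset 216 from that address
    r1 = load(memory, r1 + 216, 40)
    # +46: jl ...+264            taken when the loaded value is negative
    if r1 >> 63:
        return None
    # +50: llgf %r11,0(%r1,%r3)  32-bit word at lock_object + r1
    return load(memory, (r1 + lock_object) & MASK64, 50) & 0xFFFFFFFF


def fault_site(memory, lock_object):
    """Return (offset, address) of the faulting load, or None if none faults."""
    try:
        lockword_load_chain(memory, lock_object)
    except Fault as fault:
        return fault.offset, fault.address
    return None


scenarios = {
    # gdb session above: lockObject=0x8, nothing mapped there.
    "lockObject=0x8": ({}, 0x8),
    # Dump with gpr1=0, gpr3=0xFFE51000: header word assumed to be zero.
    "gpr3=0xFFE51000": ({0xFFE51000: 0x0}, 0xFFE51000),
    # !gpinfo dump: clazz=0xFFF3A500, gpr1=0x0188C40000000000 after +40.
    "gpr3=0xFFF3E2A8": ({0xFFF3E2A8: 0xFFF3A500,
                         0xFFF3A500 + 216: 0x0188C40000000000}, 0xFFF3E2A8),
}

for name, (memory, obj) in scenarios.items():
    offset, address = fault_site(memory, obj)
    print(f"{name}: fault at +{offset:#x} reading {address:#x}")
```

In this model the three offsets correspond to an unreadable object address (+0x1e), a readable object whose class word is zero (+0x28), and a readable object whose class word leads to a garbage value at offset 216 (+0x32). All three are consistent with a stale or corrupt object slot being handed to detachMonitorInfo.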

VermaSh avatar Jul 08 '25 13:07 VermaSh

@VermaSh Can you point me to the core dump with the above stack trace? The original failure that @ehsankianifar looked into in https://github.com/eclipse-openj9/openj9/issues/22089 has a specific symptom: the JIT code uses either R8 or R9 to hold the object before it calls out to a helper that can unmount the thread. If it is the same issue, we should be able to confirm it.

As Ehsan said in https://github.com/eclipse-openj9/openj9/issues/22089#issuecomment-3049022391, the original issue has been resolved with the Z changes, but the test still fails with an issue that seems to be related to [1] or [2].

[1]. https://github.com/eclipse-openj9/openj9/issues/21829#issuecomment-2873012708
[2]. https://github.com/eclipse-openj9/openj9/issues/22074

r30shah avatar Jul 08 '25 14:07 r30shah

For sure, the core dump for the above stack trace is in /home/sverma/work_items/jdk24_failures/src/core.317164. If you need them, the original core dumps are in /home/sverma/work_items/jdk24_failures/src/GetStackTraceALotWhenBlocking_id0, /home/sverma/work_items/jdk24_failures/src/MiscMonitorTests_Xcomp/detachMonitorInfo_failure and /home/sverma/work_items/jdk24_failures/src/LotsOfContendedMonitorEnter_LM_LIGHTWEIGHT.

As Ehsan said in https://github.com/eclipse-openj9/openj9/issues/22089#issuecomment-3049022391, the original issue has been resolved with the Z changes, but the test still fails with an issue that seems to be related to [1] or [2].

I saw the same with these tests. I think my latest Grinder might have that fix; let me check.

VermaSh avatar Jul 08 '25 14:07 VermaSh

My Grinder doesn't have the fix, but here are my findings from when I ran with that fix a couple of weeks back. Is that the fix you were referring to?

VermaSh avatar Jul 08 '25 14:07 VermaSh

I looked at the core dump from Shubham (/home/sverma/work_items/jdk24_failures/src/core.317164), which has the following backtrace:

Thread id: 317200 (0x4D710)
#0 bp:0x000003FFD7CF9608 ip:0x000003FFFD3CBD2E /home/sverma/work_items/jdk24_failures/builds/failing_build/lib/default/libj9vm29.so::detachMonitorInfo+0x1e
#1 bp:0x000003FFD7CF96E0 ip:0x000003FFF7C5D990 /home/sverma/work_items/jdk24_failures/builds/failing_build/lib/default/libj9jit29.so::walkLiveMonitorSlotsForYield+0x120
#2 bp:0x000003FFD7CF97A8 ip:0x000003FFF7C5F5B4 /home/sverma/work_items/jdk24_failures/builds/failing_build/lib/default/libj9jit29.so::walkLiveMonitorSlotsForYield+0x1d44
#3 bp:0x000003FFD7CF9858 ip:0x000003FFFD395EEA /home/sverma/work_items/jdk24_failures/builds/failing_build/lib/default/libj9vm29.so::walkFrame+0x222
#4 bp:0x000003FFD7CF9918 ip:0x000003FFF7C5DD30 /home/sverma/work_items/jdk24_failures/builds/failing_build/lib/default/libj9jit29.so::walkLiveMonitorSlotsForYield+0x4c0
#5 bp:0x000003FFD7CF9A28 ip:0x000003FFFD396820 /home/sverma/work_items/jdk24_failures/builds/failing_build/lib/default/libj9vm29.so::walkStackFrames+0xf8
#6 bp:0x000003FFD7CF9D38 ip:0x000003FFFD3CCE08 /home/sverma/work_items/jdk24_failures/builds/failing_build/lib/default/libj9vm29.so::preparePinnedVirtualThreadForUnmount+0xe8
#7 bp:0x000003FFD7CFA310 ip:0x000003FFFD3F6C86 /home/sverma/work_items/jdk24_failures/builds/failing_build/lib/default/libj9vm29.so::_ZN32VM_BytecodeInterpreterCompressed3runEP10J9VMThread+0x28d86
#8 bp:0x000003FFD7CFA450 ip:0x000003FFFD3CDE3E /home/sverma/work_items/jdk24_failures/builds/failing_build/lib/default/libj9vm29.so::bytecodeLoopCompressed+0xee
#9 bp:0x000003FFFD3CBD2E ip:0x000003FFFD4E633C /home/sverma/work_items/jdk24_failures/builds/failing_build/lib/default/libj9vm29.so::c_cInterpreter+0x64

The GC scan of the stack slots complains about a corrupt slot (370200(35d698) -> 8). Looking at the JIT stack slots to see where it comes from, it is a temp slot:

<370200> 		O-Slot: : t5[0x000000000035D698] = 0x0000000000000008
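
Why the scan rejects this slot can be illustrated with a minimal sketch of an object-slot check. The heap bounds, the alignment, and the function name `looks_like_heap_reference` are hypothetical values picked for illustration; this is not the OpenJ9 validator. The assumption is that an object slot should hold either null or an aligned address inside the heap, and 0x8 is neither.

```python
# Hypothetical heap bounds and alignment, chosen only for illustration.
HEAP_BASE = 0x0000000080000000
HEAP_TOP = 0x00000000C0000000
OBJECT_ALIGNMENT = 8  # assumed object alignment in bytes


def looks_like_heap_reference(value):
    """Return True if value is null or an aligned address inside the assumed heap."""
    if value == 0:
        return True
    in_heap = HEAP_BASE <= value < HEAP_TOP
    aligned = value % OBJECT_ALIGNMENT == 0
    return in_heap and aligned


slot_value = 0x0000000000000008  # value found in t5[0x000000000035D698]
print(looks_like_heap_reference(slot_value))  # False
```

The value 8 passes the alignment test but falls outside any plausible heap range, which is the kind of slot a stack walker has to treat as corrupt.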

Checking the JIT-compiled code where it originates, I see that we store into that slot right after returning from the jitMonitorEntry call.

0x3ffdd093a6c +112               e3609000003a llzrgf    %r6, 0(%r9) <= R9 contains the AtomicBoolean object used to enter the synchronized region
0x3ffdd093a72 +118               e3a060d80002 ltg       %r10, 0xd8(%r6)
0x3ffdd093a78 +124               a7c40bf2     jle       0x3ffdd09525c C>> +6240
0x3ffdd093a7c +128               41aa9000     la        %r10, 0(%r10, %r9)
0x3ffdd093a80 +132               1777         xr        %r7, %r7
0x3ffdd093a82 +134               ba7da000     cs        %r7, %r13, 0(%r10)
0x3ffdd093a86 +138               a7440be1     jl        0x3ffdd095248 C>> +6220 => Inlined Compare and Swap fails, goes to OOL for jitMonitorEntry helper call
0x3ffdd093a8a +142               eb01dba8007a agsi      0xba8(%r13), 1
0x3ffdd093a90 +148               e39050880024 stg       %r9, 0x88(%r5)  <<< ^+6236 ^+6284 ^+6304 <= Return back to mainline, storing R9 to temp slot
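
The shape of the sequence above (an inlined compare-and-swap with a fall-back to a helper) can be modelled with a small sketch. This is a toy model under stated assumptions, not the generated code or the OpenJ9 runtime: `LockWord`, `monitor_enter`, and the `slow_path` callback are hypothetical names, and a Python lock stands in for the atomicity of the `cs` instruction.

```python
import threading


class LockWord:
    """Toy lock word: 0 means unlocked, otherwise it holds the owner's id."""

    def __init__(self):
        self.value = 0
        self._guard = threading.Lock()  # stands in for hardware atomicity

    def compare_and_swap(self, expected, new):
        """Atomically set value to new if it equals expected; report success."""
        with self._guard:
            if self.value == expected:
                self.value = new
                return True
            return False


def monitor_enter(lock_word, thread_id, slow_path):
    """Try the inlined fast path first; call the helper when the swap fails."""
    if lock_word.compare_and_swap(0, thread_id):
        return "fast"
    slow_path(lock_word, thread_id)  # models the out-of-line helper call
    return "slow"


helper_calls = []
word = LockWord()
print(monitor_enter(word, 1, lambda w, t: helper_calls.append(t)))  # fast
print(monitor_enter(word, 2, lambda w, t: helper_calls.append(t)))  # slow
```

In the disassembly both paths rejoin at +148, so whatever R9 holds when the helper returns is what gets stored into the temp slot.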

%r5 + 0x88 = 0x000000000035D698, the slot GCCheck complained about.
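
The address match can be checked with a few lines of arithmetic. The sketch below works backwards from the slot address and the displacement in the `stg` instruction; the resulting %r5 value is implied by that arithmetic and was not read from the core dump, and `base_from_slot` is a hypothetical helper name.

```python
SLOT_ADDRESS = 0x000000000035D698  # O-Slot address from the stack-slot scan
STORE_DISPLACEMENT = 0x88          # displacement in `stg %r9, 0x88(%r5)`


def base_from_slot(slot_address, displacement):
    """Recover the base register value for which base + displacement hits the slot."""
    return slot_address - displacement


r5 = base_from_slot(SLOT_ADDRESS, STORE_DISPLACEMENT)
print(hex(r5))  # 0x35d610
```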

So the original failure reported in this issue (in detachMonitorInfo) is caused by the same issue, which is resolved by the changes in https://github.com/eclipse-openj9/openj9/pull/22128.

With respect to the failures Shubham saw in his grinder (https://github.com/eclipse-openj9/openj9/issues/22052#issuecomment-3028243357), they may be the common issue already under investigation [1][2]. I think this issue can be closed.

[1]. https://github.com/eclipse-openj9/openj9/issues/21829#issuecomment-2873012708
[2]. https://github.com/eclipse-openj9/openj9/issues/22074

r30shah avatar Jul 08 '25 17:07 r30shah

For the record, the disabled test java/lang/Thread/virtual/MiscMonitorTests.java failed with the following assertion failure:

21:17:12  openjdk version "24.0.1-beta" 2025-04-15
21:17:12  IBM Semeru Runtime Open Edition 24.0.1+9-202507060717 (build 24.0.1-beta+9-202507060717)
21:17:12  Eclipse OpenJ9 VM 24.0.1+9-202507060717 (build master-47c1bc0ff2, JRE 24 Linux s390x-64-Bit Compressed References 20250706_98 (JIT enabled, AOT enabled)
21:17:12  OpenJ9   - 47c1bc0ff2
21:17:12  OMR      - 2cadecaf5
21:17:12  JCL      - 052d7bb94 based on jdk-24.0.1+9)

21:20:10  Running test jdk_lang_0 ...

21:20:10  variation: -Xdump:system:none -Xdump:heap:none -Xdump:system:events=gpf+abort+traceassert+corruptcache -XX:-JITServerTechPreviewMessage Mode150
21:20:10  JVM_OPTIONS:  -Xdump:system:none -Xdump:heap:none -Xdump:system:events=gpf+abort+traceassert+corruptcache -XX:-JITServerTechPreviewMessage -XX:+UseCompressedOops -Xverbosegclog 

21:37:06  TEST: java/lang/Thread/virtual/MiscMonitorTests.java#Xcomp-TieredStopAtLevel3

21:37:06  01:36:08.572 0x21d0d00    j9mm.479    *   ** ASSERTION FAILED ** at /home/jenkins/workspace/build-scripts/jobs/jdk24/jdk24-linux-s390x-openj9/workspace/build/src/openj9/runtime/gc_glue_java/ScavengerDelegate.cpp:362: ((MM_StackSlotValidator(MM_StackSlotValidator::NOT_ON_HEAP, *slotPtr, stackLocation, walkState).validate(env)))

Closing as per https://github.com/eclipse-openj9/openj9/issues/22052#issuecomment-3049810149

JasonFengJ9 avatar Jul 08 '25 18:07 JasonFengJ9