
MiniMix_3h_0 Segmentation error vmState=0x000504ff

Open luke-li-2003 opened this issue 5 months ago • 5 comments

Failure link

All failures so far are on pLinux, with and without JITServer:

Without JITServer: https://hyc-runtimes-jenkins.swg-devops.com/job/Grinder/51350/console

With JITServer: https://hyc-runtimes-jenkins.swg-devops.com/job/Grinder/51324/consoleText

19:57:54  openjdk version "11.0.28-internal" 2025-07-15
19:57:54  OpenJDK Runtime Environment (build 11.0.28-internal+0-adhoc..BuildJDK11ppc64lelinuxjitPersonal)
19:57:54  Eclipse OpenJ9 VM (build master-4c38ff38a58, JRE 11 Linux ppc64le-64-Bit Compressed References 20250605_2116 (JIT enabled, AOT enabled)
19:57:54  OpenJ9   - 4c38ff38a58
19:57:54  OMR      - 4a063e93a55
19:57:54  JCL      - 2638cf249b6 based on jdk-11.0.28+3)

Optional info

The test MiniMix_3h_0 is failing consistently with Segmentation error vmState=0x000504ff and the call stack below:

[2025-06-06T16:46:01.392Z] LT  stderr Unhandled exception
[2025-06-06T16:46:01.392Z] LT  stderr Type=Segmentation error vmState=0x000504ff
[2025-06-06T16:46:01.392Z] LT  stderr J9Generic_Signal_Number=00000018 Signal_Number=0000000b Error_Value=00000000 Signal_Code=00000001
[2025-06-06T16:46:01.392Z] LT  stderr Handler1=00007FFF810A16D0 Handler2=00007FFF80FC8220
[2025-06-06T16:46:01.392Z] LT  stderr R0=00007FFF7A7C9AA4 R1=00007FFCE71B32F0 R2=00007FFF7AFDE700 R3=0000000000000000
[2025-06-06T16:46:01.392Z] LT  stderr R4=00007FFC36641170 R5=0000000038E38E39 R6=FFFFFFFFAAAAAAAB R7=00007FFF7B00C718
[2025-06-06T16:46:01.392Z] LT  stderr R8=00007FFC36551B30 R9=000000000000017B R10=0000000000000003 R11=0000000000000000
[2025-06-06T16:46:01.392Z] LT  stderr R12=00007FFF7A4618A0 R13=00007FFCE71C68E0 R14=0000000000000236 R15=0000000000000000
[2025-06-06T16:46:01.392Z] LT  stderr R16=000000000000014C R17=0000000000000000 R18=0000000000000000 R19=0000000000000000
[2025-06-06T16:46:01.392Z] LT  stderr R20=00007FFC36470000 R21=00007FFC36470000 R22=0000000000000018 R23=00007FFCE71B4628
[2025-06-06T16:46:01.392Z] LT  stderr R24=00007FFC3665C0A0 R25=00007FFCE71B4508 R26=0000000000000004 R27=0000000000000000
[2025-06-06T16:46:01.392Z] LT  stderr R28=00007FFCE71B33E8 R29=00007FFC3665C0A0 R30=0000000000000001 R31=00007FFC3665C0A0
[2025-06-06T16:46:01.392Z] LT  stderr NIP=00007FFF7A7C9AA8 MSR=800000000280D033 ORIG_GPR3=00007FFF7A7753A4 CTR=00007FFF7A4618A0
[2025-06-06T16:46:01.392Z] LT  stderr LINK=00007FFF7A7C9AA4 XER=0000000000000000 CCR=0000000024882878 SOFTE=0000000000000001
[2025-06-06T16:46:01.392Z] LT  stderr TRAP=0000000000000300 DAR=0000000000000010 dsisr=0000000040000000 RESULT=0000000000000000
[2025-06-06T16:46:01.392Z] LT  stderr FPR0=00007ffce71b3408 (f: 3877319680.000000, d: 6.952699e-310)
[2025-06-06T16:46:01.392Z] LT  stderr FPR1=3f9505afe0000000 (f: 3758096384.000000, d: 2.052951e-02)
[2025-06-06T16:46:01.392Z] LT  stderr FPR2=0000000000000000 (f: 0.000000, d: 0.000000e+00)
[2025-06-06T16:46:01.392Z] LT  stderr FPR3=0000002e00000021 (f: 33.000000, d: 9.761181e-313)
[2025-06-06T16:46:01.392Z] LT  stderr FPR4=0000002a0000002b (f: 43.000000, d: 8.912382e-313)
[2025-06-06T16:46:01.392Z] LT  stderr FPR5=0000000000000000 (f: 0.000000, d: 0.000000e+00)
[2025-06-06T16:46:01.392Z] LT  stderr FPR6=0000000000000000 (f: 0.000000, d: 0.000000e+00)
[2025-06-06T16:46:01.392Z] LT  stderr FPR7=0000000000000000 (f: 0.000000, d: 0.000000e+00)
[2025-06-06T16:46:01.392Z] LT  stderr FPR8=0000000000000000 (f: 0.000000, d: 0.000000e+00)
[2025-06-06T16:46:01.392Z] LT  stderr FPR9=0000000500000004 (f: 4.000000, d: 1.060998e-313)
[2025-06-06T16:46:01.392Z] LT  stderr FPR10=00000000000000a5 (f: 165.000000, d: 8.152083e-322)
[2025-06-06T16:46:01.392Z] LT  stderr FPR11=0000000000000000 (f: 0.000000, d: 0.000000e+00)
[2025-06-06T16:46:01.392Z] LT  stderr FPR12=0000000000000000 (f: 0.000000, d: 0.000000e+00)
[2025-06-06T16:46:01.392Z] LT  stderr FPR13=0000000000000000 (f: 0.000000, d: 0.000000e+00)
[2025-06-06T16:46:01.392Z] LT  stderr FPR14=0000000000000000 (f: 0.000000, d: 0.000000e+00)
[2025-06-06T16:46:01.392Z] LT  stderr FPR15=0000000000000000 (f: 0.000000, d: 0.000000e+00)
[2025-06-06T16:46:01.392Z] LT  stderr FPR16=0000000000000000 (f: 0.000000, d: 0.000000e+00)
[2025-06-06T16:46:01.392Z] LT  stderr FPR17=0000000000000000 (f: 0.000000, d: 0.000000e+00)
[2025-06-06T16:46:01.392Z] LT  stderr FPR18=0000000000000000 (f: 0.000000, d: 0.000000e+00)
[2025-06-06T16:46:01.392Z] LT  stderr FPR19=0000000000000000 (f: 0.000000, d: 0.000000e+00)
[2025-06-06T16:46:01.392Z] LT  stderr FPR20=0000000000000000 (f: 0.000000, d: 0.000000e+00)
[2025-06-06T16:46:01.392Z] LT  stderr FPR21=0000000000000000 (f: 0.000000, d: 0.000000e+00)
[2025-06-06T16:46:01.392Z] LT  stderr FPR22=0000000000000000 (f: 0.000000, d: 0.000000e+00)
[2025-06-06T16:46:01.392Z] LT  stderr FPR23=0000000000000000 (f: 0.000000, d: 0.000000e+00)
[2025-06-06T16:46:01.392Z] LT  stderr FPR24=0000000000000000 (f: 0.000000, d: 0.000000e+00)
[2025-06-06T16:46:01.392Z] LT  stderr FPR25=0000000000000000 (f: 0.000000, d: 0.000000e+00)
[2025-06-06T16:46:01.392Z] LT  stderr FPR26=0000000000000000 (f: 0.000000, d: 0.000000e+00)
[2025-06-06T16:46:01.392Z] LT  stderr FPR27=0000000000000000 (f: 0.000000, d: 0.000000e+00)
[2025-06-06T16:46:01.392Z] LT  stderr FPR28=0000000000000000 (f: 0.000000, d: 0.000000e+00)
[2025-06-06T16:46:01.392Z] LT  stderr FPR29=0000000000000000 (f: 0.000000, d: 0.000000e+00)
[2025-06-06T16:46:01.392Z] LT  stderr FPR30=0000000000000000 (f: 0.000000, d: 0.000000e+00)
[2025-06-06T16:46:01.392Z] LT  stderr FPR31=0000000000000000 (f: 0.000000, d: 0.000000e+00)
[2025-06-06T16:46:01.392Z] LT  stderr Module=/home/jenkins/workspace/Grinder/jdkbinary/j2sdk-image/lib/default/libj9jit29.so
[2025-06-06T16:46:01.392Z] LT  stderr Module_base_address=00007FFF7A000000
[2025-06-06T16:46:01.392Z] LT  stderr 
[2025-06-06T16:46:01.392Z] LT  stderr Method_being_compiled=java/nio/BufferMismatch.mismatch(Ljava/nio/DoubleBuffer;ILjava/nio/DoubleBuffer;II)I
[2025-06-06T16:46:01.392Z] LT  stderr Target=2_90_20250605_2116 (Linux 6.4.0-150600.23.50-default)
[2025-06-06T16:46:01.392Z] LT  stderr CPU=ppc64le (4 logical CPUs) (0x1dc1c0000 RAM)
[2025-06-06T16:46:01.392Z] LT  stderr ----------- Stack Backtrace -----------
[2025-06-06T16:46:02.818Z] LT  stderr _ZN3OMR13CFGSimplifier19simplifyIfStructureEv+0x138 (0x00007FFF7A7C9AA8 [libj9jit29.so+0x7c9aa8])
[2025-06-06T16:46:02.818Z] LT  stderr _ZN3OMR13CFGSimplifier8simplifyEv+0xa0 (0x00007FFF7A7C9CD0 [libj9jit29.so+0x7c9cd0])
[2025-06-06T16:46:02.818Z] LT  stderr _ZN3OMR13CFGSimplifier7performEv+0xdc (0x00007FFF7A7CC68C [libj9jit29.so+0x7cc68c])
[2025-06-06T16:46:02.818Z] LT  stderr _ZN3OMR9Optimizer19performOptimizationEPK20OptimizationStrategyiii.localalias+0x8ec (0x00007FFF7A9AE44C [libj9jit29.so+0x9ae44c])
[2025-06-06T16:46:02.818Z] LT  stderr _ZN3OMR9Optimizer19performOptimizationEPK20OptimizationStrategyiii.localalias+0x179c (0x00007FFF7A9AF2FC [libj9jit29.so+0x9af2fc])
[2025-06-06T16:46:02.818Z] LT  stderr _ZN3OMR9Optimizer8optimizeEv+0x1e4 (0x00007FFF7A9B0474 [libj9jit29.so+0x9b0474])
[2025-06-06T16:46:02.818Z] LT  stderr _ZN3OMR11Compilation20performOptimizationsEv+0x3c (0x00007FFF7A7109DC [libj9jit29.so+0x7109dc])
[2025-06-06T16:46:02.818Z] LT  stderr _ZN3OMR11Compilation7compileEv+0x7a0 (0x00007FFF7A71B310 [libj9jit29.so+0x71b310])
[2025-06-06T16:46:02.818Z] LT  stderr _ZN2TR28CompilationInfoPerThreadBase7compileEP10J9VMThreadPNS_11CompilationEP17TR_ResolvedMethodR11TR_J9VMBaseP19TR_OptimizationPlanRKNS_16SegmentAllocatorE+0x504 (0x00007FFF7A17A984 [libj9jit29.so+0x17a984])
[2025-06-06T16:46:02.818Z] LT  stderr _ZN2TR28CompilationInfoPerThreadBase14wrappedCompileEP13J9PortLibraryPv+0x394 (0x00007FFF7A17BC24 [libj9jit29.so+0x17bc24])
[2025-06-06T16:46:02.818Z] LT  stderr omrsig_protect+0x3e4 (0x00007FFF80FC96D4 [libj9prt29.so+0x396d4])
[2025-06-06T16:46:02.818Z] LT  stderr _ZN2TR28CompilationInfoPerThreadBase7compileEP10J9VMThreadP21TR_MethodToBeCompiledRN2J917J9SegmentProviderE+0x374 (0x00007FFF7A178E14 [libj9jit29.so+0x178e14])
[2025-06-06T16:46:02.818Z] LT  stderr _ZN2TR24CompilationInfoPerThread12processEntryER21TR_MethodToBeCompiledRN2J917J9SegmentProviderE+0x170 (0x00007FFF7A179410 [libj9jit29.so+0x179410])
[2025-06-06T16:46:02.818Z] LT  stderr _ZN2TR24CompilationInfoPerThread14processEntriesEv+0x410 (0x00007FFF7A177C00 [libj9jit29.so+0x177c00])
[2025-06-06T16:46:02.818Z] LT  stderr _ZN2TR24CompilationInfoPerThread3runEv+0xa8 (0x00007FFF7A178218 [libj9jit29.so+0x178218])
[2025-06-06T16:46:02.818Z] LT  stderr _Z30protectedCompilationThreadProcP13J9PortLibraryPN2TR24CompilationInfoPerThreadE+0xa0 (0x00007FFF7A1782D0 [libj9jit29.so+0x1782d0])
[2025-06-06T16:46:02.818Z] LT  stderr omrsig_protect+0x3e4 (0x00007FFF80FC96D4 [libj9prt29.so+0x396d4])
[2025-06-06T16:46:02.818Z] LT  stderr _Z21compilationThreadProcPv+0x1a8 (0x00007FFF7A178838 [libj9jit29.so+0x178838])
[2025-06-06T16:46:02.818Z] LT  stderr thread_wrapper+0x190 (0x00007FFF80F5CC00 [libj9thr29.so+0xcc00])
[2025-06-06T16:46:02.818Z] LT  stderr start_thread+0x188 (0x00007FFF81A143CC [libc.so.6+0xd43cc])
[2025-06-06T16:46:02.818Z] LT  stderr ---------------------------------------
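
One way to read the top frames of the backtrace above is to demangle the symbol names and to recompute the faulting offset from NIP and Module_base_address. A minimal Python sketch follows. `demangle_nested` is a hypothetical helper written for this note (it is not an OpenJ9 or binutils tool) and it only handles the simple `_ZN...Ev` form; the two addresses are copied from the log above.

```python
import re

def demangle_nested(symbol: str) -> str:
    """Demangle a simple nested name such as _ZN3OMR13CFGSimplifier8simplifyEv.

    Handles only length-prefixed identifiers closed by 'E' and followed by 'v'
    (a function taking no parameters). Any other symbol is returned unchanged.
    """
    if not symbol.startswith("_ZN"):
        return symbol
    pos = 3
    parts = []
    while pos < len(symbol) and symbol[pos].isdigit():
        digits = re.match(r"\d+", symbol[pos:]).group()
        pos += len(digits)
        parts.append(symbol[pos:pos + int(digits)])
        pos += int(digits)
    if not parts or symbol[pos:] != "Ev":
        return symbol  # outside the subset this sketch understands
    return "::".join(parts) + "()"

# Values copied from the crash log above.
NIP = 0x00007FFF7A7C9AA8
MODULE_BASE = 0x00007FFF7A000000
offset = NIP - MODULE_BASE

print(demangle_nested("_ZN3OMR13CFGSimplifier19simplifyIfStructureEv"))
print(hex(offset))
```

The recomputed offset agrees with the `libj9jit29.so+0x7c9aa8` printed for the top frame. DAR=0000000000000010 looks consistent with a load at a small offset from a null pointer, although that reading is an inference from the register dump rather than something the log states.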

luke-li-2003 avatar Jun 07 '25 01:06 luke-li-2003

@hzongaro pls take a look.

vmState [0x504ff]: {J9VMSTATE_JIT} {CFGSimplification}
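
As a rough illustration of how that reading can be reproduced, here is a small Python sketch. The split into an upper and a lower 16-bit half is an assumption made for this example, and the two lookup tables contain only the mapping reported in this comment; `describe` is a hypothetical helper, not an OpenJ9 API.

```python
VMSTATE = 0x000504FF

# Assumed layout for this sketch: upper 16 bits = component, lower 16 bits = phase.
KNOWN_COMPONENTS = {0x5: "J9VMSTATE_JIT"}
# Only the phase named in this thread is listed; other values are left unknown.
KNOWN_JIT_PHASES = {0x04FF: "CFGSimplification"}

def describe(vmstate: int) -> str:
    """Render a vmState value in the '{component} {phase}' style used above."""
    component = vmstate >> 16
    phase = vmstate & 0xFFFF
    component_name = KNOWN_COMPONENTS.get(component, f"unknown component 0x{component:x}")
    phase_name = KNOWN_JIT_PHASES.get(phase, f"unknown phase 0x{phase:04x}")
    return f"{{{component_name}}} {{{phase_name}}}"

print(describe(VMSTATE))
```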

pshipton avatar Jun 09 '25 13:06 pshipton

Occurring across platforms. https://openj9-jenkins.osuosl.org/job/Pipeline-Special-System-JDK11/209/

There is also a MathLoadTest_autosimd_special_5m_18 failure on jdk11, jdk17, jdk21, not sure if it's related. https://github.com/eclipse-openj9/openj9/issues/22060

https://openj9-jenkins.osuosl.org/job/Pipeline-Special-System-JDK17/181/
https://openj9-jenkins.osuosl.org/job/Pipeline-Special-System-JDK21/101/

-XX:+UseCompressedOops -Xgcpolicy:balanced -Xdebug -Xrunjdwp:transport=dt_socket,address=8888,server=y,onthrow=no.pkg.foo,launch=echo -Xjit:count=0

09:29:47  MLT java.lang.RuntimeException: test failure
09:29:47  MLT 	at net.adoptopenjdk.test.autosimd.AutoSIMDTestInteger.checkThat(AutoSIMDTestInteger.java:583)
09:29:47  MLT 	at net.adoptopenjdk.test.autosimd.AutoSIMDTestInteger.checkThat(AutoSIMDTestInteger.java:578)
09:29:47  MLT 	at net.adoptopenjdk.test.autosimd.AutoSIMDTestInteger.testMatrixMult(AutoSIMDTestInteger.java:442)

pshipton avatar Jun 09 '25 14:06 pshipton

I found a failure for this internally in https://hyc-runtimes-jenkins.swg-devops.com/job/Test_openjdk11_j9_special.system_x86-64_windows_testList_0/731/ where it previously passed in https://hyc-runtimes-jenkins.swg-devops.com/job/Test_openjdk11_j9_special.system_x86-64_windows_testList_0/730/.

I believe the problem is somewhere in 735bf816fa..aae0599b81

hzongaro avatar Jun 09 '25 14:06 hzongaro

@luke-li-2003, I suspect this has something to do with pull request #21097. From jitdump.20250603.212100.4992.0004.dmp in https://hyc-runtimes-jenkins.swg-devops.com/job/Test_openjdk11_j9_special.system_x86-64_windows_testList_0/731/, after Recognized Call Transformer:

TREE VERIFICATION ERROR -- node [0x00007FF15B290E60] ref count is 6 and should be 5
TREE VERIFICATION ERROR -- node [0x00007FF15B291860] ref count is 3 and should be 2
TREE VERIFICATION ERROR -- node [0x00007FF15B291A90] ref count is 3 and should be 2
TREE VERIFICATION ERROR -- node [0x00007FF15B291B80] ref count is 5 and should be 1
BLOCK VERIFICATION ERROR -- node [0x00007FF15AF10FF0] accessed outside of its (extended) basic block: 1 time(s)
BLOCK VERIFICATION ERROR -- node [0x00007FF15AF11590] accessed outside of its (extended) basic block: 1 time(s)
BLOCK VERIFICATION ERROR -- node [0x00007FF15AF115E0] accessed outside of its (extended) basic block: 1 time(s)
BLOCK VERIFICATION ERROR -- node [0x00007FF15AF11630] accessed outside of its (extended) basic block: 2 time(s)
BLOCK VERIFICATION ERROR -- node [0x00007FF15AF11630] accessed outside of its (extended) basic block: 1 time(s)
BLOCK VERIFICATION ERROR -- node [0x00007FF15AF115E0] accessed outside of its (extended) basic block: 2 time(s)
BLOCK VERIFICATION ERROR -- node [0x00007FF15AF11360] accessed outside of its (extended) basic block: 1 time(s)
BLOCK VERIFICATION ERROR -- node [0x00007FF15AF11590] accessed outside of its (extended) basic block: 2 time(s)
BLOCK VERIFICATION ERROR -- node [0x00007FF15AF10DC0] accessed outside of its (extended) basic block: 1 time(s)
BLOCK VERIFICATION ERROR -- node [0x00007FF15AF10FF0] accessed outside of its (extended) basic block: 5 time(s)
BLOCK VERIFICATION ERROR -- node [0x00007FF15B290E60] accessed outside of its (extended) basic block: 1 time(s)
BLOCK VERIFICATION ERROR -- node [0x00007FF15B291860] accessed outside of its (extended) basic block: 1 time(s)
BLOCK VERIFICATION ERROR -- node [0x00007FF15B291A90] accessed outside of its (extended) basic block: 1 time(s)
BLOCK VERIFICATION ERROR -- node [0x00007FF15B291B80] accessed outside of its (extended) basic block: 4 time(s)

This all seems related to a call to jdk/internal/util/ArraysSupport.vectorizedMismatch. May I ask you to take a look?
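
To see which nodes show up in both kinds of message, the verification output can be tallied with a short script. This is only a sketch: `summarize` is a hypothetical helper, the regular expressions are written against the message text quoted above, and `SAMPLE` holds three of those lines rather than the full jitdump.

```python
import re

TREE_RE = re.compile(
    r"TREE VERIFICATION ERROR -- node \[(0x[0-9A-Fa-f]+)\] "
    r"ref count is (\d+) and should be (\d+)"
)
BLOCK_RE = re.compile(
    r"BLOCK VERIFICATION ERROR -- node \[(0x[0-9A-Fa-f]+)\] "
    r"accessed outside of its \(extended\) basic block: (\d+) time\(s\)"
)

def summarize(lines):
    """Return (ref_counts, escaping, both) for the given verification lines.

    ref_counts maps node address -> (reported count, expected count);
    escaping is the set of nodes accessed outside their block;
    both lists nodes that appear in both kinds of message.
    """
    ref_counts = {}
    escaping = set()
    for line in lines:
        tree = TREE_RE.search(line)
        if tree:
            ref_counts[tree.group(1)] = (int(tree.group(2)), int(tree.group(3)))
            continue
        block = BLOCK_RE.search(line)
        if block:
            escaping.add(block.group(1))
    both = sorted(set(ref_counts) & escaping)
    return ref_counts, escaping, both

# Three lines copied from the output quoted above.
SAMPLE = [
    "TREE VERIFICATION ERROR -- node [0x00007FF15B291B80] ref count is 5 and should be 1",
    "BLOCK VERIFICATION ERROR -- node [0x00007FF15AF10FF0] accessed outside of its (extended) basic block: 1 time(s)",
    "BLOCK VERIFICATION ERROR -- node [0x00007FF15B291B80] accessed outside of its (extended) basic block: 4 time(s)",
]

ref_counts, escaping, both = summarize(SAMPLE)
print(both)
```

In the full listing above, all four nodes with a ref count error also appear in a BLOCK VERIFICATION ERROR line.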

hzongaro avatar Jun 09 '25 15:06 hzongaro

PR: https://github.com/eclipse-openj9/openj9/pull/22073

luke-li-2003 avatar Jun 10 '25 21:06 luke-li-2003

The test is still failing.

https://hyc-runtimes-jenkins.swg-devops.com/job/Test_openjdk11_j9_special.system_aarch64_linux_testList_0/799/
https://hyc-runtimes-jenkins.swg-devops.com/job/Test_openjdk11_j9_special.system_ppc64le_linux_testList_0/972/
https://hyc-runtimes-jenkins.swg-devops.com/job/Test_openjdk11_j9_special.system_x86-64_linux_testList_0/973/
https://hyc-runtimes-jenkins.swg-devops.com/job/Test_openjdk11_j9_special.system_x86-64_mac_testList_0/939/

pshipton avatar Jul 03 '25 15:07 pshipton

The TREE VERIFICATION ERROR messages are now gone from the trace log in the newest failure above. More investigation is needed.

luke-li-2003 avatar Jul 03 '25 15:07 luke-li-2003

@luke-li-2003 Let's take a look at the failure. Can we get a grinder on internal jenkins and get a log file?

r30shah avatar Jul 03 '25 15:07 r30shah

Any news on this one?

vij-singh avatar Jul 08 '25 13:07 vij-singh

Not yet, things are slow since I can't reproduce it locally, but I have some ideas.

luke-li-2003 avatar Jul 08 '25 14:07 luke-li-2003

What I know so far:

The bug is caused by a strange behaviour in the inlining stage, where my nullcheck block (created by inlining vectorizedMismatch) contains an inlining candidate:

n263n     BBStart <block_21> (freq 5696)                                                      [0x7fe8299441f0] bci=[- 1,36,-] rc=0 vc=53 vn=- li=- udi=- nc=0
n241n     ifacmpeq --> block_18 BBStart at n232n ()                                           [0x7fe829943b10] bci=[- 1,36,-] rc=0 vc=53 vn=- li=- udi=- nc=2 flg=0x20
n237n       acalli  java/nio/IntBuffer.base()Ljava/lang/Object;[#435  virtual Method -296] (Abstract class) [flags    0x500 0x0 ]  [0x7fe8299439d0] bci=[-1,36,-] rc=1 vc=54 vn=- li=- udi=- nc=2
n238n         aloadi  <vft-symbol>[#342  Shadow] [flags 0x18607 0x0 ]                         [0x7fe829943a20] bci=[- 1,36,-] rc=1 vc=53 vn=- li=- udi=- nc=1
n239n           aload  a<parm 0 Ljava/nio/IntBuffer;>[#419  Parm] [flags 0x40000107 0x0 ]     [0x7fe829943a70] bci=[- 1,35,-] rc=1 vc=53 vn=- li=- udi=- nc=0
n240n         aload  a<parm 0 Ljava/nio/IntBuffer;>[#419  Parm] [flags 0x40000107 0x0 ]       [0x7fe829943ac0] bci=[- 1,35,-] rc=1 vc=53 vn=- li=- udi=- nc=0
n236n       aconst NULL (X==0 X>=0 X<=0 )                                                     [0x7fe829943980] bci=[- 1,36,-] rc=1 vc=53 vn=- li=- udi=- nc=0 flg=0x302
n243n     BBEnd </block_21> =====                                                             [0x7fe829943bb0] bci=[- 1,70,-] rc=0 vc=53 vn=- li=- udi=- nc=0

After inlining it becomes:

n263n     BBStart <block_21> (freq 5696)                                                      [0x7fe8299441f0] bci=[- 1,36,-] rc=0 vc=53 vn=- li=- udi=- nc=0
n1258n    treetop                                                                             [0x7fe829aa78f0] bci=[- 1,35,-] rc=0 vc=0 vn=- li=- udi=- nc=1
n240n       aload  a<parm 0 Ljava/nio/IntBuffer;>[#419  Parm] [flags 0x40000107 0x0 ]         [0x7fe829943ac0] bci=[- 1,35,-] rc=2 vc=53 vn=- li=- udi=- nc=0
n1268n    ifacmpne --> block_102 BBStart at n1263n (ProfiledGuard/MethodTest )                [0x7fe829aa7c10]        bci=[13,0,294] rc=0 vc=0 vn=- li=- udi=- nc=2 flg=0x1020
n1266n      aloadi  <vtable-entry-symbol>[#576  Shadow +672] [flags 0x10607 0x0 ]             [0x7fe829aa7b70] bci=[- 1,35,-] rc=1 vc=0 vn=- li=- udi=- nc=1
n1265n        aloadi  <vft-symbol>[#342  Shadow] [flags 0x18607 0x0 ]                         [0x7fe829aa7b20] bci=[- 1,35,-] rc=1 vc=0 vn=- li=- udi=- nc=1
n240n           ==>aload
n1267n      aconst 0x1a4710 (methodPointerConstant X!=0 X>=0 X<=0 )                           [0x7fe829aa7bc0]        bci=[13,0,294] rc=1 vc=0 vn=- li=- udi=- nc=0 flg=0x2304
n1260n    BBEnd </block_21> =====                                                             [0x7fe829aa7990]        bci=[13,1,294] rc=0 vc=0 vn=- li=- udi=- nc=0

n1269n    BBStart <block_103> (freq 5696)                                                     [0x7fe829aa7c60] bci=[- 1,36,-] rc=0 vc=0 vn=- li=- udi=- nc=0
n1273n    ificmpne --> block_102 BBStart at n1263n (HCRGuard/NonoverriddenTest )              [0x7fe829aa7da0]        bci=[13,0,294] rc=0 vc=0 vn=- li=- udi=- nc=2 flg=0x1020
n1271n      iload  unknown static[#584  Static] [flags 0x10303 0x0 ]                          [0x7fe829aa7d00] bci=[- 1,36,-] rc=1 vc=0 vn=- li=- udi=- nc=0
n1272n      iconst 0 (X==0 X>=0 X<=0 )                                                        [0x7fe829aa7d50] bci=[- 1,36,-] rc=1 vc=0 vn=- li=- udi=- nc=0 flg=0x302
n1270n    BBEnd </block_103> =====                                                            [0x7fe829aa7cb0] bci=[- 1,36,-] rc=0 vc=0 vn=- li=- udi=- nc=0

n1259n    BBStart <block_100> (freq 5696)                                                     [0x7fe829aa7940]        bci=[13,1,294] rc=0 vc=0 vn=- li=- udi=- nc=0
n1256n    compressedRefs                                                                      [0x7fe829aa7850]        bci=[13,1,294] rc=0 vc=0 vn=- li=- udi=- nc=2
n1254n      aloadi  java/nio/HeapIntBuffer.hb [I[#521  final Shadow +32] [flags 0x20607 0x0 ]  [0x7fe829aa77b0]       bci=[13,1,294] rc=1 vc=0 vn=- li=2 udi=- nc=1
n1253n        aload  a<parm 0 Ljava/nio/IntBuffer;>[#419  Parm] [flags 0x40000107 0x0 ] (X!=0 )  [0x7fe829aa7760]     bci=[13,0,294] rc=1 vc=0 vn=- li=1 udi=- nc=0 flg=0x4
n1255n      lconst 0 (highWordZero X==0 X>=0 X<=0 )                                           [0x7fe829aa7800]        bci=[13,1,294] rc=1 vc=0 vn=- li=1 udi=- nc=0 flg=0x4302
n1262n    BBEnd </block_100> =====                                                            [0x7fe829aa7a30] bci=[- 1,36,-] rc=0 vc=0 vn=- li=- udi=- nc=0

n1261n    BBStart <block_101> (freq 5696)                                                     [0x7fe829aa79e0] bci=[- 1,36,-] rc=0 vc=0 vn=- li=- udi=- nc=0
n243n     BBEnd </block_101> =====                                                            [0x7fe829943bb0] bci=[- 1,70,-] rc=0 vc=53 vn=- li=- udi=- nc=0

Here, block 100 is the inlined content, while blocks 103 and 21 above it are the inlining guards. Walking back, block 21 was probably replaced by the inlined code, and two splits then inserted the guards.

Block 101 should contain the original code of block 21 (the null check on the return value of the inlined call), but it is empty instead.

The CFG, however, still treats block 101 as if it ended in an 'if' (i.e. the null check) and gives it exits to both block 20 and block 18, just as the null check block had before the inlining. When CFG simplification runs, it looks for a treetop in the empty block, which results in the segfault.

If I can figure out what's causing block 101 to be empty, I can probably figure out what's going on.
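
The mismatch described here can be illustrated with a toy model. The Python sketch below is not OMR code: `Block` and `check_block` are hypothetical stand-ins for the compiler's block and CFG structures, and the rule it checks (a block with two successors should end in a conditional branch) is the assumption the analysis above relies on.

```python
from dataclasses import dataclass, field

@dataclass
class Block:
    number: int
    trees: list = field(default_factory=list)       # opcode names, in program order
    successors: list = field(default_factory=list)  # successor block numbers

def last_tree(block):
    """Return the last tree of the block, or None when the block is empty."""
    return block.trees[-1] if block.trees else None

def check_block(block):
    """Report blocks whose CFG edges disagree with their trees."""
    problems = []
    last = last_tree(block)
    ends_in_branch = last is not None and last.startswith("if")
    if len(block.successors) == 2 and not ends_in_branch:
        problems.append(
            f"block_{block.number} has 2 successors but does not end in a conditional branch"
        )
    return problems

# Before inlining: block 21 ends in the null check and exits to blocks 20 and 18.
block_21_before = Block(21, ["ifacmpeq"], [20, 18])
# After inlining: block 101 is empty but keeps the same two exits.
block_101_after = Block(101, [], [20, 18])

print(check_block(block_21_before))
print(check_block(block_101_after))
```

A pass that takes the last tree of block 101 without a check like `last is not None` would dereference nothing, which matches the shape of the crash described above.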

tlog.minimix.instrumented.txt

luke-li-2003 avatar Jul 08 '25 21:07 luke-li-2003

@nbhuiyan FYI... as an inlining expert, do you have any insight for Luke on the above trace?

vij-singh avatar Jul 09 '25 15:07 vij-singh

My current guess is that the inliner expects any function call to be anchored to the treetop, which is something I didn't know about. I made a version that does anchor the children and it's not failing so far.

luke-li-2003 avatar Jul 09 '25 19:07 luke-li-2003

From briefly looking at the JIT log, what looked strange to me is that block_101 somehow had 2 outgoing edges in the CFG, even though it's a fallthrough block. The CFG simplifier's checks fail because it thinks the block contains a conditional branch, since the block has 2 successors. Empty blocks like this are not strange in themselves: they serve as the destination of some gotos, which is the case if we look further down the trees.

My current guess is that the inliner expects any function call to be anchored to the treetop, which is something I didn't know about. I made a version that does anchor the children and it's not failing so far.

Yes, the call node of java/nio/IntBuffer.base cannot directly be a child node of ifacmpeq. The call node should be anchored in its own treetop. I see other instances of call nodes not being anchored in other parts of the IL. If any of the results of these calls are used outside of the block containing the call node, then the return value would need to be stored into temps/autos to be loaded from where needed.
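
As a sketch of what anchoring means here, the toy Python model below hoists every call found under a tree into its own `treetop` placed just before that tree, while the original parent keeps referring to the same node. `Node`, `calls_below` and `anchor_calls` are hypothetical names for this illustration; this is not the change made in the actual fix, and it does not model storing results to temps.

```python
from dataclasses import dataclass, field

@dataclass
class Node:
    opcode: str
    children: list = field(default_factory=list)

def calls_below(node):
    """Yield call nodes strictly below `node`, children before parents."""
    for child in node.children:
        yield from calls_below(child)
        if "call" in child.opcode:
            yield child

def anchor_calls(trees):
    """Return a new tree list in which every call sits under its own treetop."""
    anchored, seen = [], set()
    for tree in trees:
        for call in calls_below(tree):
            if id(call) in seen:
                continue
            seen.add(id(call))
            if tree.opcode == "treetop" and tree.children[0] is call:
                continue  # this treetop already anchors the call
            anchored.append(Node("treetop", [call]))
        anchored.append(tree)
    return anchored

# Simplified version of the null check block from the trace above.
call = Node("acalli java/nio/IntBuffer.base()",
            [Node("aloadi <vft-symbol>", [Node("aload parm0")]), Node("aload parm0")])
branch = Node("ifacmpeq", [call, Node("aconst NULL")])

result = anchor_calls([branch])
print([tree.opcode for tree in result])
```

After the rewrite the call is evaluated at its own treetop and the `ifacmpeq` refers to the same node a second time, in the same way the trace shows commoned nodes with `==>`.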

nbhuiyan avatar Jul 09 '25 21:07 nbhuiyan

If any of the results of these calls are used outside of the block containing the call node, then the return value would need to be stored into temps/autos to be loaded from where needed.

Just to make sure I understand: assuming the anchoring is done correctly, this should already be guaranteed, since the call node cannot be referenced outside the block, so any results of the call used outside the block must already have some store/load mechanism in place.

luke-li-2003 avatar Jul 09 '25 21:07 luke-li-2003

@luke-li-2003: yes, that's correct.

nbhuiyan avatar Jul 09 '25 21:07 nbhuiyan

This should work as a fix: https://github.com/eclipse-openj9/openj9/pull/22209

luke-li-2003 avatar Jul 10 '25 16:07 luke-li-2003

@luke-li-2003 Just to confirm, are we expecting #22209 to fix both #22059 and #22060 (and also https://github.com/eclipse-openj9/openj9/issues/22107) ?

vij-singh avatar Jul 23 '25 13:07 vij-singh

https://github.com/eclipse-openj9/openj9/issues/22060 is the one I'm not sure about, the others should be fixed.

luke-li-2003 avatar Jul 23 '25 14:07 luke-li-2003

https://github.com/eclipse-openj9/openj9/pull/22209 is merged.

pshipton avatar Aug 01 '25 17:08 pshipton