openj9 Non-fatal assert triggered in jdk_lang_0: 64-bit displacement should have been replaced

I observed this when running sanity.openjdk locally with a custom debug build. This was with a manually started JITServer on the side, with the client requesting that the JITServer AOT cache be used. However, if I look at the core, the compilation in question appears to be both local and not AOT. I haven't yet tried to reproduce this without JITServer enabled, but I will.

Console log:

===============================================
Running test jdk_lang_0 ...
===============================================
jdk_lang_0 Start Time: Mon Aug 26 16:38:31 2024 Epoch Time (ms): 1724704711850
variation: -Xdump:system:none -Xdump:heap:none -Xdump:system:events=gpf+abort+traceassert+corruptcache -XX:-JITServerTechPreviewMessage Mode150
JVM_OPTIONS:  -Xdump:system:none -Xdump:heap:none -Xdump:system:events=gpf+abort+traceassert+corruptcache -XX:-JITServerTechPreviewMessage -XX:+UseCompressedOops -Xverbosegclog -XX:+UseJITServer -XX:JITServerPort=23789 -XX:+JITServerUseAOTCache -XX:+JITServerAOTCacheIgnoreLocalSCC -Xjit:verbose={JITServer},vlog=clientvlog.txt,suffixLogs,aotCacheDisableGeneratedClassSupport

Assertion failed at /home/despresc/dev/testing/openj9-openjdk-jdk21/omr/compiler/x/codegen/OMRMemoryReference.cpp:1036: IS_32BIT_SIGNED(displacement)
VMState: 0x0005ff09
	64-bit displacement should have been replaced in TR_AMD64MemoryReference::generateBinaryEncoding
compiling java/text/CollationElementIterator.next()I at level: hot

Unhandled exception
Type=Unhandled trap vmState=0x0005ff09
J9Generic_Signal_Number=00000108 Signal_Number=00000005 Error_Value=00000000 Signal_Code=fffffffa
Handler1=00007F25C7BA30C0 Handler2=00007F25C7911B70
RDI=0000000000000002 RSI=00007F25AC86D8E0 RAX=0000000000000000 RBX=0000000000000005
RCX=00007F25CD7DEBBF RDX=0000000000000000 R8=0000000000000000 R9=00007F25AC86D8E0
R10=0000000000000008 R11=0000000000000246 R12=000000000000040C R13=00007F25B970E11A
R14=00007F25B970E388 R15=00007F25274B7340
RIP=00007F25CD7DEBBF GS=0000 FS=0000 RSP=00007F25AC86D8E0
EFlags=0000000000000246 CS=0033 RBP=00007F25B970E1B0 ERR=0000000000000000
TRAPNO=0000000000000000 OLDMASK=0000000000000000 CR2=0000000000000000
xmm0=42656c62616e655f (f: 1634624896.000000, d: 7.361016e+11)
xmm1=00000000000000ff (f: 255.000000, d: 1.259867e-321)
xmm2=0000000000000000 (f: 0.000000, d: 0.000000e+00)
xmm3=0000000000000000 (f: 0.000000, d: 0.000000e+00)
xmm4=0000000000ff0000 (f: 16711680.000000, d: 8.256667e-317)
xmm5=0000000000000000 (f: 0.000000, d: 0.000000e+00)
xmm6=0000000000000000 (f: 0.000000, d: 0.000000e+00)
xmm7=00007f25ac872fb0 (f: 2894540800.000000, d: 6.907027e-310)
xmm8=0000000000000000 (f: 0.000000, d: 0.000000e+00)
xmm9=6c6c6c6f6c6c6c6c (f: 1819044992.000000, d: 1.913752e+214)
xmm10=6c6c1349182c6c1c (f: 405564448.000000, d: 1.890305e+214)
xmm11=0000000000000000 (f: 0.000000, d: 0.000000e+00)
xmm12=0000000000000000 (f: 0.000000, d: 0.000000e+00)
xmm13=0000000000000000 (f: 0.000000, d: 0.000000e+00)
xmm14=0000000000000000 (f: 0.000000, d: 0.000000e+00)
xmm15=0000000000000000 (f: 0.000000, d: 0.000000e+00)
Module=/lib64/libpthread.so.0
Module_base_address=00007F25CD7CC000 Symbol=raise
Symbol_address=00007F25CD7DEAB0

Method_being_compiled=java/text/CollationElementIterator.next()I
Target=2_90_20240802_000000 (Linux 4.18.0-553.8.1.el8_10.x86_64)
CPU=amd64 (8 logical CPUs) (0x7c7919000 RAM)
----------- Stack Backtrace -----------
raise+0x10f (0x00007F25CD7DEBBF [libpthread.so.0+0x12bbf])
_ZN2TR4trapEv+0x47 (0x00007F25B924DD6D [libj9jit29.so+0x5a2d6d])
_ZN2TR15fatal_assertionEPKciS1_S1_z+0x0 (0x00007F25B924DF9C [libj9jit29.so+0x5a2f9c])
_ZN2TR9assertionEPKciS1_S1_z+0xcc (0x00007F25B924E1AB [libj9jit29.so+0x5a31ab])
_ZN3OMR3X8615MemoryReference20estimateBinaryLengthEPN2TR13CodeGeneratorE+0x32f (0x00007F25B94D5D41 [libj9jit29.so+0x82ad41])
_ZN3OMR3X865AMD6415MemoryReference20estimateBinaryLengthEPN2TR13CodeGeneratorE+0x9 (0x00007F25B9530661 [libj9jit29.so+0x885661])
_ZN2TR20X86RegMemInstruction20estimateBinaryLengthEi+0x6e (0x00007F25B94F4FA4 [libj9jit29.so+0x849fa4])
_ZN3OMR3X8613CodeGenerator16doBinaryEncodingEv+0x3ac (0x00007F25B95298DA [libj9jit29.so+0x87e8da])
_ZN3OMR12CodeGenPhase26performBinaryEncodingPhaseEPN2TR13CodeGeneratorEPNS1_12CodeGenPhaseE+0x97 (0x00007F25B91D3ED9 [libj9jit29.so+0x528ed9])
_ZN3OMR12CodeGenPhase10performAllEv+0xb0 (0x00007F25B91D4972 [libj9jit29.so+0x529972])
_ZN3OMR13CodeGenerator12generateCodeEv+0x8a (0x00007F25B91D1278 [libj9jit29.so+0x526278])
_ZN3OMR11Compilation7compileEv+0xa63 (0x00007F25B91F0E67 [libj9jit29.so+0x545e67])
_ZN2TR28CompilationInfoPerThreadBase7compileEP10J9VMThreadPNS_11CompilationEP17TR_ResolvedMethodR11TR_J9VMBaseP19TR_OptimizationPlanRKNS_16SegmentAllocatorE+0xa4e (0x00007F25B8DF276C [libj9jit29.so+0x14776c])
_ZN2TR28CompilationInfoPerThreadBase14wrappedCompileEP13J9PortLibraryPv+0xa29 (0x00007F25B8DF38BF [libj9jit29.so+0x1488bf])
omrsig_protect+0x2a7 (0x00007F25C7912957 [libj9prt29.so+0x28957])
_ZN2TR28CompilationInfoPerThreadBase7compileEP10J9VMThreadP21TR_MethodToBeCompiledRN2J917J9SegmentProviderE+0x5be (0x00007F25B8DF0C5E [libj9jit29.so+0x145c5e])
_ZN2TR24CompilationInfoPerThread12processEntryER21TR_MethodToBeCompiledRN2J917J9SegmentProviderE+0x1b4 (0x00007F25B8DF119C [libj9jit29.so+0x14619c])
_ZN2TR24CompilationInfoPerThread14processEntriesEv+0x15a (0x00007F25B8DEF88E [libj9jit29.so+0x14488e])
_ZN2TR24CompilationInfoPerThread3runEv+0x31 (0x00007F25B8DEFFEF [libj9jit29.so+0x144fef])
_Z30protectedCompilationThreadProcP13J9PortLibraryPN2TR24CompilationInfoPerThreadE+0x93 (0x00007F25B8DF00EA [libj9jit29.so+0x1450ea])
omrsig_protect+0x2a7 (0x00007F25C7912957 [libj9prt29.so+0x28957])
_Z21compilationThreadProcPv+0x1bc (0x00007F25B8DF04E7 [libj9jit29.so+0x1454e7])
thread_wrapper+0x162 (0x00007F25C76DDF12 [libj9thr29.so+0x9f12])
start_thread+0xea (0x00007F25CD7D41CA [libpthread.so.0+0x81ca])
clone+0x43 (0x00007F25CD22B8D3 [libc.so.6+0x398d3])

There is another such assert in x/codegen/OMRMemoryReference.cpp that was changed to be fatal in https://github.com/eclipse/omr/pull/6937, but a few others were left non-fatal. I'm not sure whether that was an oversight, or whether it is somehow less important for this displacement to have been handled properly in generateBinaryEncoding.

Aug 27 '24 14:08 cjjdespres

Actually, looking at the JIT dump, the problem seems to come from the interaction of a few different optimizations, and it does not appear to be JITServer-specific. First, there's this bit of IL, which starts out as:

n23277n   istore  <temp slot 5>[#3423  Auto] [flags 0x20000003 0x0 ] (privatizedInlinerArg )  [0x7f24d9155b90] bci=[25,7,222] rc=0 vc=0 vn=- li=- udi=- nc=1 flg=0x2000
n23080n     isub                                                                              [0x7f24d9151e00] bci=[25,7,222] rc=1 vc=362 vn=- li=- udi=- nc=2
n23078n       iload  value<auto slot 4>[#456  Auto] [flags 0x3 0x0 ]                          [0x7f24d9151d60] bci=[25,4,222] rc=1 vc=362 vn=- li=- udi=- nc=0
n23079n       iconst 0x7e000000 (X!=0 X>=0 )                                                  [0x7f24d9151db0] bci=[25,5,222] rc=1 vc=362 vn=- li=- udi=- nc=0 flg=0x104

and eventually gets optimized to:

[  2621] O^O TREE SIMPLIFICATION: Normalized isub of iconst > 0 in node [0x7f24d9151e00] to iadd of -iconst

n23277n   istore  <temp slot 5>[#3423  Auto] [flags 0x20000003 0x0 ] (privatizedInlinerArg )  [0x7f24d9155b90] bci=[25,7,222] rc=0 vc=3652 vn=- li=- udi=- nc=1 flg=0x2000
n23080n     iadd                                                                              [0x7f24d9151e00] bci=[25,7,222] rc=1 vc=0 vn=- li=- udi=- nc=2
n23078n       iload  value<auto slot 4>[#456  Auto] [flags 0x3 0x0 ]                          [0x7f24d9151d60] bci=[25,4,222] rc=1 vc=3652 vn=- li=-1 udi=- nc=0
n23079n       iconst 0x82000000 (X!=0 X<=0 )                                                  [0x7f24d9151db0] bci=[25,5,222] rc=1 vc=3652 vn=- li=-1 udi=- nc=0 flg=0x204

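Read as a positive number, the value 0x82000000 in the iconst does not fit into an int32_t; it is the 32-bit two's-complement encoding of -0x7e000000. (I think at least one comment in OMR anticipates this situation and says it's fine.) Finally, this gets decorated with cannotOverflow: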

n23277n   istore  <temp slot 5>[#3423  Auto] [flags 0x20000003 0x0 ] (privatizedInlinerArg )  [0x7f24d9155b90] bci=[25,7,222] rc=0 vc=0 vn=- li=- udi=180 nc=1 flg=0x2000
n23080n     iadd (X>=0 cannotOverflow )                                                       [0x7f24d9151e00] bci=[25,7,222] rc=4 vc=0 vn=- li=- udi=- nc=2 flg=0x1100
n23078n       iload  value<auto slot 4>[#456  Auto] [flags 0x3 0x0 ] (X>=0 cannotOverflow )   [0x7f24d9151d60] bci=[25,4,222] rc=1 vc=0 vn=- li=- udi=977 nc=0 flg=0x1100
n23079n       iconst 0x82000000 (X!=0 X<=0 )                                                  [0x7f24d9151db0] bci=[25,5,222] rc=1 vc=0 vn=- li=- udi=- nc=0 flg=0x204

The actual bit of IL that causes the assert starts out like this:

n23246n   compressedRefs                                                                      [0x7f24d91551e0] bci=[27,5,731] rc=0 vc=385 vn=- li=- udi=- nc=2
n23244n     aloadi  <array-shadow>[#233  Shadow] [flags 0x80000607 0x0 ]                      [0x7f24d9155140] bci=[27,5,731] rc=2 vc=385 vn=- li=- udi=- nc=1
n23243n       aladd (internalPtr )                                                            [0x7f24d91550f0] bci=[27,5,731] rc=1 vc=385 vn=- li=- udi=- nc=2 flg=0x8000
n23231n         ==>aloadi
n23242n         ladd                                                                          [0x7f24d91550a0] bci=[27,5,731] rc=1 vc=385 vn=- li=- udi=- nc=2
n23240n           lshl                                                                        [0x7f24d9155000] bci=[27,5,731] rc=1 vc=385 vn=- li=- udi=- nc=2
n23239n             i2l (X>=0 )                                                               [0x7f24d9154fb0] bci=[27,5,731] rc=1 vc=385 vn=- li=- udi=- nc=1 flg=0x100
n23234n               ==>iload
n23238n             iconst 2 (X!=0 X>=0 )                                                     [0x7f24d9154f60] bci=[27,5,731] rc=1 vc=385 vn=- li=- udi=- nc=0 flg=0x104
n23241n           lconst 8 (highWordZero X!=0 X>=0 )                                          [0x7f24d9155050] bci=[27,5,731] rc=1 vc=385 vn=- li=- udi=- nc=0 flg=0x4104
n23245n     lconst 0 (highWordZero X==0 X>=0 X<=0 )                                           [0x7f24d9155190] bci=[27,5,731] rc=1 vc=385 vn=- li=- udi=- nc=0 flg=0x4302

After an ladd->lsub transformation and some LCSE, this becomes:

n23246n   compressedRefs                                                                      [0x7f24d91551e0] bci=[27,5,731] rc=0 vc=268 vn=- li=- udi=- nc=2
n23244n     aloadi  <array-shadow>[#233  Shadow] [flags 0x80000607 0x0 ]                      [0x7f24d9155140] bci=[27,5,731] rc=2 vc=268 vn=- li=- udi=- nc=1
n23243n       aladd (X>=0 internalPtr )                                                       [0x7f24d91550f0] bci=[27,5,731] rc=1 vc=268 vn=- li=- udi=- nc=2 flg=0x8100
n23231n         ==>aloadi
n23242n         lsub (highWordZero X>=0 cannotOverflow )                                      [0x7f24d91550a0] bci=[27,5,731] rc=1 vc=268 vn=- li=- udi=- nc=2 flg=0x5100
n23240n           lmul (X>=0 cannotOverflow )                                                 [0x7f24d9155000] bci=[27,5,731] rc=1 vc=268 vn=- li=- udi=- nc=2 flg=0x1100
n23239n             i2l (highWordZero X>=0 )                                                  [0x7f24d9154fb0] bci=[27,5,731] rc=1 vc=268 vn=- li=- udi=- nc=1 flg=0x4100
n23080n               ==>iadd
n23238n             lconst 4 (highWordZero X!=0 X>=0 )                                        [0x7f24d9154f60] bci=[27,5,731] rc=1 vc=268 vn=- li=- udi=- nc=0 flg=0x4104
n23241n           lconst -8 (X!=0 X<=0 )                                                      [0x7f24d9155050] bci=[27,5,731] rc=1 vc=268 vn=- li=- udi=- nc=0 flg=0x204
n260n       ==>lconst 0

and finally we get this optimization:

[ 13820] O^O TREE SIMPLIFICATION: Distributed lmul with lconst over isub or iadd of with iconst in node [00007F24D9155000]
[ 13821] O^O TREE SIMPLIFICATION: Found lsub of lconst with ladd or lsub of x and lconst in node [00007F24D91550A0]

n23246n   compressedRefs                                                                      [0x7f24d91551e0] bci=[27,5,731] rc=0 vc=370 vn=- li=- udi=- nc=2
n23244n     aloadi  <array-shadow>[#233  Shadow] [flags 0x80000607 0x0 ]                      [0x7f24d9155140] bci=[27,5,731] rc=2 vc=370 vn=- li=- udi=- nc=1
n23243n       aladd (X>=0 internalPtr )                                                       [0x7f24d91550f0] bci=[27,5,731] rc=1 vc=370 vn=- li=- udi=- nc=2 flg=0x8100
n23231n         ==>aloadi
n23242n         ladd (highWordZero X>=0 )                                                     [0x7f24d91550a0] bci=[27,5,731] rc=1 vc=0 vn=- li=- udi=- nc=2 flg=0x4100
n45460n           lmul                                                                        [0x7f247f3a7170] bci=[27,5,731] rc=1 vc=0 vn=- li=- udi=- nc=2
n23239n             i2l                                                                       [0x7f24d9154fb0] bci=[27,5,731] rc=1 vc=370 vn=- li=- udi=- nc=1
n23078n               ==>iload
n45461n             lconst 4 (highWordZero X!=0 X>=0 )                                        [0x7f247f3a71c0] bci=[25,5,222] rc=1 vc=0 vn=- li=- udi=- nc=0 flg=0x4104
n23241n           lconst 0xfffffffe08000008 (X!=0 X<=0 )                                      [0x7f24d9155050] bci=[27,5,731] rc=1 vc=370 vn=- li=- udi=- nc=0 flg=0x204
n260n       ==>lconst 0

The value 0xfffffffe08000008 equals (int64_t)(int32_t)0x82000000 * 4 + 8, which I think is what the combination of those two simplifications should produce.

At instruction selection, this turned into:

 n23246n  (  0)  compressedRefs                                                                       [0x7f24d91551e0] bci=[27,5,731] rc=0 vc=13 vn=- li=229 udi=- nc=2
 n23244n  (  2)    l2a (in &GPR_0x7f24776f67c0)                                                       [0x7f24d9155140] bci=[27,5,731] rc=2 vc=13 vn=- li=229 udi=26560 nc=1
 n53230n  (  0)      iu2l (in &GPR_0x7f24776f67c0)                                                    [0x7f2525cbee20] bci=[27,5,731] rc=0 vc=13 vn=- li=229 udi=26560 nc=1
 n53229n  (  0)        iloadi  <array-shadow>[#233  Shadow] [flags 0x80000607 0x0 ] (in &GPR_0x7f24776f67c0)  [0x7f2525cbedd0] bci=[27,5,731] rc=0 vc=13 vn=- li=229 udi=26560 nc=1
 n23243n  (  0)          aladd (X>=0 internalPtr )                                                    [0x7f24d91550f0] bci=[27,5,731] rc=0 vc=13 vn=- li=229 udi=- nc=2 flg=0x8100
 n23231n  (  0)            ==>l2a (in &GPR_0x7f24776f5f40) (X!=0 )
 n23242n  (  0)            ladd (highWordZero X>=0 cannotOverflow )                                   [0x7f24d91550a0] bci=[27,5,731] rc=0 vc=13 vn=- li=229 udi=7 nc=2 flg=0x5100
 n45460n  (  0)              lshl                                                                     [0x7f247f3a7170] bci=[27,5,731] rc=0 vc=13 vn=- li=229 udi=5 nc=2
 n23239n  (  0)                i2l (in GPR_0x7f24776eced0)                                            [0x7f24d9154fb0] bci=[27,5,731] rc=0 vc=13 vn=- li=229 udi=52944 nc=1
 n46696n  (  0)                  ==>iRegLoad (in GPR_0x7f24776eced0) (cannotOverflow SeenRealReference )
 n45461n  (  0)                iconst 2 (Unsigned X!=0 X>=0 )                                         [0x7f247f3a71c0] bci=[25,5,222] rc=0 vc=13 vn=- li=229 udi=1 nc=0 flg=0x4104
 n23241n  (  0)              lconst 0xfffffffe08000008 (X!=0 X<=0 )                                   [0x7f24d9155050] bci=[27,5,731] rc=0 vc=13 vn=- li=229 udi=1 nc=0 flg=0x204
 n260n    (  0)    ==>lconst 0 (highWordZero X==0 X>=0 X<=0 )
------------------------------

 [0x7f24776f66b0]       movsxd  GPR_0x7f24776eced0, GPR_0x7f24776eced0          # MOVSXReg8Reg4
 [0x7f24776f6840]       mov     &GPR_0x7f24776f67c0, dword ptr [&GPR_0x7f24776f5f40+4*GPR_0x7f24776eced0-0x1f7fffff8]           # L4RegMem, SymRef  <array-shadow>[#233  Shadow +134217736] [flags 0x80000607 0x0 ]

Here, -0x1f7fffff8 == 0xfffffffe08000008 as a 64-bit value, which was the value of the displacement at the time of the assert. The dump writer itself seems to have crashed with the same "64-bit displacement" assert while trying to print the VFP Substitution later in the log, just after printing the line for movsxd.

Aug 27 '24 17:08 cjjdespres

Attn @mpirvu, though I'm fairly sure at this point that the assert is not JITServer-specific.

I think my understanding of the situation is correct, but maybe @hzongaro could comment. I'm not sure what the correct behaviour is here. Also, should the other "64-bit displacement" asserts in that same file be made fatal?

Aug 27 '24 17:08 cjjdespres

That said, I suppose all of that optimization could be correct. The assert message mentions TR_AMD64MemoryReference::generateBinaryEncoding, which no longer exists, and I'm not sure where in OMR::X86::AMD64::MemoryReference::generateBinaryEncoding() a displacement like this would have been handled.

Aug 27 '24 17:08 cjjdespres

@BradleyWood, may I ask you to have a look at this? It hits a fatal assertion you had introduced in OMR pull request eclipse/omr#6937, which fixed issue #15363.

Aug 27 '24 20:08 hzongaro

It's actually a non-fatal assert with the same message. I was running this test with a debug build.

Aug 27 '24 20:08 cjjdespres

Ah! Thanks for the clarification.

Aug 27 '24 20:08 hzongaro

I will take a look. It looks like this is the assert that's firing. Probably not related to eclipse/omr#6937.

Aug 27 '24 20:08 BradleyWood

No, not directly related. It's related only in the sense that this is another case where a large displacement has made it through to binary encoding.

Aug 27 '24 20:08 hzongaro

I don't think there is a functional issue here. The memory reference code could use some cleanup, but the binary length estimation code in OMR::AMD64::estimateBinaryLength() should account for an address load instruction.

Aug 29 '24 20:08 BradleyWood

Moving this out to the “Future” release for now since, based on @BradleyWood’s analysis, it appears that the TR_ASSERT itself needs cleaning up; it does not represent a functional problem.

Sep 16 '24 18:09 hzongaro