Non-fatal assert triggered in jdk_lang_0: 64-bit displacement should have been replaced
I observed this when running sanity.openjdk locally with a custom debug build. This was with a manually-started JITServer on the side, with the client requesting that the JITServer AOT cache be used. However, if I look at the core, the compilation in question appears to be both local and not AOT. I haven't yet tried to reproduce this without JITServer enabled, but I will.
Console log:
===============================================
Running test jdk_lang_0 ...
===============================================
jdk_lang_0 Start Time: Mon Aug 26 16:38:31 2024 Epoch Time (ms): 1724704711850
variation: -Xdump:system:none -Xdump:heap:none -Xdump:system:events=gpf+abort+traceassert+corruptcache -XX:-JITServerTechPreviewMessage Mode150
JVM_OPTIONS: -Xdump:system:none -Xdump:heap:none -Xdump:system:events=gpf+abort+traceassert+corruptcache -XX:-JITServerTechPreviewMessage -XX:+UseCompressedOops -Xverbosegclog -XX:+UseJITServer -XX:JITServerPort=23789 -XX:+JITServerUseAOTCache -XX:+JITServerAOTCacheIgnoreLocalSCC -Xjit:verbose={JITServer},vlog=clientvlog.txt,suffixLogs,aotCacheDisableGeneratedClassSupport
Assertion failed at /home/despresc/dev/testing/openj9-openjdk-jdk21/omr/compiler/x/codegen/OMRMemoryReference.cpp:1036: IS_32BIT_SIGNED(displacement)
VMState: 0x0005ff09
64-bit displacement should have been replaced in TR_AMD64MemoryReference::generateBinaryEncoding
compiling java/text/CollationElementIterator.next()I at level: hot
Unhandled exception
Type=Unhandled trap vmState=0x0005ff09
J9Generic_Signal_Number=00000108 Signal_Number=00000005 Error_Value=00000000 Signal_Code=fffffffa
Handler1=00007F25C7BA30C0 Handler2=00007F25C7911B70
RDI=0000000000000002 RSI=00007F25AC86D8E0 RAX=0000000000000000 RBX=0000000000000005
RCX=00007F25CD7DEBBF RDX=0000000000000000 R8=0000000000000000 R9=00007F25AC86D8E0
R10=0000000000000008 R11=0000000000000246 R12=000000000000040C R13=00007F25B970E11A
R14=00007F25B970E388 R15=00007F25274B7340
RIP=00007F25CD7DEBBF GS=0000 FS=0000 RSP=00007F25AC86D8E0
EFlags=0000000000000246 CS=0033 RBP=00007F25B970E1B0 ERR=0000000000000000
TRAPNO=0000000000000000 OLDMASK=0000000000000000 CR2=0000000000000000
xmm0=42656c62616e655f (f: 1634624896.000000, d: 7.361016e+11)
xmm1=00000000000000ff (f: 255.000000, d: 1.259867e-321)
xmm2=0000000000000000 (f: 0.000000, d: 0.000000e+00)
xmm3=0000000000000000 (f: 0.000000, d: 0.000000e+00)
xmm4=0000000000ff0000 (f: 16711680.000000, d: 8.256667e-317)
xmm5=0000000000000000 (f: 0.000000, d: 0.000000e+00)
xmm6=0000000000000000 (f: 0.000000, d: 0.000000e+00)
xmm7=00007f25ac872fb0 (f: 2894540800.000000, d: 6.907027e-310)
xmm8=0000000000000000 (f: 0.000000, d: 0.000000e+00)
xmm9=6c6c6c6f6c6c6c6c (f: 1819044992.000000, d: 1.913752e+214)
xmm10=6c6c1349182c6c1c (f: 405564448.000000, d: 1.890305e+214)
xmm11=0000000000000000 (f: 0.000000, d: 0.000000e+00)
xmm12=0000000000000000 (f: 0.000000, d: 0.000000e+00)
xmm13=0000000000000000 (f: 0.000000, d: 0.000000e+00)
xmm14=0000000000000000 (f: 0.000000, d: 0.000000e+00)
xmm15=0000000000000000 (f: 0.000000, d: 0.000000e+00)
Module=/lib64/libpthread.so.0
Module_base_address=00007F25CD7CC000 Symbol=raise
Symbol_address=00007F25CD7DEAB0
Method_being_compiled=java/text/CollationElementIterator.next()I
Target=2_90_20240802_000000 (Linux 4.18.0-553.8.1.el8_10.x86_64)
CPU=amd64 (8 logical CPUs) (0x7c7919000 RAM)
----------- Stack Backtrace -----------
raise+0x10f (0x00007F25CD7DEBBF [libpthread.so.0+0x12bbf])
_ZN2TR4trapEv+0x47 (0x00007F25B924DD6D [libj9jit29.so+0x5a2d6d])
_ZN2TR15fatal_assertionEPKciS1_S1_z+0x0 (0x00007F25B924DF9C [libj9jit29.so+0x5a2f9c])
_ZN2TR9assertionEPKciS1_S1_z+0xcc (0x00007F25B924E1AB [libj9jit29.so+0x5a31ab])
_ZN3OMR3X8615MemoryReference20estimateBinaryLengthEPN2TR13CodeGeneratorE+0x32f (0x00007F25B94D5D41 [libj9jit29.so+0x82ad41])
_ZN3OMR3X865AMD6415MemoryReference20estimateBinaryLengthEPN2TR13CodeGeneratorE+0x9 (0x00007F25B9530661 [libj9jit29.so+0x885661])
_ZN2TR20X86RegMemInstruction20estimateBinaryLengthEi+0x6e (0x00007F25B94F4FA4 [libj9jit29.so+0x849fa4])
_ZN3OMR3X8613CodeGenerator16doBinaryEncodingEv+0x3ac (0x00007F25B95298DA [libj9jit29.so+0x87e8da])
_ZN3OMR12CodeGenPhase26performBinaryEncodingPhaseEPN2TR13CodeGeneratorEPNS1_12CodeGenPhaseE+0x97 (0x00007F25B91D3ED9 [libj9jit29.so+0x528ed9])
_ZN3OMR12CodeGenPhase10performAllEv+0xb0 (0x00007F25B91D4972 [libj9jit29.so+0x529972])
_ZN3OMR13CodeGenerator12generateCodeEv+0x8a (0x00007F25B91D1278 [libj9jit29.so+0x526278])
_ZN3OMR11Compilation7compileEv+0xa63 (0x00007F25B91F0E67 [libj9jit29.so+0x545e67])
_ZN2TR28CompilationInfoPerThreadBase7compileEP10J9VMThreadPNS_11CompilationEP17TR_ResolvedMethodR11TR_J9VMBaseP19TR_OptimizationPlanRKNS_16SegmentAllocatorE+0xa4e (0x00007F25B8DF276C [libj9jit29.so+0x14776c])
_ZN2TR28CompilationInfoPerThreadBase14wrappedCompileEP13J9PortLibraryPv+0xa29 (0x00007F25B8DF38BF [libj9jit29.so+0x1488bf])
omrsig_protect+0x2a7 (0x00007F25C7912957 [libj9prt29.so+0x28957])
_ZN2TR28CompilationInfoPerThreadBase7compileEP10J9VMThreadP21TR_MethodToBeCompiledRN2J917J9SegmentProviderE+0x5be (0x00007F25B8DF0C5E [libj9jit29.so+0x145c5e])
_ZN2TR24CompilationInfoPerThread12processEntryER21TR_MethodToBeCompiledRN2J917J9SegmentProviderE+0x1b4 (0x00007F25B8DF119C [libj9jit29.so+0x14619c])
_ZN2TR24CompilationInfoPerThread14processEntriesEv+0x15a (0x00007F25B8DEF88E [libj9jit29.so+0x14488e])
_ZN2TR24CompilationInfoPerThread3runEv+0x31 (0x00007F25B8DEFFEF [libj9jit29.so+0x144fef])
_Z30protectedCompilationThreadProcP13J9PortLibraryPN2TR24CompilationInfoPerThreadE+0x93 (0x00007F25B8DF00EA [libj9jit29.so+0x1450ea])
omrsig_protect+0x2a7 (0x00007F25C7912957 [libj9prt29.so+0x28957])
_Z21compilationThreadProcPv+0x1bc (0x00007F25B8DF04E7 [libj9jit29.so+0x1454e7])
thread_wrapper+0x162 (0x00007F25C76DDF12 [libj9thr29.so+0x9f12])
start_thread+0xea (0x00007F25CD7D41CA [libpthread.so.0+0x81ca])
clone+0x43 (0x00007F25CD22B8D3 [libc.so.6+0x398d3])
There is another such assert in x/codegen/OMRMemoryReference.cpp that was changed to be fatal in https://github.com/eclipse/omr/pull/6937, but a few others were left non-fatal. I'm not sure if that was an oversight, or if it's somehow not as important for this displacement to have been handled properly in generateBinaryEncoding.
Actually, if I look at the jit dump, the problem seems to be in the interaction of a few different optimizations, and does not appear to be JITServer-specific. First, there's this bit of code that starts out like:
n23277n istore <temp slot 5>[#3423 Auto] [flags 0x20000003 0x0 ] (privatizedInlinerArg ) [0x7f24d9155b90] bci=[25,7,222] rc=0 vc=0 vn=- li=- udi=- nc=1 flg=0x2000
n23080n isub [0x7f24d9151e00] bci=[25,7,222] rc=1 vc=362 vn=- li=- udi=- nc=2
n23078n iload value<auto slot 4>[#456 Auto] [flags 0x3 0x0 ] [0x7f24d9151d60] bci=[25,4,222] rc=1 vc=362 vn=- li=- udi=- nc=0
n23079n iconst 0x7e000000 (X!=0 X>=0 ) [0x7f24d9151db0] bci=[25,5,222] rc=1 vc=362 vn=- li=- udi=- nc=0 flg=0x104
and eventually gets optimized to:
[ 2621] O^O TREE SIMPLIFICATION: Normalized isub of iconst > 0 in node [0x7f24d9151e00] to iadd of -iconst
n23277n istore <temp slot 5>[#3423 Auto] [flags 0x20000003 0x0 ] (privatizedInlinerArg ) [0x7f24d9155b90] bci=[25,7,222] rc=0 vc=3652 vn=- li=- udi=- nc=1 flg=0x2000
n23080n iadd [0x7f24d9151e00] bci=[25,7,222] rc=1 vc=0 vn=- li=- udi=- nc=2
n23078n iload value<auto slot 4>[#456 Auto] [flags 0x3 0x0 ] [0x7f24d9151d60] bci=[25,4,222] rc=1 vc=3652 vn=- li=-1 udi=- nc=0
n23079n iconst 0x82000000 (X!=0 X<=0 ) [0x7f24d9151db0] bci=[25,5,222] rc=1 vc=3652 vn=- li=-1 udi=- nc=0 flg=0x204
That value in iconst does not fit into an int32_t. (I think at least one comment in omr anticipates this situation and says it's fine). Finally, this gets decorated with cannotOverflow:
n23277n istore <temp slot 5>[#3423 Auto] [flags 0x20000003 0x0 ] (privatizedInlinerArg ) [0x7f24d9155b90] bci=[25,7,222] rc=0 vc=0 vn=- li=- udi=180 nc=1 flg=0x2000
n23080n iadd (X>=0 cannotOverflow ) [0x7f24d9151e00] bci=[25,7,222] rc=4 vc=0 vn=- li=- udi=- nc=2 flg=0x1100
n23078n iload value<auto slot 4>[#456 Auto] [flags 0x3 0x0 ] (X>=0 cannotOverflow ) [0x7f24d9151d60] bci=[25,4,222] rc=1 vc=0 vn=- li=- udi=977 nc=0 flg=0x1100
n23079n iconst 0x82000000 (X!=0 X<=0 ) [0x7f24d9151db0] bci=[25,5,222] rc=1 vc=0 vn=- li=- udi=- nc=0 flg=0x204
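As a side check on the constant in the dump above, here is a minimal sketch of the 32-bit negation that the isub to iadd normalization implies. The helper names negateConst and bitsOf are hypothetical, made up for illustration, and are not OMR functions. Read as an unsigned 32-bit pattern, 0x82000000 is above INT32_MAX; reinterpreted as int32_t it is -0x7e000000, which is consistent with the X<=0 flag on the node.

```cpp
#include <cstdint>

// Hypothetical helper (not part of OMR): negate a 32-bit constant with
// wraparound semantics, as "isub x, c" -> "iadd x, -c" requires.
constexpr int32_t negateConst(int32_t c)
{
    return static_cast<int32_t>(0u - static_cast<uint32_t>(c));
}

// Hypothetical helper: the raw bit pattern a trace would show in hex.
constexpr uint32_t bitsOf(int32_t c)
{
    return static_cast<uint32_t>(c);
}

// 0x7e000000 negated is -2113929216, whose bit pattern is 0x82000000.
static_assert(negateConst(0x7e000000) == -2113929216, "negated value");
static_assert(bitsOf(negateConst(0x7e000000)) == 0x82000000u, "bit pattern");
```

The checks run at compile time, so the snippet only needs to compile (C++11 or later) to confirm the arithmetic.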
The actual bit of IL that causes the assert starts out like this:
n23246n compressedRefs [0x7f24d91551e0] bci=[27,5,731] rc=0 vc=385 vn=- li=- udi=- nc=2
n23244n aloadi <array-shadow>[#233 Shadow] [flags 0x80000607 0x0 ] [0x7f24d9155140] bci=[27,5,731] rc=2 vc=385 vn=- li=- udi=- nc=1
n23243n aladd (internalPtr ) [0x7f24d91550f0] bci=[27,5,731] rc=1 vc=385 vn=- li=- udi=- nc=2 flg=0x8000
n23231n ==>aloadi
n23242n ladd [0x7f24d91550a0] bci=[27,5,731] rc=1 vc=385 vn=- li=- udi=- nc=2
n23240n lshl [0x7f24d9155000] bci=[27,5,731] rc=1 vc=385 vn=- li=- udi=- nc=2
n23239n i2l (X>=0 ) [0x7f24d9154fb0] bci=[27,5,731] rc=1 vc=385 vn=- li=- udi=- nc=1 flg=0x100
n23234n ==>iload
n23238n iconst 2 (X!=0 X>=0 ) [0x7f24d9154f60] bci=[27,5,731] rc=1 vc=385 vn=- li=- udi=- nc=0 flg=0x104
n23241n lconst 8 (highWordZero X!=0 X>=0 ) [0x7f24d9155050] bci=[27,5,731] rc=1 vc=385 vn=- li=- udi=- nc=0 flg=0x4104
n23245n lconst 0 (highWordZero X==0 X>=0 X<=0 ) [0x7f24d9155190] bci=[27,5,731] rc=1 vc=385 vn=- li=- udi=- nc=0 flg=0x4302
After an ladd->lsub transformation and some LCSE, this becomes
n23246n compressedRefs [0x7f24d91551e0] bci=[27,5,731] rc=0 vc=268 vn=- li=- udi=- nc=2
n23244n aloadi <array-shadow>[#233 Shadow] [flags 0x80000607 0x0 ] [0x7f24d9155140] bci=[27,5,731] rc=2 vc=268 vn=- li=- udi=- nc=1
n23243n aladd (X>=0 internalPtr ) [0x7f24d91550f0] bci=[27,5,731] rc=1 vc=268 vn=- li=- udi=- nc=2 flg=0x8100
n23231n ==>aloadi
n23242n lsub (highWordZero X>=0 cannotOverflow ) [0x7f24d91550a0] bci=[27,5,731] rc=1 vc=268 vn=- li=- udi=- nc=2 flg=0x5100
n23240n lmul (X>=0 cannotOverflow ) [0x7f24d9155000] bci=[27,5,731] rc=1 vc=268 vn=- li=- udi=- nc=2 flg=0x1100
n23239n i2l (highWordZero X>=0 ) [0x7f24d9154fb0] bci=[27,5,731] rc=1 vc=268 vn=- li=- udi=- nc=1 flg=0x4100
n23080n ==>iadd
n23238n lconst 4 (highWordZero X!=0 X>=0 ) [0x7f24d9154f60] bci=[27,5,731] rc=1 vc=268 vn=- li=- udi=- nc=0 flg=0x4104
n23241n lconst -8 (X!=0 X<=0 ) [0x7f24d9155050] bci=[27,5,731] rc=1 vc=268 vn=- li=- udi=- nc=0 flg=0x204
n260n ==>lconst 0
and finally we get this optimization:
[ 13820] O^O TREE SIMPLIFICATION: Distributed lmul with lconst over isub or iadd of with iconst in node [00007F24D9155000]
[ 13821] O^O TREE SIMPLIFICATION: Found lsub of lconst with ladd or lsub of x and lconst in node [00007F24D91550A0]
n23246n compressedRefs [0x7f24d91551e0] bci=[27,5,731] rc=0 vc=370 vn=- li=- udi=- nc=2
n23244n aloadi <array-shadow>[#233 Shadow] [flags 0x80000607 0x0 ] [0x7f24d9155140] bci=[27,5,731] rc=2 vc=370 vn=- li=- udi=- nc=1
n23243n aladd (X>=0 internalPtr ) [0x7f24d91550f0] bci=[27,5,731] rc=1 vc=370 vn=- li=- udi=- nc=2 flg=0x8100
n23231n ==>aloadi
n23242n ladd (highWordZero X>=0 ) [0x7f24d91550a0] bci=[27,5,731] rc=1 vc=0 vn=- li=- udi=- nc=2 flg=0x4100
n45460n lmul [0x7f247f3a7170] bci=[27,5,731] rc=1 vc=0 vn=- li=- udi=- nc=2
n23239n i2l [0x7f24d9154fb0] bci=[27,5,731] rc=1 vc=370 vn=- li=- udi=- nc=1
n23078n ==>iload
n45461n lconst 4 (highWordZero X!=0 X>=0 ) [0x7f247f3a71c0] bci=[25,5,222] rc=1 vc=0 vn=- li=- udi=- nc=0 flg=0x4104
n23241n lconst 0xfffffffe08000008 (X!=0 X<=0 ) [0x7f24d9155050] bci=[27,5,731] rc=1 vc=370 vn=- li=- udi=- nc=0 flg=0x204
n260n ==>lconst 0
The value 0xfffffffe08000008 is equal to (int64_t)(int32_t)0x82000000 * 4 + 8, which is what the combination of those two optimizations should produce, I think.
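To double-check that arithmetic, here is a small sketch of the constant folding as I understand it: the addend from the iadd is sign-extended to 64 bits, multiplied by the scale (4), and the lsub of lconst -8 becomes an addition of 8. The function foldedOffset is a hypothetical name for illustration only and does not correspond to any simplifier routine in OMR.

```cpp
#include <cstdint>

// Hypothetical helper (not an OMR routine): fold
//   (i2l(x + addend) * scale) - subtrahend
// into x*scale + K and return K = addend*scale - subtrahend,
// computed in 64 bits after sign-extending the 32-bit addend.
constexpr int64_t foldedOffset(int32_t addend, int64_t scale, int64_t subtrahend)
{
    return static_cast<int64_t>(addend) * scale - subtrahend;
}

// addend is the iconst 0x82000000 (that is, -0x7e000000), scale is lconst 4,
// and the subtrahend is lconst -8 from the lsub.
static_assert(foldedOffset(-0x7e000000, 4, -8) == -0x1f7fffff8LL,
              "signed value of the folded constant");
static_assert(static_cast<uint64_t>(foldedOffset(-0x7e000000, 4, -8))
                  == 0xfffffffe08000008ULL,
              "bit pattern printed in the trace");
```

Both static_asserts hold, so the lconst printed on n23241n is what this folding produces under those assumptions.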
At instruction selection this turned into:
n23246n ( 0) compressedRefs [0x7f24d91551e0] bci=[27,5,731] rc=0 vc=13 vn=- li=229 udi=- nc=2
n23244n ( 2) l2a (in &GPR_0x7f24776f67c0) [0x7f24d9155140] bci=[27,5,731] rc=2 vc=13 vn=- li=229 udi=26560 nc=1
n53230n ( 0) iu2l (in &GPR_0x7f24776f67c0) [0x7f2525cbee20] bci=[27,5,731] rc=0 vc=13 vn=- li=229 udi=26560 nc=1
n53229n ( 0) iloadi <array-shadow>[#233 Shadow] [flags 0x80000607 0x0 ] (in &GPR_0x7f24776f67c0) [0x7f2525cbedd0] bci=[27,5,731] rc=0 vc=13 vn=- li=229 udi=26560 nc=1
n23243n ( 0) aladd (X>=0 internalPtr ) [0x7f24d91550f0] bci=[27,5,731] rc=0 vc=13 vn=- li=229 udi=- nc=2 flg=0x8100
n23231n ( 0) ==>l2a (in &GPR_0x7f24776f5f40) (X!=0 )
n23242n ( 0) ladd (highWordZero X>=0 cannotOverflow ) [0x7f24d91550a0] bci=[27,5,731] rc=0 vc=13 vn=- li=229 udi=7 nc=2 flg=0x5100
n45460n ( 0) lshl [0x7f247f3a7170] bci=[27,5,731] rc=0 vc=13 vn=- li=229 udi=5 nc=2
n23239n ( 0) i2l (in GPR_0x7f24776eced0) [0x7f24d9154fb0] bci=[27,5,731] rc=0 vc=13 vn=- li=229 udi=52944 nc=1
n46696n ( 0) ==>iRegLoad (in GPR_0x7f24776eced0) (cannotOverflow SeenRealReference )
n45461n ( 0) iconst 2 (Unsigned X!=0 X>=0 ) [0x7f247f3a71c0] bci=[25,5,222] rc=0 vc=13 vn=- li=229 udi=1 nc=0 flg=0x4104
n23241n ( 0) lconst 0xfffffffe08000008 (X!=0 X<=0 ) [0x7f24d9155050] bci=[27,5,731] rc=0 vc=13 vn=- li=229 udi=1 nc=0 flg=0x204
n260n ( 0) ==>lconst 0 (highWordZero X==0 X>=0 X<=0 )
------------------------------
[0x7f24776f66b0] movsxd GPR_0x7f24776eced0, GPR_0x7f24776eced0 # MOVSXReg8Reg4
[0x7f24776f6840] mov &GPR_0x7f24776f67c0, dword ptr [&GPR_0x7f24776f5f40+4*GPR_0x7f24776eced0-0x1f7fffff8] # L4RegMem, SymRef <array-shadow>[#233 Shadow +134217736] [flags 0x80000607 0x0 ]
and -0x1f7fffff8 == 0xfffffffe08000008, which was the value of the displacement at the time of the assert. The dump writer itself seems to have crashed with the same "64-bit displacement" assert while trying to print the VFP Substitution later in the log, just after printing the line for movsxd.
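For reference, a minimal sketch of why this displacement trips the check. fitsInSigned32 is my own stand-in for the IS_32BIT_SIGNED condition named in the assert (an assumption about what it tests, not OMR's actual macro definition), and low32 is a hypothetical helper showing what would survive if only 32 bits of the value were kept.

```cpp
#include <cstdint>
#include <limits>

// Stand-in for the IS_32BIT_SIGNED check in the assert message
// (assumed semantics, not copied from OMR).
constexpr bool fitsInSigned32(int64_t v)
{
    return v >= std::numeric_limits<int32_t>::min()
        && v <= std::numeric_limits<int32_t>::max();
}

// Hypothetical helper: the low 32 bits of a 64-bit displacement.
constexpr uint32_t low32(int64_t v)
{
    return static_cast<uint32_t>(static_cast<uint64_t>(v));
}

// The displacement from the listing is outside the signed 32-bit range.
static_assert(!fitsInSigned32(-0x1f7fffff8LL), "needs more than 32 bits");
// Its low 32 bits are 0x08000008, i.e. 134217736 in decimal.
static_assert(low32(-0x1f7fffff8LL) == 0x08000008u, "truncated value");
static_assert(low32(-0x1f7fffff8LL) == 134217736u, "same value in decimal");
```

The low 32 bits (134217736) happen to match the "+134217736" printed on the SymRef in the instruction listing; I have not confirmed whether that offset is produced by truncation.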
Attn @mpirvu, though I'm fairly sure at this point that the assert is not JITServer-specific.
I think my understanding of the situation is correct, but maybe @hzongaro could comment. I'm not sure what the correct behaviour here should be. Also, should the other "64-bit displacement" asserts in that same file be made fatal?
Though, I suppose that all of that optimization could be correct. The assert does mention TR_AMD64MemoryReference::generateBinaryEncoding, which no longer exists. I'm not sure where in OMR::X86::AMD64::MemoryReference::generateBinaryEncoding() this would have been handled.
@BradleyWood, may I ask you to have a look at this? It hits a fatal assertion you had introduced in OMR pull request eclipse/omr#6937, which fixed issue #15363.
It's actually a non-fatal assert with the same message. I was running this test with a debug build.
Ah! Thanks for the clarification.
I will take a look. Looks like this is the assert firing. Probably not related to eclipse/omr#6937
No, not directly related - just related in the sense that there's another case that a large displacement has made it through to binary encoding.
I don't think there is a functional issue here. The memory reference code could use some cleanup, but the binary length estimation code in OMR::AMD64::estimateBinaryLength() should account for an address load instruction.
Moving this out to the “Future” release for now, as it appears, from @BradleyWood’s analysis, that the TR_ASSERT itself needs cleaning up. It does not represent a functional problem.