openj9 icon indicating copy to clipboard operation
openj9 copied to clipboard

JDK8 Crash with either "Segmentation error vmState=0x00000000" or "Invalid JIT return address"

Open connglli opened this issue 2 years ago • 2 comments

Java -version output

openjdk version "1.8.0_342-internal"
OpenJDK Runtime Environment (build 1.8.0_342-internal-_2022_06_10_15_18-b00)
Eclipse OpenJ9 VM (build master-3d06b2f9c, JRE 1.8.0 Linux amd64-64-Bit Compressed References 20220610_000000 (JIT enabled, AOT enabled)
OpenJ9   - 3d06b2f9c
OMR      - cf8ddbd1a
JCL      - 2bb179375a based on jdk8u342-b05)

Summary of problem

Sorry this test is a bit large but we are unable to reduce it further. The test is somewhat tricky because it sometimes crashes with a Segmentation error and full stacktrace; sometimes just crashes with outputting "Invalid JIT return address".

Also, this is perhaps a JIT bug since -Xint can hide it.

public class Test {
  public static final int N = 256;
  public static long instanceCount = 9593L;
  public static byte byFld = -75;
  public static boolean bFld = false;

  public static int iMeth(boolean b, int i2) {
    float f4 = -2.554F;
    int i20 = 55118,
        i21 = 36767,
        i22 = -33501,
        i23 = -196,
        i24 = 1188,
        i25 = 10,
        i26 = 2080,
        iArr[] = new int[N];
    short s2 = 12562;
    for (i20 = 1; i20 < 217; i20++) {
      i2 += i20;
      iArr[i20 - 1] -= i21;
      i2 += (i20 - i2);
      if (b) continue;
      i21 = 8;
      f4 = i21;
      iArr[i20] = i2;
      Test.byFld += (byte) i21;
      i2 += (((i20 * f4) + i20) - Test.instanceCount);
      Test.instanceCount = -77;
      iArr[i20 + 1] = i21;
      if (i2 != 0) {}
      f4 -= Test.instanceCount;
    }
    for (int ax$7 = -635; ax$7 < 5443; ax$7 += 1) {
      boolean ax$4 = b;
      int ax$5 = i20;
      try {
        int[] ax$0 = {1, 2, 3, 4};
        int ax$3;
        for (int ax$1 = 1; ax$1 < i20; ax$1++) {
          for (int ax$2 = 0; ax$2 < i20 - ax$1; ax$2++) {
            if (ax$0[ax$2] > ax$0[ax$2 + 1]) {
              ax$3 = ax$0[ax$2];
              ax$0[ax$2] = ax$0[ax$2 + 1];
              ax$0[ax$2 + 1] = ax$3;
            }
          }
        }
      } catch (Throwable ax$6) {
      } finally {
        b = ax$4;
        i20 = ax$5;
      }
    }
    do {
      if (b) break;
    } while (++i22 < 143);
    long meth_res =
        (b ? 1 : 0)
            + i2
            + Float.floatToIntBits(f4)
            + i20
            + i21
            + i22
            + i23
            + i24
            + s2
            + i25
            + i26
            + FuzzerUtils.checkSum(iArr);
    return (int) meth_res;
  }

  public void mainTest(String[] strArr1) {
    double d = -2.106236, dArr1[] = new double[N];
    int i = -1, i1 = 91, i27 = -1, i28 = 94, i29 = 7, i30 = 4, i31 = -7, i32 = 238;
    for (d = 81; 2 < d; d -= 2) i1 = (int) ((-(i1 + -1.60531)) * iMeth(Test.bFld, -3));
  }

  public static void main(String[] strArr) {
    try {
      Test _instance = new Test();
      for (int i = 0; i < 10; i++) {
        _instance.mainTest(strArr);
      }
    } catch (Exception ex) {
      FuzzerUtils.out.println(ex.getClass().getCanonicalName());
    }
  }
}

Diagnostic files

By issuing

$ java Test

the following crash log is given:

unhandled exception
Type=Segmentation error vmState=0x00000000
J9Generic_Signal_Number=00000018 Signal_Number=0000000b Error_Value=00000000 Signal_Code=00000002
Handler1=00007F1A712A7020 Handler2=00007F1A7108FEF0 InaccessibleAddress=0000000400000008
RDI=00007F1A6BD93740 RSI=000000000011B860 RAX=00007F1A714584E0 RBX=000000000004DD00
RCX=0000000400000004 RDX=0000000400000004 R8=00000000000000D9 R9=0000000000000000
R10=0000000000000000 R11=00000000000000D9 R12=000000000011B7B0 R13=000000000011B7C0
R14=00007F1A6C47E362 R15=00007F1A714ED0A0
RIP=00007F1A712F6FD0 GS=0000 FS=0000 RSP=00007F1A714ECD30
EFlags=0000000000010202 CS=0033 RBP=0000000000000000 ERR=0000000000000004
TRAPNO=000000000000000E OLDMASK=0000000000000000 CR2=0000000400000008
xmm0 0000000000270600 (f: 2557440.000000, d: 1.263543e-317)
xmm1 c1e0000044fca000 (f: 1157406720.000000, d: -2.147484e+09)
xmm2 00007f1a714ed0d0 (f: 1900990720.000000, d: 6.904644e-310)
xmm3 c1e0000044fb8000 (f: 1157332992.000000, d: -2.147484e+09)
xmm4 2323232323232323 (f: 589505344.000000, d: 2.008777e-139)
xmm5 bff0000000000000 (f: 0.000000, d: -1.000000e+00)
xmm6 bfba4e76ce8c0e5e (f: 3465285120.000000, d: -1.027598e-01)
xmm7 0000000000000000 (f: 0.000000, d: 0.000000e+00)
xmm8 0000000042aa0000 (f: 1118437376.000000, d: 5.525815e-315)
xmm9 0000000000000000 (f: 0.000000, d: 0.000000e+00)
xmm10 0000000000000000 (f: 0.000000, d: 0.000000e+00)
xmm11 0000000000000000 (f: 0.000000, d: 0.000000e+00)
xmm12 0000000000000000 (f: 0.000000, d: 0.000000e+00)
xmm13 0000000000000000 (f: 0.000000, d: 0.000000e+00)
xmm14 0000000000000000 (f: 0.000000, d: 0.000000e+00)
xmm15 0000000000000000 (f: 0.000000, d: 0.000000e+00)
Module=/zdata/congli/OpenJ9/jdk8/jre/lib/amd64/default/libj9vm29.so
Module_base_address=00007F1A71268000
Target=2_90_20220610_000000 (Linux 5.4.0-117-generic)
CPU=amd64 (128 logical CPUs) (0x3ee84d8000 RAM)
----------- Stack Backtrace -----------
_ZN32VM_BytecodeInterpreterCompressed3runEP10J9VMThread+0x5db0 (0x00007F1A712F6FD0 [libj9vm29.so+0x8efd0])
bytecodeLoopCompressed+0xad (0x00007F1A712F120D [libj9vm29.so+0x8920d])
 (0x00007F1A7139B642 [libj9vm29.so+0x133642])
 ---------------------------------------
 JVMDUMP039I Processing dump event "gpf", detail "" at 2022/07/01 17:57:28 - please wait.
 JVMDUMP032I JVM requested System dump using '/zdata/congli/ax-exp/ax-eval/2-ax-only/90.openj9/mutant/red/core.20220701.175728.522725.0001.dmp' in response to an event
 JVMDUMP010I System dump written to /zdata/congli/ax-exp/ax-eval/2-ax-only/90.openj9/mutant/red/core.20220701.175728.522725.0001.dmp
 JVMDUMP032I JVM requested Java dump using '/zdata/congli/ax-exp/ax-eval/2-ax-only/90.openj9/mutant/red/javacore.20220701.175728.522725.0002.txt' in response to an event
 JVMDUMP012E Error in Java dump: /zdata/congli/ax-exp/ax-eval/2-ax-only/90.openj9/mutant/red/javacore.20220701.175728.522725.0002.txt
 JVMDUMP032I JVM requested Snap dump using '/zdata/congli/ax-exp/ax-eval/2-ax-only/90.openj9/mutant/red/Snap.20220701.175728.522725.0003.trc' in response to an event
 JVMDUMP010I Snap dump written to /zdata/congli/ax-exp/ax-eval/2-ax-only/90.openj9/mutant/red/Snap.20220701.175728.522725.0003.trc
 JVMDUMP032I JVM requested JIT dump using '/zdata/congli/ax-exp/ax-eval/2-ax-only/90.openj9/mutant/red/jitdump.20220701.175728.522725.0004.dmp' in response to an event
 JVMDUMP051I JIT dump occurred in 'main' thread 0x000000000004DD00
 JVMDUMP010I JIT dump written to /zdata/congli/ax-exp/ax-eval/2-ax-only/90.openj9/mutant/red/jitdump.20220701.175728.522725.0004.dmp
 JVMDUMP013I Processed dump event "gpf", detail "".

Or sometimes just output:



*** Invalid JIT return address 0000000400000004 in 000000000004E000

15:56:26.134 0x4dd00    j9vm.249    *   ** ASSERTION FAILED ** at /root/hostdir/openj9-openjdk-jdk8/openj9/runtime/vm/swalk.c:1602: ((0 ))
JVMDUMP039I Processing dump event "traceassert", detail "" at 2022/07/01 17:56:26 - please wait.
JVMDUMP032I JVM requested System dump using '/zdata/congli/ax-exp/ax-eval/2-ax-only/90.openj9/mutant/red/core.20220701.175626.443427.0001.dmp' in response to an event
JVMDUMP010I System dump written to /zdata/congli/ax-exp/ax-eval/2-ax-only/90.openj9/mutant/red/core.20220701.175626.443427.0001.dmp
JVMDUMP032I JVM requested Java dump using '/zdata/congli/ax-exp/ax-eval/2-ax-only/90.openj9/mutant/red/javacore.20220701.175626.443427.0002.txt' in response to an event
JVMDUMP012E Error in Java dump: /zdata/congli/ax-exp/ax-eval/2-ax-only/90.openj9/mutant/red/javacore.20220701.175626.443427.0002.txt
JVMDUMP032I JVM requested Snap dump using '/zdata/congli/ax-exp/ax-eval/2-ax-only/90.openj9/mutant/red/Snap.20220701.175626.443427.0003.trc' in response to an event
JVMDUMP010I Snap dump written to /zdata/congli/ax-exp/ax-eval/2-ax-only/90.openj9/mutant/red/Snap.20220701.175626.443427.0003.trc
JVMDUMP013I Processed dump event "traceassert", detail "".

Please also check openj9-bug-90.tar.gz for the two types of logs (core, snap, etc.) and the test (Test.java, Test.class).

connglli avatar Jul 01 '22 16:07 connglli

Bugs I've posted today is a little bit difficult for us to handle. They all behave some non-determinism on their outputted logs and traces but they are indeed crashes.

connglli avatar Jul 01 '22 16:07 connglli

@jdmpapin : please investigate.

We won't be able to get to the bottom of this for 0.35. Moving to 0.36.

0xdaryl avatar Sep 20 '22 09:09 0xdaryl

The problem here is caused by out-of-bounds writes to the array ax$0. We've written past the end of it, and furthermore, it happens to have been stack-allocated by escape analysis, so the stray writes overwrote autos, the return address, and into the caller's frame.

The bound checks were skipped because of a bug in loop versioner.

First, the bound check was versioned out of the innermost (ax$2) loop. This created versioning tests ahead of that loop, within the middle (ax$1) loop. Two of the three always pass (given the initial values of autos), but the remaining one correctly prevented any out-of-bounds accesses. For these tests versioner generates a monster tree, but the important versioning test here is eventually (correctly) reduced to this:

n2463n    istore  <temp slot 27>[#463  Auto] [flags 0x3 0x0 ]
n1139n      isub (X>=0 cannotOverflow )
n1137n        iconst 217 (X!=0 X>=0 cannotOverflow )
n1138n        iload  ax$1<auto slot 17>[#435  Auto] [flags 0x3 0x0 ] (X>=0 cannotOverflow )
n1156n    ifiucmpgt --> block_81 BBStart at n1042n ()
n1154n      isub (X>=0 cannotOverflow )
n1139n        ==>isub
n1153n        iconst -1 (X!=0 X<=0 )
n1155n      iconst 4 (X!=0 X>=0 )

i.e. jump away to the cold loop when (217 - ax$1) + 1 is (unsigned) greater than 4. 217 - ax$1 is the loop bound of the inner loop, and I believe the +1 is accounting for ax$0[ax$2 + 1]. So far so good.

The bug occurs in a later pass of versioner. It doesn't show in the log, but this test n2463n is flagged as isVersionableIfWithMaxExpr(), which means that even though it is not loop-invariant, it can be versioned by maximizing the variant child of the comparison, i.e. n1154n. Versioner sees that this is an expression based on an increasing loop induction variable (ax$1), so in order to maximize the expression, it maximizes the IV by producing a sort of final value for it. The problem is that because the IV appears in a negative position within the expression, maximizing it actually minimizes the whole expression, and as a result, the new versioning test generated ahead of the middle loop is incorrect. It is eventually removed after getting simplified to this:

n2718n    ifiucmpgt --> block_162 BBStart at n2663n ()
n2716n      iconst 2 (X!=0 X>=0 )
n249n       ==>iconst 4

Without this versioning test (or with it spuriously passing), there is nothing preventing control from entering into the hot version of the loop and skipping bound checks, even when the loop bound is large.

For me, the original test case always fails one way or another, but it passes if I prevent the incorrect versioning here.

I've created a simpler reproduction of this bug. It should print 42 twice, but instead it prints 42 followed by -1 when run like so:

$ java -Xjit:'disableOSR,limit={*.go(*)*},count=1,disableAsyncCompilation,initialOptLevel=hot,inhibitRecompilation,optFile=opts.list' Repro

The fundamental problem is that the logic that builds the outer versioning test implicitly assumes that the expression to maximize/minimize is increasing as a function of the IV.

jdmpapin avatar Oct 13 '22 19:10 jdmpapin

Moving to 0.38.

0xdaryl avatar Nov 15 '22 10:11 0xdaryl