Severe performance degradation in OpenJ9 vs. HotSpot/GraalVM on fuzzer-generated mutated Java code (MergeSort): 3–4x slower than other JVMs on long runs
Summary
During JVM testing with an automated mutational performance fuzzer, I discovered a reproducible and significant performance degradation in OpenJ9 (openj9-0.51.0, JDK 21.0.7) compared to HotSpot and GraalVM. The anomaly appears on a mutated, but semantically correct, MergeSort implementation, automatically generated by the fuzzer. In this particular case, OpenJ9 is consistently 3–4 times slower than HotSpot and GraalVM under realistic microbenchmarking conditions with JMH.
Note that the mutated code is not a hand-crafted adversarial case; it is the result of automated transformation of real Java code. Similar patterns can arise naturally from obfuscation, code generation, or advanced refactoring tools.
Environment
- OS: Fedora Linux 37 (Workstation Edition)
- Kernel: 6.5.12-100.fc37.x86_64
- Architecture: x86_64
- CPU: Intel i7-12700H
- RAM: 32GB
- JVM versions:
  - OpenJ9: Eclipse OpenJ9 VM, openj9-0.51.0, JDK 21.0.7
  - HotSpot: Java HotSpot(TM) 64-Bit Server VM, 21.0.7+8-LTS-245
  - GraalVM: OpenJDK 64-Bit Server VM, 21.0.2+13-jvmci-23.1-b30
Microbenchmark (JMH) configuration
- Warmup: 2 iterations × 20 seconds
- Measurement: 5 iterations × 40 seconds
- Forks: 3
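For reference, a minimal JMH harness matching this configuration might look like the sketch below. This is an illustration only, not the attached BenchmarkRunner.java; the class name, benchmark method, and the way MergeSort is invoked are assumptions.

package benchmark;

import java.util.concurrent.TimeUnit;

import org.openjdk.jmh.annotations.Benchmark;
import org.openjdk.jmh.annotations.BenchmarkMode;
import org.openjdk.jmh.annotations.Mode;
import org.openjdk.jmh.annotations.OutputTimeUnit;
import org.openjdk.jmh.runner.Runner;
import org.openjdk.jmh.runner.RunnerException;
import org.openjdk.jmh.runner.options.Options;
import org.openjdk.jmh.runner.options.OptionsBuilder;
import org.openjdk.jmh.runner.options.TimeValue;

public class MergeSortBenchmarkSketch {

    @Benchmark
    @BenchmarkMode(Mode.AverageTime)
    @OutputTimeUnit(TimeUnit.MILLISECONDS)
    public void sortMutated() {
        // Hypothetical benchmark body: drive the (mutated) MergeSort entry point.
        MergeSort.main(new String[0]);
    }

    public static void main(String[] args) throws RunnerException {
        Options opts = new OptionsBuilder()
                .include(MergeSortBenchmarkSketch.class.getSimpleName())
                .warmupIterations(2)                    // 2 warmup iterations
                .warmupTime(TimeValue.seconds(20))      // of 20 seconds each
                .measurementIterations(5)               // 5 measurement iterations
                .measurementTime(TimeValue.seconds(40)) // of 40 seconds each
                .forks(3)                               // 3 forked JVMs
                .build();
        new Runner(opts).run();
    }
}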
Performance measurements
1. Original MergeSort (unmutated)
- GraalVM: ~0.05 ms/op
- HotSpot: ~0.045 ms/op
- OpenJ9: ~0.021 ms/op (even faster than others)
All three JVMs handle the original code well, with OpenJ9 actually the fastest of the three.
2. Mutated MergeSort (automatically generated by the fuzzer)
- GraalVM: ~7550 ms/op
- HotSpot: ~8550 ms/op
- OpenJ9: ~29974 ms/op
OpenJ9 is consistently 3–4 times slower than other JVMs on the generated code.
Reproducibility
The slowdown is consistently reproduced in the described environment and with the specified JVM versions. All benchmarks were performed with standard JMH parameters and repeated with identical results.
Attachments
- MergeSort.java (mutated source code)
- MergeSort_original.java (original source code)
- BenchmarkRunner.java (JMH harness)
- JMH reports for all JVMs
- General reports with stdout, stderr, and analysis (for both source files)
Additional context
This anomaly was discovered using an automated JVM performance fuzzer developed as part of my master’s thesis project. The tool systematically generates and benchmarks mutated, but semantically correct, Java code across different JVMs to reveal performance and optimization differences.
I would greatly appreciate any comments, explanations, or recommendations from the OpenJ9 development team regarding this behavior. Thank you very much for your attention to this issue and for your ongoing work on OpenJ9!
Decompiled mutated bytecode
//
// Source code recreated from a .class file by IntelliJ IDEA
// (powered by FernFlower decompiler)
//
package benchmark;

import java.util.Arrays;

class MergeSort {
    public MergeSort() {
    }

    public static void main(String[] var0) {
        int[] var2 = new int[]{6, 5, 12, 10, 9, 1};
        MergeSort var10000 = new MergeSort();
        int var10003 = var2.length;
        int[] var1 = new int[]{0, 10, 20, 0, 0};
        int var3 = var1[1];
        var10000.sort(var2, 0, var10003 - 1);
        System.out.println("Sorted Array:");
        System.out.println(Arrays.toString(var2));
    }

    void merge(int[] var1, int var2, int var3, int var4) {
        int var8 = var3 - var2 + 1;
        var4 -= var3;
        int[] var5 = new int[var8];
        int[] var6 = new int[var4];

        for(int var7 = 0; var7 < var8; ++var7) {
            var5[var7] = var1[var2 + var7];
        }

        for(int var12 = 0; var12 < var4; ++var12) {
            var6[var12] = var1[var3 + 1 + var12];
        }

        var3 = 0;
        int var13 = 0;

        for(var2 = var2; var3 < var8 && var13 < var4; ++var2) {
            if (var5[var3] <= var6[var13]) {
                var1[var2] = var5[var3];
                ++var3;
            } else {
                var1[var2] = var6[var13];
                ++var13;
            }
        }

        while(var3 < var8) {
            var1[var2] = var5[var3];
            ++var3;
            ++var2;
        }

        while(var13 < var4) {
            var1[var2] = var6[var13];
            ++var13;
            ++var2;
        }
    }

    void sort(int[] var1, int var2, int var3) {
        for(int var5 = 0; var5 < 504; ++var5) {
            if (57 > 50) {
                if (var2 >= var3) {
                    break;
                }
            } else {
                if (this != null) {
                }

                if (var2 >= var3) {
                    break;
                }
            }

            int var4 = (var2 + var3) / 2;
            ((MergeSort)this).sort(var1, var2, var4);
            if (42 != 43) {
            }

            ((MergeSort)this).sort(var1, var4 + 1, var3);
            ((MergeSort)this).merge(var1, var2, var4, var3);
        }
    }
}
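For comparison, the unmutated sort presumably follows the standard recursive formulation sketched below (this is a reconstruction under that assumption, not the attached MergeSort_original.java). Reading the decompiled sort above, the mutation effectively replaces the single left < right guard with a loop of up to 504 iterations whose exit condition never changes, so every call on a subrange of more than one element repeats the split-recurse-merge sequence 504 times, multiplying the work at each level of recursion. That is what turns this 6-element sort into a multi-second workload.

// Reconstruction of the likely original sort method (assumption, not the attached file):
void sort(int[] array, int left, int right) {
    if (left < right) {
        int mid = (left + right) / 2;

        // Sort each half once, then merge the two sorted halves.
        sort(array, left, mid);
        sort(array, mid + 1, right);
        merge(array, left, mid, right);
    }
}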
@hzongaro @vijaysun-omr fyi
@donebd, may I ask you to provide some details about how to build and run the test for a JMH newbie like me?
@hzongaro Hi! Thank you for your interest and for the feedback.
Below are step-by-step instructions on how to build and run the JMH microbenchmark for MergeSort, even if you are new to JMH. All required files are in the provided archive. The approach is “manual” and does not require Gradle/Maven — you just need a JDK and the provided dependencies (JARs).
1. What’s included in the archive?
- BenchmarkRunner.java – the main JMH benchmark harness (source).
- mutated/benchmark/MergeSort.class – the mutated MergeSort bytecode (already compiled, in the correct package).
- mutated/benchmark/BenchmarkRunner.class – the compiled JMH harness.
- All JMH JSON results, reports, etc.
2. Preparing for Manual Run
A. Download JMH and dependencies
You need the following JARs (you can copy them from your Gradle/Maven cache or download manually):
- jmh-core-1.37.jar
- jmh-generator-annprocess-1.37.jar
- jopt-simple-5.0.4.jar
- commons-math3-3.6.1.jar
For official JMH releases and instructions, see the JMH project page and the JMH GitHub repository.
B. File structure
Make sure the directory structure looks like this:
mutated/
├── benchmark/
│   ├── BenchmarkRunner.class
│   └── MergeSort.class
└── [JMH and dependency .jar files, or reference their absolute paths]
If you want to edit/rebuild BenchmarkRunner.java, place it into the same benchmark subfolder.
3. Compiling the benchmark (if you want to change source)
If you want to modify the Java files, do so before compiling. To (re)compile manually:
cd mutated
javac -cp "path/to/jmh-core-1.37.jar:path/to/jmh-generator-annprocess-1.37.jar" benchmark/*.java
- (On Windows, use ; instead of : as the path separator.)
4. Running the benchmark
You need to launch the JVM with a classpath that includes:
- The current working directory (mutated)
- The benchmark folder (it's inside mutated)
- All required JMH and dependency jars
Example command (Linux/macOS):
java \
-cp "/full/path/to/mutated:/full/path/to/mutated/benchmark:/full/path/to/jmh-core-1.37.jar:/full/path/to/jmh-generator-annprocess-1.37.jar:/full/path/to/jopt-simple-5.0.4.jar:/full/path/to/commons-math3-3.6.1.jar" \
benchmark.BenchmarkRunner
- If all jars are in the same folder, you can use wildcards, but explicit paths are safest.
- The main class to launch is benchmark.BenchmarkRunner.
5. Notes about paths and output
- The package name must match the file path: classes are in benchmark/, and the Java files must start with package benchmark;.
- If you edit the code, make sure all files are saved and recompiled before running.
- The JMH output JSON will be generated at the location defined in .result(...) in BenchmarkRunner.java. Adjust this path if needed.
- You can also remove or modify the .result(...) line to use a local or relative path (see the sketch just below this list).
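For illustration, a harness whose .result(...) points at a relative path might look like this minimal sketch (the class name and include pattern are assumptions; the attached BenchmarkRunner.java may be organized differently):

package benchmark;

import org.openjdk.jmh.results.format.ResultFormatType;
import org.openjdk.jmh.runner.Runner;
import org.openjdk.jmh.runner.RunnerException;
import org.openjdk.jmh.runner.options.Options;
import org.openjdk.jmh.runner.options.OptionsBuilder;

public class LocalResultRunner {
    public static void main(String[] args) throws RunnerException {
        Options opts = new OptionsBuilder()
                .include("benchmark.BenchmarkRunner") // hypothetical include pattern
                .resultFormat(ResultFormatType.JSON)
                .result("jmh-result.json")            // relative path, written into the working directory
                .build();
        new Runner(opts).run();
    }
}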
6. Typical Troubleshooting
- Class not found: check that your -cp includes the folder containing benchmark/, and that the files are properly compiled.
- Dependency missing: download all jars listed above, or use the ones from your Maven/Gradle cache.
- Cannot find symbol: If you edit the source, make sure you’re compiling with all dependencies in the classpath.
7. Documentation
For more details on JMH usage and options, see the official JMH documentation.
TL;DR for OpenJ9/HotSpot/GraalVM:
- Place compiled classes in benchmark/, and ensure the classpath includes all required jars.
- Launch with: java -cp "<all_paths>" benchmark.BenchmarkRunner
- Inspect the JSON output.
Let me know if you need a ready-made shell script for this, or if you encounter any trouble running the benchmarks!
@donebd, thank you for the detailed instructions! I will try it out and report back.
I've spent a little time over the past couple of days playing with this test. I wrote a small wrapper for the MergeSort test, just for simplicity:
package benchmark;

public class RunIt {
    public static final void main(String[] args) {
        int iters = Integer.parseInt(args[0]);
        for (int i = 0; i < iters; i++) {
            long startTime = System.currentTimeMillis();
            MergeSort.main(args);
            System.out.println("Time: " + (System.currentTimeMillis() - startTime) + "ms");
        }
    }
}
In a trial run of five iterations with no options, I see the following result. This was just running in a test VM, not on any sort of performance machine, so the numbers should be taken with a grain of salt. Despite that, the results were pretty consistent.
Sorted Array:
[1, 5, 6, 9, 10, 12]
Time: 11339ms
Sorted Array:
[1, 5, 6, 9, 10, 12]
Time: 25479ms
Sorted Array:
[1, 5, 6, 9, 10, 12]
Time: 25077ms
Sorted Array:
[1, 5, 6, 9, 10, 12]
Time: 25386ms
Sorted Array:
[1, 5, 6, 9, 10, 12]
Time: 25645ms
These were the hottest methods:
Samples: 479K of event 'cpu-clock:pppH', Event count (approx.): 119805750000
Overhead Command Shared Object Symbol
86.37% main [JIT] tid 668050 [.] benchmark/MergeSort.sort([III)V_scorching
3.34% main [JIT] tid 668050 [.] benchmark/MergeSort.sort([III)V_very-hot
0.78% main [JIT] tid 668050 [.] benchmark/MergeSort.merge([IIII)V_scorching
0.51% main [JIT] tid 668050 [.] benchmark/MergeSort.sort([III)V_warm
When I reran specifying -Xjit:{benchmark/MergeSort.sort*}\(optLevel=...\) with the value cold, warm, hot or scorching, results were consistently in the neighbourhood of 10000ms. From a run with -Xjit:{benchmark/MergeSort.sort*}\(optLevel=scorching\) I saw:
Sorted Array:
[1, 5, 6, 9, 10, 12]
Time: 9910ms
Sorted Array:
[1, 5, 6, 9, 10, 12]
Time: 11101ms
Sorted Array:
[1, 5, 6, 9, 10, 12]
Time: 11337ms
Sorted Array:
[1, 5, 6, 9, 10, 12]
Time: 11527ms
Sorted Array:
[1, 5, 6, 9, 10, 12]
Time: 11230ms
with the following appearing as the hottest methods:
Samples: 231K of event 'cpu-clock:pppH', Event count (approx.): 57889500000
Overhead Command Shared Object Symbol
47.77% main [JIT] tid 668092 [.] benchmark/MergeSort.merge([IIII)V_scorching
42.69% main [JIT] tid 668092 [.] benchmark/MergeSort.sort([III)V_scorching
2.90% main libj9vm29.so [.] VM_BytecodeInterpreterCompressed::run
0.68% main [JIT] tid 668092 [.] benchmark/MergeSort.merge([IIII)V_warm
0.31% main [JIT] tid 668092 [.] benchmark/MergeSort.merge([IIII)V_very-hot
This suggests that MergeSort.merge is being inlined into MergeSort.sort with the default options, and that somehow results in a significant drag on performance.
@nbhuiyan, may I ask you to take a look at this? I'll move this out to the ~0.56~ 0.57 release for now.
> This suggests that MergeSort.merge is being inlined into MergeSort.sort with the default options, and that somehow results in a significant drag on performance.
I should clarify that the problem doesn't necessarily lie with inlining. It could be an issue with the profiling data that's captured by the very-hot compilation, or an issue with some other optimization that's not dealing well with the trees from the inlined call to MergeSort.merge.