Severe performance degradation in OpenJ9 vs. HotSpot/GraalVM on fuzzer-generated mutated Java code (MergeSort): 3–4x slower than other JVMs on long runs
Summary
During JVM testing with an automated mutational performance fuzzer, I discovered a reproducible and significant performance degradation in OpenJ9 (openj9-0.51.0, JDK 21.0.7) compared to HotSpot and GraalVM. The anomaly appears on a mutated, but semantically correct, MergeSort implementation, automatically generated by the fuzzer. In this particular case, OpenJ9 is consistently 3–4 times slower than HotSpot and GraalVM under realistic microbenchmarking conditions with JMH.
Note that the mutated code is not a hand-crafted adversarial case; it is the result of automated transformation of real Java code. Similar patterns can arise naturally from obfuscation, code generation, or advanced refactoring tools.
Environment
- OS: Fedora Linux 37 (Workstation Edition)
- Kernel: 6.5.12-100.fc37.x86_64
- Architecture: x86_64
- CPU: Intel i7-12700H
- RAM: 32GB
- JVM versions:
  - OpenJ9: Eclipse OpenJ9 VM, openj9-0.51.0, JDK 21.0.7
  - HotSpot: Java HotSpot(TM) 64-Bit Server VM, 21.0.7+8-LTS-245
  - GraalVM: OpenJDK 64-Bit Server VM, 21.0.2+13-jvmci-23.1-b30
Microbenchmark (JMH) configuration
- Warmup: 2 iterations × 20 seconds
- Measurement: 5 iterations × 40 seconds
- Forks: 3
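For reference, a minimal JMH harness matching this configuration might look like the sketch below. This is an illustration only, not the attached BenchmarkRunner.java; the class name, benchmark method, and the way MergeSort is invoked are assumptions.

package benchmark;

import java.util.concurrent.TimeUnit;

import org.openjdk.jmh.annotations.Benchmark;
import org.openjdk.jmh.annotations.BenchmarkMode;
import org.openjdk.jmh.annotations.Mode;
import org.openjdk.jmh.annotations.OutputTimeUnit;
import org.openjdk.jmh.runner.Runner;
import org.openjdk.jmh.runner.RunnerException;
import org.openjdk.jmh.runner.options.Options;
import org.openjdk.jmh.runner.options.OptionsBuilder;
import org.openjdk.jmh.runner.options.TimeValue;

public class MergeSortBenchmarkSketch {

    @Benchmark
    @BenchmarkMode(Mode.AverageTime)
    @OutputTimeUnit(TimeUnit.MILLISECONDS)
    public void sortMutated() {
        // Hypothetical benchmark body: drive the (mutated) MergeSort entry point.
        MergeSort.main(new String[0]);
    }

    public static void main(String[] args) throws RunnerException {
        Options opts = new OptionsBuilder()
                .include(MergeSortBenchmarkSketch.class.getSimpleName())
                .warmupIterations(2)                    // 2 warmup iterations
                .warmupTime(TimeValue.seconds(20))      // of 20 seconds each
                .measurementIterations(5)               // 5 measurement iterations
                .measurementTime(TimeValue.seconds(40)) // of 40 seconds each
                .forks(3)                               // 3 forked JVMs
                .build();
        new Runner(opts).run();
    }
}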
Performance measurements
1. Original MergeSort (unmutated)
- GraalVM: ~0.05 ms/op
- HotSpot: ~0.045 ms/op
- OpenJ9: ~0.021 ms/op (even faster than others)
All three JVMs handle the original code well, with OpenJ9 actually the fastest of the three.
2. Mutated MergeSort (automatically generated by the fuzzer)
- GraalVM: ~7550 ms/op
- HotSpot: ~8550 ms/op
- OpenJ9: ~29974 ms/op
OpenJ9 is consistently 3–4 times slower than other JVMs on the generated code.
Reproducibility
The slowdown is consistently reproduced in the described environment and with the specified JVM versions. All benchmarks were performed with standard JMH parameters and repeated with identical results.
Attachments
- MergeSort.java (mutated source code)
- MergeSort_original.java (original source code)
- BenchmarkRunner.java (JMH harness)
- JMH reports for all JVMs
- General reports with stdout, stderr, and analysis (for both source files)
Additional context
This anomaly was discovered using an automated JVM performance fuzzer developed as part of my master’s thesis project. The tool systematically generates and benchmarks mutated, but semantically correct, Java code across different JVMs to reveal performance and optimization differences.
I would greatly appreciate any comments, explanations, or recommendations from the OpenJ9 development team regarding this behavior. Thank you very much for your attention to this issue and for your ongoing work on OpenJ9!
Decompiled mutated bytecode
//
// Source code recreated from a .class file by IntelliJ IDEA
// (powered by FernFlower decompiler)
//
package benchmark;

import java.util.Arrays;

class MergeSort {
    public MergeSort() {
    }

    public static void main(String[] var0) {
        int[] var2 = new int[]{6, 5, 12, 10, 9, 1};
        MergeSort var10000 = new MergeSort();
        int var10003 = var2.length;
        int[] var1 = new int[]{0, 10, 20, 0, 0};
        int var3 = var1[1];
        var10000.sort(var2, 0, var10003 - 1);
        System.out.println("Sorted Array:");
        System.out.println(Arrays.toString(var2));
    }

    void merge(int[] var1, int var2, int var3, int var4) {
        int var8 = var3 - var2 + 1;
        var4 -= var3;
        int[] var5 = new int[var8];
        int[] var6 = new int[var4];

        for(int var7 = 0; var7 < var8; ++var7) {
            var5[var7] = var1[var2 + var7];
        }

        for(int var12 = 0; var12 < var4; ++var12) {
            var6[var12] = var1[var3 + 1 + var12];
        }

        var3 = 0;
        int var13 = 0;

        for(var2 = var2; var3 < var8 && var13 < var4; ++var2) {
            if (var5[var3] <= var6[var13]) {
                var1[var2] = var5[var3];
                ++var3;
            } else {
                var1[var2] = var6[var13];
                ++var13;
            }
        }

        while(var3 < var8) {
            var1[var2] = var5[var3];
            ++var3;
            ++var2;
        }

        while(var13 < var4) {
            var1[var2] = var6[var13];
            ++var13;
            ++var2;
        }
    }

    void sort(int[] var1, int var2, int var3) {
        for(int var5 = 0; var5 < 504; ++var5) {
            if (57 > 50) {
                if (var2 >= var3) {
                    break;
                }
            } else {
                if (this != null) {
                }

                if (var2 >= var3) {
                    break;
                }
            }

            int var4 = (var2 + var3) / 2;
            ((MergeSort)this).sort(var1, var2, var4);
            if (42 != 43) {
            }

            ((MergeSort)this).sort(var1, var4 + 1, var3);
            ((MergeSort)this).merge(var1, var2, var4, var3);
        }
    }
}
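For comparison, the unmutated sort presumably follows the standard recursive formulation sketched below (this is a reconstruction under that assumption, not the attached MergeSort_original.java). Reading the decompiled sort above, the mutation effectively replaces the single left < right guard with a loop of up to 504 iterations whose exit condition never changes, so every call on a subrange of more than one element repeats the split-recurse-merge sequence 504 times, multiplying the work at each level of recursion. That is what turns this 6-element sort into a multi-second workload.

// Reconstruction of the likely original sort method (assumption, not the attached file):
void sort(int[] array, int left, int right) {
    if (left < right) {
        int mid = (left + right) / 2;

        // Sort each half once, then merge the two sorted halves.
        sort(array, left, mid);
        sort(array, mid + 1, right);
        merge(array, left, mid, right);
    }
}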
@hzongaro @vijaysun-omr fyi
@donebd, may I ask you to provide some details about how to build and run the test for a JMH newbie like me?
@hzongaro Hi! Thank you for your interest and for the feedback.
Below are step-by-step instructions on how to build and run the JMH microbenchmark for MergeSort, even if you are new to JMH. All required files are in the provided archive. The approach is “manual” and does not require Gradle/Maven — you just need a JDK and the provided dependencies (JARs).
1. What’s included in the archive?
- BenchmarkRunner.java – the main JMH benchmark harness (source).
- mutated/benchmark/MergeSort.class – the mutated MergeSort bytecode (already compiled, in the correct package).
- mutated/benchmark/BenchmarkRunner.class – the compiled JMH harness.
- All JMH JSON results, reports, etc.
2. Preparing for Manual Run
A. Download JMH and dependencies
You need the following JARs (you can copy them from your Gradle/Maven cache or download manually):
- jmh-core-1.37.jar
- jmh-generator-annprocess-1.37.jar
- jopt-simple-5.0.4.jar
- commons-math3-3.6.1.jar
For official JMH releases and instructions, see the JMH project page and the JMH GitHub repository.
B. File structure
Make sure the directory structure looks like this:
mutated/
├── benchmark/
│   ├── BenchmarkRunner.class
│   └── MergeSort.class
└── [JMH and dependency .jar files, or reference their absolute paths]
If you want to edit/rebuild BenchmarkRunner.java, place it into the same benchmark subfolder.
3. Compiling the benchmark (if you want to change source)
If you want to modify the Java files, do so before compiling. To (re)compile manually:
cd mutated
javac -cp "path/to/jmh-core-1.37.jar:path/to/jmh-generator-annprocess-1.37.jar" benchmark/*.java
- (On Windows, use ; instead of : as the path separator.)
4. Running the benchmark
You need to launch the JVM with a classpath that includes:
- The current working directory (mutated)
- The benchmark folder (it's inside mutated)
- All required JMH and dependency jars
Example command (Linux/macOS):
java \
-cp "/full/path/to/mutated:/full/path/to/mutated/benchmark:/full/path/to/jmh-core-1.37.jar:/full/path/to/jmh-generator-annprocess-1.37.jar:/full/path/to/jopt-simple-5.0.4.jar:/full/path/to/commons-math3-3.6.1.jar" \
benchmark.BenchmarkRunner
- If all jars are in the same folder, you can use wildcards, but explicit paths are safest.
- The main class to launch is benchmark.BenchmarkRunner.
5. Notes about paths and output
- The package name must match the file path: classes are in benchmark/, and the Java files must start with package benchmark;.
- If you edit the code, make sure all files are saved and recompiled before running.
- The JMH output JSON will be generated at the location defined in .result(...) in BenchmarkRunner.java. Adjust this path if needed.
- You can also remove or modify the .result(...) line to use a local or relative path (see the sketch just below this list).
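For illustration, a harness whose .result(...) points at a relative path might look like this minimal sketch (the class name and include pattern are assumptions; the attached BenchmarkRunner.java may be organized differently):

package benchmark;

import org.openjdk.jmh.results.format.ResultFormatType;
import org.openjdk.jmh.runner.Runner;
import org.openjdk.jmh.runner.RunnerException;
import org.openjdk.jmh.runner.options.Options;
import org.openjdk.jmh.runner.options.OptionsBuilder;

public class LocalResultRunner {
    public static void main(String[] args) throws RunnerException {
        Options opts = new OptionsBuilder()
                .include("benchmark.BenchmarkRunner") // hypothetical include pattern
                .resultFormat(ResultFormatType.JSON)
                .result("jmh-result.json")            // relative path, written into the working directory
                .build();
        new Runner(opts).run();
    }
}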
6. Typical Troubleshooting
- Class not found: check that your -cp includes the folder containing benchmark/, and that the files are properly compiled.
- Dependency missing: download all jars listed above, or use the ones from your Maven/Gradle cache.
- Cannot find symbol: If you edit the source, make sure you’re compiling with all dependencies in the classpath.
7. Documentation
For more details on JMH usage and options, see the official JMH documentation.
TL;DR for OpenJ9/HotSpot/GraalVM:
- Place compiled classes in benchmark/, and ensure the classpath includes all required jars.
- Launch with: java -cp "<all_paths>" benchmark.BenchmarkRunner
- Inspect the JSON output.
Let me know if you need a ready-made shell script for this, or if you encounter any trouble running the benchmarks!
@donebd, thank you for the detailed instructions! I will try it out and report back.
I've spent a little time over the past couple of days playing with this test. I wrote a small wrapper for the MergeSort test, just for simplicity:
package benchmark;

public class RunIt {
    public static final void main(String[] args) {
        int iters = Integer.parseInt(args[0]);
        for (int i = 0; i < iters; i++) {
            long startTime = System.currentTimeMillis();
            MergeSort.main(args);
            System.out.println("Time: " + (System.currentTimeMillis() - startTime) + "ms");
        }
    }
}
In a trial run of five iterations with no options, I see the following result. This was just running in a test VM, not on any sort of performance machine, so the numbers should be taken with a grain of salt. Despite that, the results were pretty consistent.
Sorted Array:
[1, 5, 6, 9, 10, 12]
Time: 11339ms
Sorted Array:
[1, 5, 6, 9, 10, 12]
Time: 25479ms
Sorted Array:
[1, 5, 6, 9, 10, 12]
Time: 25077ms
Sorted Array:
[1, 5, 6, 9, 10, 12]
Time: 25386ms
Sorted Array:
[1, 5, 6, 9, 10, 12]
Time: 25645ms
These were the hottest methods:
Samples: 479K of event 'cpu-clock:pppH', Event count (approx.): 119805750000
Overhead Command Shared Object Symbol
86.37% main [JIT] tid 668050 [.] benchmark/MergeSort.sort([III)V_scorching
3.34% main [JIT] tid 668050 [.] benchmark/MergeSort.sort([III)V_very-hot
0.78% main [JIT] tid 668050 [.] benchmark/MergeSort.merge([IIII)V_scorching
0.51% main [JIT] tid 668050 [.] benchmark/MergeSort.sort([III)V_warm
When I reran specifying -Xjit:{benchmark/MergeSort.sort*}\(optLevel=...\) with the value cold, warm, hot or scorching, results were consistently in the neighbourhood of 10000ms. From a run with -Xjit:{benchmark/MergeSort.sort*}\(optLevel=scorching\) I saw:
Sorted Array:
[1, 5, 6, 9, 10, 12]
Time: 9910ms
Sorted Array:
[1, 5, 6, 9, 10, 12]
Time: 11101ms
Sorted Array:
[1, 5, 6, 9, 10, 12]
Time: 11337ms
Sorted Array:
[1, 5, 6, 9, 10, 12]
Time: 11527ms
Sorted Array:
[1, 5, 6, 9, 10, 12]
Time: 11230ms
with the following appearing as the hottest methods:
Samples: 231K of event 'cpu-clock:pppH', Event count (approx.): 57889500000
Overhead Command Shared Object Symbol
47.77% main [JIT] tid 668092 [.] benchmark/MergeSort.merge([IIII)V_scorching
42.69% main [JIT] tid 668092 [.] benchmark/MergeSort.sort([III)V_scorching
2.90% main libj9vm29.so [.] VM_BytecodeInterpreterCompressed::run
0.68% main [JIT] tid 668092 [.] benchmark/MergeSort.merge([IIII)V_warm
0.31% main [JIT] tid 668092 [.] benchmark/MergeSort.merge([IIII)V_very-hot
This suggests that MergeSort.merge is being inlined into MergeSort.sort with the default options, and that somehow results in a significant drag on performance.
@nbhuiyan, may I ask you to take a look at this? I'll move this out to the ~0.56~ 0.57 release for now.
> This suggests that MergeSort.merge is being inlined into MergeSort.sort with the default options, and that somehow results in a significant drag on performance.
I should clarify that the problem doesn't necessarily lie with inlining. It could be an issue with the profiling data that's captured by the very-hot compilation, or an issue with some other optimization that's not dealing well with the trees from the inlined call to MergeSort.merge.