BOLT
[Question] Is it possible to optimize java applications with BOLT?
We're running a large legacy Java application with Spring and Hibernate. What could be the workarounds to make BOLT work with the JVM?
In theory, it's possible to optimize a JVM. I would expect it to contain some amount of assembly code that may present challenges to BOLT. What version of the JVM are you running?
We're on Java version 1.8 currently
@maksfb any update? How could one use BOLT with Java? It would allow a performance improvement for most servers in the world, and for every Android application too!
Hi @LifeIsStrange, recently I was able to optimize `libjvm.so` (from graalvm-ce-java8-linux-amd64-20.1.0) using BOLT with the flags `-reorder-blocks=cache+ -split-all-cold -dyno-stats`. For our benchmarks (the Renaissance benchmark suite), `libjvm.so` was one of the main binaries causing icache misses, and the BOLT-optimized `libjvm.so` suffered fewer icache misses.
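For reference, a sketch of what that invocation might look like (assuming a profile has already been collected with `perf` and converted with `perf2bolt`; the file names here are placeholders, only the three flags come from the comment above):

```shell
# Hypothetical invocation using the flags mentioned above.
# perf.fdata is assumed to have been produced beforehand via perf2bolt.
llvm-bolt libjvm.so -o libjvm-bolt.so \
    -data=perf.fdata \
    -reorder-blocks=cache+ \
    -split-all-cold \
    -dyno-stats
```

`-dyno-stats` makes BOLT print before/after estimates of dynamic instruction and branch counts, which is handy for checking whether the profile actually covered the hot code.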
@takhandipu Hello,
You were able to optimize Java with BOLT. I was wondering how you did it? Can you give me a summary of the process? I am a complete beginner. Thanks.
@takhandipu While very interesting (and something that should arguably become common practice), if I understand correctly, you optimized the JVM's C++ binary, not the assembly emitted at runtime by the JIT? I'm not aware of any way to do the latter. A low-hanging fruit, though, would be to BOLT a GraalVM native-image binary: https://www.graalvm.org/22.1/reference-manual/native-image/ (note that while doing so, one could also BOLT libjvm.so like you did, as a complementary measure).
BOLT can't optimize jitted code. BOLT is a static binary optimizer, so anything generated during runtime is out of reach.
BOLT can be used to optimize the VM's library code, though, which can lead to measurable wins. In some VMs, a lot of time is spent running C++-written library code that the jitted code frequently calls, so optimizing this can be impactful. This is what happens with HHVM, for example.
Yes, that is why I was referring to GraalVM native AOT Java binaries, which in theory should be BOLT-able. By the way, since you work at Facebook, I'd like to point out the extremely large missed opportunity that is GraalPHP: as you can see in the benchmarks, it very significantly outperforms HHVM (https://github.com/abertschi/graalphp/blob/master/results.md), which makes sense since it reuses state-of-the-art OpenJDK-derived optimizations. It also allows transparent language interop with almost any language (Java, JS, Python, Ruby, C++) without needing to write brittle FFI boilerplate.
Please guide me: exactly which part of the OpenJDK that I built should I optimize with BOLT? I do not understand. Can you give an example? Thanks.
@ZahraHeydari95 I'm not sure; I've never done it myself. @takhandipu was not talking about OpenJDK but about the sibling project GraalVM. He stated the package name: graalvm-ce-java8-linux-amd64-20.1.0.
Once you have it, explore the folder and try to find a file named libjvm.so. Then install BOLT (see the repository README, or get it via a recent LLVM) and run it on libjvm.so with the flags he stated: -reorder-blocks=cache+ -split-all-cold -dyno-stats. Make sure the old file is replaced by the new version generated by BOLT. Then you can use GraalVM to run Java code in the normal way (see the online GraalVM documentation), benchmark, and observe whether or not there are performance gains vs. unmodified GraalVM. Report back if you manage to run a benchmark.
@LifeIsStrange Thank you very much for your advice. I found libjvm.so. But for the perf record step of the BOLT optimization, what should I run while profiling? Can it be done without profiling?
I'm not sure what the precise difference is between BOLT and regular PGO/AutoFDO. If I understand correctly, you need to do a run of the program in which you try to simulate common usage of the software, so as to show the optimizer the critical hot paths, a bit like a JIT does. Unfortunately, doing just one run seems arbitrary to me and risks overfitting for one usage pattern while disregarding most other critical code paths, unlike a JIT. Maybe @rafaelauler can explain to us what the recommended good practice is?
@LifeIsStrange @rafaelauler
Yes, it is true. I know that learning the hot paths, the number of taken and not-taken branches, the hot and cold functions, etc. from one run of the program is called profiling. Then I can increase the effectiveness of the optimization by feeding this information to BOLT, but I don't know how to do this for libjvm.so, or for the other cases where BOLT might be applicable in Java (other libraries like libjvm.so). Please guide me. Many thanks.
To maximize wins, you need to collect a profile that captures the broadest set of scenarios you can. Note that BOLT is mostly used to reorder basic blocks and reorder functions (layout optimizations). Because of that, even if you collect a partial profile and the resulting block order is not perfect, it is not going to make the program slower than the original, because the compiler is typically not very good at coming up with a good order anyway. But a partial profile does mean missed potential wins.
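One way to broaden coverage (a sketch; the benchmark names and paths are placeholders, but `merge-fdata` is a tool that ships alongside `llvm-bolt`) is to profile several representative workloads separately and then merge the resulting profiles into one:

```shell
# Collect one profile per representative workload (names are placeholders).
perf record -e cycles:u -j any,u -o scenario1.data -- /your/graal/path/java Benchmark1
perf record -e cycles:u -j any,u -o scenario2.data -- /your/graal/path/java Benchmark2

# Convert each perf profile into BOLT's fdata format.
perf2bolt -p scenario1.data -o scenario1.fdata libjvm.so
perf2bolt -p scenario2.data -o scenario2.fdata libjvm.so

# Merge them into a single profile covering both scenarios,
# then pass merged.fdata to llvm-bolt via -data.
merge-fdata scenario1.fdata scenario2.fdata > merged.fdata
```

The merged profile simply sums the counts, so scenarios that run longer (or are profiled more often) weigh more in the final layout.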
To collect a profile for libjvm.so, I would do the following (now, it's been a while since I dealt with Java, so I'm going to guess some details, but you'll get the idea):
$ sudo perf record -e cycles:u -j any,u -a -o perf.data -- sleep 20
This will collect system wide samples for 20 seconds. There are other ways to do this, but I'm going for system-wide profile because it is going to profile everything that ran in your system for a given amount of time. The disadvantage is that it could produce a very large perf.data file. If the profile doesn't work for you, try profiling the java command directly (replace "sleep 20" with the command below, and remove the sudo as well as the "-a" flag).
In another terminal:
$ /your/graal/path/java MyBenchmark # some program that runs for the 20 seconds (or whatever duration) you specified above
Then, back in the perf record terminal:
$ sudo chown rafaelauler perf.data # change to your username
$ cp /your/graal/path/libjvm.so . # replace with the library location in your system
$ perf2bolt -p perf.data -o libjvm.fdata libjvm.so
Now that my profile is in libjvm.fdata, I can run BOLT:
$ llvm-bolt libjvm.so -o libjvm-bolt-optimized.so -data libjvm.fdata -reorder-blocks=ext-tsp -split-functions
Now copy the optimized lib (if all commands succeeded) back to its place:
$ cp libjvm-bolt-optimized.so /your/graal/path/libjvm.so
-- The problem with the approach above is that you are using the libjvm.so binary supplied in the package, which might lack a symbol table (if it is stripped). If it really is stripped, BOLT will probably be unable to fully disassemble it. Another problem is that it definitely lacks relocations, which are linker annotations that we preserve when building a binary to be consumed by BOLT. If you keep relocations, BOLT can be far more effective, because it can move functions around in the binary, increasing cache/iTLB effectiveness.
To produce a libjvm.so that is more suitable for BOLT, you need to build GraalVM yourself from the source code, and I have never done that, so I can't help you there. But the process would typically be to find where in the build system to change the linker flags, and add the flag "-Wl,-q" to ask the linker to preserve relocations in all binaries it links. Then you would install Java somewhere and run the steps above, taking care to replace each command line with the new location where you installed your VM built from source.
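As a sketch (the exact variable depends on GraalVM's build system, so using LDFLAGS here is an assumption on my part), passing the flag through the environment and then checking it took effect might look like:

```shell
# Assumption: the build honors LDFLAGS; GraalVM's actual build scripts may
# require setting the linker flags elsewhere. -Wl,-q tells the compiler
# driver to pass -q (--emit-relocs) to the linker, keeping relocations
# in the final binaries.
export LDFLAGS="-Wl,-q"
# ... run the GraalVM build as documented upstream ...

# Afterwards, verify that relocations were preserved in the library:
readelf -S /your/graal/path/libjvm.so | grep -i rela
# if relocations were kept, this lists .rela.* sections (e.g. .rela.text)
```

If the grep comes back empty, the flag didn't reach the final link and BOLT will fall back to its more limited non-relocation mode.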