Perf of warmed-up ClojureScript unit tests slow on Graal.JS relative to Nashorn
If you run the ClojureScript compiler's unit tests in a loop and allow things to warm up, they end up running more slowly in Graal.JS than under Nashorn.
The numbers in the chart attached above were produced on a Mac, using GraalVM 1.0.0-rc4 (EE edition). Nashorn is 1.8.0_172-b11.
To reproduce using the actual ClojureScript compiler unit tests, the steps are:
- Modify the ClojureScript compiler's tests so that they run in an infinite loop (optionally also capturing timing metrics)
- Build the tests (which involves producing a JavaScript file that can be executed in any engine)
- Run the tests, allowing some time (say 1/2 hour or so) for them to settle
To make things more convenient, I've attached a copy of the built tests directly in the "Run the tests" section below.
Modify the ClojureScript compiler's tests
The ClojureScript compiler is at https://github.com/clojure/clojurescript
Modifications are at https://github.com/mfikes/clojurescript/commit/1c76cf62b1c69e0a81f3663559215c62fe09ae31, and to make things easier, that branch can just be checked out and used as-is. Roughly, these modifications involve setting up some atoms (mutable state) to track timing information, putting the tests themselves in an infinite loop (`(while true ...)` in ClojureScript), doing some accounting for timing, and printing out the timing with the string `DELTA`, which can be `grep`ped for.
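To make the shape of that harness concrete, here is a minimal JavaScript sketch of what the modification does; the actual harness is the ClojureScript in the commit above, and the names here are hypothetical illustrations rather than the real test code:

```js
// Minimal sketch with hypothetical names; the real harness is ClojureScript.
// Keeps the last 32 iteration times and prints one DELTA line per iteration,
// in the spirit of the output shown in the "Run the tests" section below.
var timings = []; // iteration times in ms, oldest first, at most 32 entries

function runForever(runTestsOnce) {
  while (true) {
    var start = Date.now();
    runTestsOnce(); // one full pass over the unit tests
    var delta = Date.now() - start;

    timings.push(delta);
    if (timings.length > 32) timings.shift();

    // The average divides by the full window of 32 slots, so slots that
    // have not been filled yet effectively count as 0, which is why the
    // queues in the output below are zero-padded at the front.
    var sum = timings.reduce(function (a, b) { return a + b; }, 0);
    var avg = Math.floor(sum / 32);

    print("DELTA " + delta + " [" + timings.join(" ") + "] " + avg);
  }
}
```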
Build the tests
To build the tests, run `script/bootstrap` and then `script/test` in the tree. This will build them, and after logging
Applying optimizations :advanced to 343 sources
it will automatically attempt to run them under various JavaScript engines if any are configured per https://clojurescript.org/community/running-tests. (No JavaScript engines need to be configured for this repro.) If it starts running the tests in any engine, Ctrl-C them, because they are in an infinite loop.
This will result in a JavaScript file, `builds/out-adv/core-advanced-test.js`, that can be run in any engine.
Run the tests
For convenience, here is a built copy of the test JavaScript file: core-advanced-test.js.gz
Then, to run the tests using, say, Graal.JS, do
/path/to/js builds/out-adv/core-advanced-test.js | grep DELTA
You will see output like this:
DELTA 3204 #queue [0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 3204] 100
DELTA 2083 #queue [0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 3204 2083] 165
DELTA 2016 #queue [0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 3204 2083 2016] 228
DELTA 1932 #queue [0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 3204 2083 2016 1932] 288
The first number is the number of milliseconds to run that test loop, the queue is the last 32 timings, and the very last number is the average of the 32 timings. (So, definitely let it fill the queue with timings, and go well beyond that to let things settle completely.)
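(As a quick check on that description against the output above: in the first line only one slot is filled, and 3204 / 32 ≈ 100; in the fourth line, (3204 + 2083 + 2016 + 1932) / 32 ≈ 288. Both match the final number on their respective lines.)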
(If you run them in Node, the tests want `print` to be defined; to do that, start `node`, evaluate `var print = console.log;`, and then `require("./path/to/core-advanced-test.js");`.)
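For copy-paste convenience, here is the same Node setup as a snippet (the path is the placeholder from above):

```js
// In the Node REPL: the built tests call print(), which Node does not define.
var print = console.log;
require("./path/to/core-advanced-test.js");

// In a standalone wrapper script, the alias must be made global explicitly,
// because a script-level var is not visible to required modules:
// global.print = console.log;
// require("./path/to/core-advanced-test.js");
```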
Hi Mike,
thanks; I've also modified the test and can confirm we are ~5-6x slower than Node/v8 even with sufficient warmup. We will look into improving our score on this.
Best, Christian
Thanks @wirthi. Likewise, if you happen to find any particular code constructs that ClojureScript (and, really, Google Closure Compiler) is emitting that are problematic from a perf perspective, that would be interesting to know as well.
Hi Mike,
short status update:
We've identified and fixed several minor yet relevant performance issues on that benchmark. They are in the tip of this repo (a few more are planned and will be merged in the next week). Some changes landed in RC5, which is about to be published, but most didn't make the feature freeze and will only show up in RC6 a month later.
The current state (tip of the repo as of today):
Node 8.11.1:
DELTA 933 #queue [0 2152 1189 1076 1000 1081 935 1018 939 971 1043 1015 965 985 975 1017 1036 995 1045 931 1040 961 987 1018 960 1057 965 1046 967 1029 1056 933] 1012
(not getting any faster after that)
Graal.js EE:
DELTA 1413 #queue [1900 1626 1585 1538 1704 1631 1786 1754 1595 1543 1620 1535 1492 1380 1359 1321 1382 1324 1494 1275 1281 1405 1391 1321 1295 1334 1373 1419 1375 1398 1364 1413] 1475
(That's the 100th iteration, i.e., the 100th time "DELTA" is printed.)
Warmup is also improved; this is after the 31st iteration:
DELTA 1736 #queue [0 54930 26346 14820 5414 4052 3506 3327 3143 2879 3027 2415 2327 2433 2355 2279 2380 2529 2308 2173 2031 2011 2031 2142 2089 2015 1915 1881 1727 1676 1632 1736] 5172
Performance in the area of the peak is reached around 40 iterations. Values are a bit flaky since this is on my laptop; I should be using a dedicated benchmark machine.
Nashorn 1.8.0_172:
100th iteration:
DELTA 5641 #queue [5409 5472 5371 5585 5777 5573 5622 5647 5816 5717 5531 5690 5738 5601 5590 5660 5682 5703 5654 5589 5626 5736 5695 5684 6124 5976 5646 5850 6014 5476 5762 5641] 5676
warmup (31st iteration):
DELTA 6398 #queue [0 85096 18803 12555 10031 9503 8621 8315 8034 7741 7560 7352 8410 8073 8237 7887 8056 8881 8038 7562 8589 7861 7924 7988 8979 7893 7977 7465 6602 6779 6669 6398] 10621
Nashorn's first iteration is slower, it is faster on iterations #2 and #3, and from then on Graal.js is faster.
We'll have a version of this benchmark on our server now and will continue to monitor performance there.
-- Christian
Wow! If you assume that Node 8.11.1's performance is equivalent to the V8 number I had obtained and extrapolate, you'd get this updated chart, with Graal.JS matching SpiderMonkey's perf.
For reference, here are the original numbers behind the chart that I had initially attached to this issue:
Engine | ms |
---|---|
JavaScriptCore | 723 |
V8 | 787 |
SpiderMonkey | 1148 |
ChakraCore | 1704 |
Nashorn | 2961 |
Graal.JS | 4464 |
And based on @wirthi's report above, extrapolating the new Graal.JS perf on tip puts it at 1147 ms. (You get a similar estimate of 1160 ms if you come at it using the Nashorn numbers as a baseline.)
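(To spell out the first extrapolation: scale the original V8 time by the ratio of the new Graal.JS and Node averages reported above, i.e., 787 × (1475 / 1012) ≈ 1147 ms.)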
Here is a chart comparing the Nashorn and Graal.JS master warmup curves. The x-axis is the iteration and the y-axis is the number of milliseconds needed to run the ClojureScript unit tests.
This is a good first step. I don't think we should be satisfied before leading this table, though ;-).
@thomaswue For these tests, Graal.JS is currently within the golden ratio of JavaScriptCore perf, so it seems within reach. 👍
@mfikes I've done some further benchmarks and figured out that we have quite a performance difference between the JVM and our own SubstrateVM. When you run with the `js` command, the code is executed by a native image that is ahead-of-time compiled via SubstrateVM. This is great for embedding but has different performance characteristics: startup is much faster, warmup might be better, and peak performance might be lower or higher depending on the workload.
These different characteristics affect your benchmark. I measured with `js --jvm` now; that configuration uses Graal as the Java compiler as well, but otherwise runs on a normal JVM, without SubstrateVM. There, our results are already better on RC4, which you tested on:
Configuration | ms |
---|---|
RC4 | 4067 |
RC4 --jvm | 2200 |
RC5 | 3672 |
RC5 --jvm | 1824 |
RC6dev --jvm | 1486 |
You can see the improvement over the release candidates in JVM mode (2200 => 1824 => 1486), but also that JVM mode is around 2x faster than native mode. Our initial investigation suggests that the garbage collector is the reason the native image is slower: SubstrateVM has a less sophisticated one than HotSpot, and the benchmark seems to be heavy on allocation and thus on GC.
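To illustrate the kind of pattern we mean, here is a hypothetical JavaScript sketch; it is not code from the benchmark, but ClojureScript's persistent data structures allocate on update rather than mutating in place, in roughly this fashion:

```js
// Hypothetical illustration only (not from the benchmark): copy-on-write
// updates allocate a fresh object per operation, so a hot test loop
// produces many short-lived objects and hence GC pressure.
function assoc(map, key, value) {
  var copy = Object.assign({}, map); // a new allocation on every "update"
  copy[key] = value;
  return copy;
}

var m = {};
for (var i = 0; i < 1000000; i++) {
  m = assoc(m, "k" + (i % 32), i); // ~1e6 short-lived allocations
}
```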
As Thomas mentioned above, we will keep improving on this benchmark; this just adds another dimension we have to consider.
@mfikes, looks like 5 of the 413 tests fail after the first iteration. Should that be of concern?
@woess Yeah, as a separate concern, the tests should pass. In ClojureScript's CI we currently grab graalvm-ce-1.0.0-rc1 and run the tests; they all pass.
@woess Sorry, adding to the above, I missed that you said after the first iteration. I suspect that this is OK: The tests probably change some state and then fail when that state is dirty.
Here is an updated chart showing perf with RC8:
Underlying data:
Engine | ms |
---|---|
CE | 1763 |
CE --jvm | 982 |
EE | 1719 |
EE --jvm | 828 |
JavaScriptCore | 723 |
V8 | 787 |
SpiderMonkey | 1148 |
ChakraCore | 1704 |
CE = Community edition, EE = Enterprise edition, in case someone else is wondering.
Updated Chart for RC12:
Underlying data:
Engine | ms |
---|---|
CE | 1894 |
CE --jvm | 955 |
EE | 1402 |
EE --jvm | 917 |
JavaScriptCore | 723 |
V8 | 787 |
SpiderMonkey | 1148 |
ChakraCore | 1704 |
One big difference is that it appears SubstrateVM's garbage collection may have improved substantially under EE.
Updated Chart for the 19.0.0 production release:
Underlying data:
Engine | ms |
---|---|
CE | 2979 |
CE --jvm | 1919 |
EE | 2672 |
EE --jvm | 1655 |
JavaScriptCore | 723 |
V8 | 787 |
SpiderMonkey | 1148 |
ChakraCore | 1704 |
It appears there was an approximately 0.55 speedup relative to RC12.
@mfikes isn't it a 0.55 slowdown relative to RC12?
@dm3 A speedup of less than 1.0 is essentially a slowdown. See, for example, https://en.wikipedia.org/wiki/Speedup#Using_execution_times
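Concretely, taking the EE --jvm column as an example: speedup = 917 ms / 1655 ms ≈ 0.55; equivalently, the new release takes 1655 / 917 ≈ 1.8 times as long.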
Perhaps you could call it a 1.8 slowdown.
Sorry, this is definitely the right description technically. I expect most people to be tripped up by the combination of a positive term (speedup) and a value less than 1 (0.55), though.
Hi @mfikes
thanks for the update, we'll investigate.
Best, Christian