graaljs icon indicating copy to clipboard operation
graaljs copied to clipboard

Perf of warmed-up ClojureScript unit tests slow on Graal.JS relative to Nashorn

Open mfikes opened this issue 6 years ago • 18 comments

If you run the ClojureScript compiler's unit tests in a loop and allow things to warm up, they end up running more slowly in Graal.JS than under Nashorn.

dikeicwvaaiua-9

The numbers above were produced on a Mac, using GraalVM 1.0.0-rc4 (ee edition). Nashorn is 1.8.0_172-b11.

To reproduce using the actual ClojureScript compiler unit tests, the steps involve:

  1. Modify the ClojureScript compiler's tests so that they run in an infinite loop (optionally also capturing timing metrics)
  2. Build the tests (which involves producing a JavaScript file that can be executed in any engine)
  3. Run the tests, allowing some time (say 1/2 hour or so) for them to settle

To make things more convenient, I've attached a copy of the built tests, and attached them directly in the "Run the tests" section.

Modify the ClojureScript compiler's tests

The ClojureScript compiler is at https://github.com/clojure/clojurescript

Modifications are at https://github.com/mfikes/clojurescript/commit/1c76cf62b1c69e0a81f3663559215c62fe09ae31 and to make things easier, that branch could just be checked out and used as-is. Roughly, these modifications involve setting up some atoms (mutable state) to track timing information, and putting the tests themselves in an infinite loop ((while true ...) in ClojureScript), doing some accounting for timing, and printing out the timing with the string DELTA which can be grepped for.

Build the tests

To build the tests, run script/bootstrap and then script/test in the tree. This will cause it to build them and then after logging

Applying optimizations :advanced to 343 sources

it will automatically attempt to run them under various JavaScript engines if any are configured per https://clojurescript.org/community/running-tests. (No JavaScript engines need to be configured for this repro). If it starts running the tests in any engine, Ctrl-C them (because they are in an infinite loop).

This will result in a JavaScript file, builds/out-adv/core-advanced-test.js that can be run in any engine.

Run the tests

For convenience, here is a built copy of the test JavaScript file: core-advanced-test.js.gz

Then to run the tests, say, using Graal.JS, do

/path/to/js builds/out-adv/core-advanced-test.js | grep DELTA

You will see output like this

DELTA 3204 #queue [0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 3204] 100
DELTA 2083 #queue [0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 3204 2083] 165
DELTA 2016 #queue [0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 3204 2083 2016] 228
DELTA 1932 #queue [0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 3204 2083 2016 1932] 288

The first number is the number of milliseconds to run that test loop, the queue is the last 32 timings, and the very last number is the average of the 32 timings. (So, definitely let it fill the queue with timings, and go well beyond that to let things settle completely.)

(If you run them in Node, the tests want print to be defined, so to do that, start node then var print = console.log; and then require("./path/to/core-advanced-test.js");

mfikes avatar Jul 20 '18 18:07 mfikes

Hi Mike,

thanks; I've also modified the test and can confirm we are ~5-6x slower than Node/v8 even with sufficient warmup. We will look into improving our score on this.

Best, Christian

wirthi avatar Jul 23 '18 11:07 wirthi

Thanks @wirthi. Likewise, if you happen to find any particular code constructs that ClojureScript (and really, Google Closure Compiler) are emitting that are problematic from a perf perspective, that would be interesting to know as well.

mfikes avatar Jul 23 '18 18:07 mfikes

Hi Mike,

short status update:

We've identified and fixed several minor, yet relevant performance issues on that benchmark. They are in tip of this repo (a few more are planned in will be merged in the next week). Some changes landed in RC5, that is about to be published, but most didn't make the feature freeze and will only show up in RC6 a month later.

The current state (tip of the repo as of today):

Node 8.11.1:

DELTA 933 #queue [0 2152 1189 1076 1000 1081 935 1018 939 971 1043 1015 965 985 975 1017 1036 995 1045 931 1040 961 987 1018 960 1057 965 1046 967 1029 1056 933] 1012

(not getting any faster after that)

Graal.js EE:

DELTA 1413 #queue [1900 1626 1585 1538 1704 1631 1786 1754 1595 1543 1620 1535 1492 1380 1359 1321 1382 1324 1494 1275 1281 1405 1391 1321 1295 1334 1373 1419 1375 1398 1364 1413] 1475

(that's the 100th iteration, the 100th time "DELTA" is print).

Warmup is also improved, that's after the 31st iteration:

DELTA 1736 #queue [0 54930 26346 14820 5414 4052 3506 3327 3143 2879 3027 2415 2327 2433 2355 2279 2380 2529 2308 2173 2031 2011 2031 2142 2089 2015 1915 1881 1727 1676 1632 1736] 5172

A measurement in the area of the peak performance is reached around 40 iterations. Values are a bit flaky, on my Laptop, should be using a dedicated benchmark machine.

nashorn 1.8.0_172

100th iteration:

DELTA 5641 #queue [5409 5472 5371 5585 5777 5573 5622 5647 5816 5717 5531 5690 5738 5601 5590 5660 5682 5703 5654 5589 5626 5736 5695 5684 6124 5976 5646 5850 6014 5476 5762 5641] 5676

warmup (31st iteration):

DELTA 6398 #queue [0 85096 18803 12555 10031 9503 8621 8315 8034 7741 7560 7352 8410 8073 8237 7887 8056 8881 8038 7562 8589 7861 7924 7988 8979 7893 7977 7465 6602 6779 6669 6398] 10621

slower first iteration, faster on iterations #2 and #3, from them on Graal.js is faster.

We'll have a version of this benchmark now on our server, will continue to monitor performance there.

-- Christian

wirthi avatar Jul 31 '18 13:07 wirthi

Wow! If you assume that Node 8.11.1's performance is equivalent to the V8 number I had obtained, and extrapolating, you'd get this updated chart with Graal.JS matching SpiderMonkey's perf.

updated

For reference, here are the original numbers behind the chart that I had initially attached to this issue:

Engine ms
JavaScriptCore 723
V8 787
SpiderMonkey 1148
ChakraCore 1704
Nashorn 2961
Graal.JS 4464

And based on @wirthi 's report above, extrapolating the new Graal.JS perf on tip puts it at 1147 ms. (You get a similar estimate of 1160 ms if you come at it using the Nashorn numbers as a baseline.)

mfikes avatar Jul 31 '18 19:07 mfikes

Here is a chart comparing Nashorn and Graal.JS master warmup curves. x-axis is iteration and y-axis is the number of milliseconds needed to run the ClojureScript unit tests.

warmup

mfikes avatar Jul 31 '18 19:07 mfikes

This is a good first step. Don't think we should be satisfied before leading this table though ;-).

thomaswue avatar Jul 31 '18 19:07 thomaswue

@thomaswue For these tests, Graal.JS is currently within the golden ratio of JavaScriptCore perf, so it seems within reach. 👍

mfikes avatar Jul 31 '18 19:07 mfikes

@mfikes I've done some further benchmarks and figured we have quite a performance difference between the JVM and our own SubstrateVM. When you run with the js command, the code is executed by a native image that is ahead-of-time compiled via SubstrateVM. This is great for embedding but has different performance characteristics: startup is much faster, warmup might be better, peak performance might be lower or higher depending on the workload.

This different characteristics affect your benchmark. I measured with js --jvm now - that configuration uses Graal as Java compiler as well, but otherwise runs on a normal JVM, without Substrate VM. Then our results are already better on RC4 that you tested on:

RC4          4067
RC4 --jvm    2200

RC5          3672
RC5 --jvm    1824

RC6dev --jvm 1486

You can see the improvement over the release candidates (jvm mode) 2200 => 1824 => 1486, but also that the jvm mode is around 2x faster than the native mode. Our initial investigations suggest that the Garbage Collector is the reason for the native image being slower; SubstrateVM has a less sophisticated one (compared to HotSpot) and the benchmark seems to be heavy on allocating and thus GC.

As Thomas mentioned above, we will keep improving on this benchmark, this just adds another dimension we have to consider.

wirthi avatar Aug 01 '18 09:08 wirthi

@mfikes, looks like 5 of the 413 tests fail after the first iteration. Should that be of concern?

woess avatar Aug 01 '18 11:08 woess

@woess Yeah, as a separate concern, the tests should pass. In ClojureScript's CI we currently grab graalvm-ce-1.0.0-rc1 and run the tests. They currently all pass.

@woess Sorry, adding to the above, I missed that you said after the first iteration. I suspect that this is OK: The tests probably change some state and then fail when that state is dirty.

mfikes avatar Aug 01 '18 14:08 mfikes

Here is an updated chart showing perf with RC8:

perf

Underlying data:

Engine ms
CE 1763
CE --jvm 982
EE 1719
EE --jvm 828
JavaScriptCore 723
V8 787
SpiderMonkey 1148
ChakraCore 1704

mfikes avatar Oct 23 '18 14:10 mfikes

CE = Community edition, EE = Enterprise edition, in case someone else is wondering.

Biserkov avatar Oct 25 '18 13:10 Biserkov

Updated Chart for RC12:

image

Underlying data:

Engine ms
CE 1894
CE --jvm 955
EE 1402
EE --jvm 917
JavaScriptCore 723
V8 787
SpiderMonkey 1148
ChakraCore 1704

One big difference is that it appears substrate VM's garbage collection may have improved substantially under EE.

mfikes avatar Feb 11 '19 01:02 mfikes

Updated Chart for the 19.0.0 production release:

image

Underlying data:

Engine ms
CE 2979
CE --jvm 1919
EE 2672
EE --jvm 1655
JavaScriptCore 723
V8 787
SpiderMonkey 1148
ChakraCore 1704

It appears there was an approximately 0.55 speedup relative to RC 12.

mfikes avatar May 10 '19 01:05 mfikes

@mfikes isn't it a 0.55 slowdown relative to RC12?

dm3 avatar May 10 '19 12:05 dm3

@dm3 A speedup of less than 1.0 is essentially a slowdown. See, for example https://en.wikipedia.org/wiki/Speedup#Using_execution_times

Perhaps you could call it a 1.8 slowdown.

mfikes avatar May 10 '19 13:05 mfikes

Sorry, this is definitely the right description technically. I expect more people to be tripped up by the positive (speedup) + negative (0.55 < 1) though.

dm3 avatar May 10 '19 14:05 dm3

Hi @mfikes

thanks for the update, we'll investigate.

Best, Christian

wirthi avatar May 10 '19 15:05 wirthi