Node.js vs Graal.js Performance
Dude,
I came across GraalVM and had a glance at the JVM options part, and thought it was promising. But I found that the performance is much lower than the latest Node.js; here is the result: https://github.com/weixingsun/perf_tuning_results/blob/master/Node.js%20vs.%20GraalVM
Any idea about the difference?
Hi @weixingsun
thanks for your question. I am trying to understand what your benchmark script (test_graal.sh) is doing. It obviously does something, and terminates after ~130 seconds on my machine, but CPU utilization is <1% most of the time, so that does not look like a reasonable benchmark to me.
I can execute the application itself (node application.js) and benchmark it with a tool like wrk, which really stresses the fib calculation. With that I get the following numbers:
- GraalVM: 0.93 requests/sec
- Node.js (10.9.0): 0.80 requests/sec
On that benchmark, GraalVM even outperforms Node. But note that your fib calculation blocks the event loop, so you can only do one calculation at a time and serve only one request at a time (exactly what you usually want to avoid when using Node.js): all requests are serialized and calculated one after the other. So you are hardly measuring any Node.js/express code; this benchmark almost exclusively measures core JavaScript via the Fibonacci calculation (for a 30-second benchmark, only 28 iterations go through Node.js/express; the time is spent in the fibonacci function), which is fine if you want to measure pure JavaScript core performance.
I am using wrk -t5 -c10 -d30s http://localhost:8080/fib to measure (that's my typical Node.js benchmark setting; using 5 threads and 10 connections is actually overkill on this serialized benchmark, as stated above).
Can you please help me understand what you are trying to measure with the test_graal.sh script? Maybe I am missing something.
Best, Christian
@wirthi thanks for your reply. I just want to saturate a certain core in my server. The main workload is two simple GET methods, fib/fast, run as iterations in parallel. This way, I can easily see how long 100 consecutive iterations take.
I can see they occupied 100% user cycles, which means I created a bottleneck on cpu3:

```
[root@dr1 cpu_bond]# mpstat -P 3 3 3
Linux 3.10.0-862.11.6.el7.x86_64 (dr1)  11/15/2018  x86_64  (112 CPU)

07:27:43 PM  CPU   %usr  %nice   %sys %iowait   %irq  %soft %steal %guest %gnice  %idle
07:27:46 PM    3 100.00   0.00   0.00    0.00   0.00   0.00   0.00   0.00   0.00   0.00
07:27:49 PM    3 100.00   0.00   0.00    0.00   0.00   0.00   0.00   0.00   0.00   0.00
07:27:52 PM    3 100.00   0.00   0.00    0.00   0.00   0.00   0.00   0.00   0.00   0.00
Average:       3 100.00   0.00   0.00    0.00   0.00   0.00   0.00   0.00   0.00   0.00
```
Oops, test_graal.sh is creating a bottleneck on cpu2; the log above is for test_v8.sh.
We have two execution modes, "native" (the default) and "JVM" (see https://www.graalvm.org/docs/reference-manual/languages/js/ for more information). Setting JVM options switches to JVM mode. Currently, fibonacci is significantly faster in native mode; try running without the JVM options.
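For reference, the two modes are selected on the command line; a sketch, assuming GRAALVM_HOME points at the GraalVM installation and app.js is your script:

```shell
# Native mode (the default): the runtime is ahead-of-time compiled,
# which gives faster startup.
$GRAALVM_HOME/bin/node app.js

# JVM mode: run on top of the JVM instead; slower startup, but often
# better peak performance for long-running workloads.
$GRAALVM_HOME/bin/node --jvm app.js
```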
@woess Thanks for explaining the modes, but I got 186.197s after removing all the JVM options. What VM is underneath, Nashorn or GraalVM?
perf record gave me the following stack traces:

```
Samples: 1K of event 'cycles:ppp', Event count (approx.): 53465728779
Overhead  Command          Shared Object   Symbol
  2.00%   node             perf-29614.map  [.] 0x00007fd3e71580cb
  1.55%   node             libpolyglot.so  [.] com.oracle.truffle.js.nodes.function.FunctionBodyNode.execute(com.oracle.truffle.api.frame.VirtualFrame)java.lang.Object
  1.42%   node             libpolyglot.so  [.] com.oracle.svm.core.genscavenge.GCImpl.blackenBootImageRoots()void
  1.21%   node             perf-29614.map  [.] 0x00007fd3e7158242
  1.20%   node             perf-29614.map  [.] 0x00007fd3e71588ec
  1.14%   node             perf-29614.map  [.] 0x00007fd3e7158000
  0.99%   node             perf-29614.map  [.] 0x00007fd3e71583f0
  0.93%   node             perf-29614.map  [.] 0x00007fd3e715859d
  0.92%   node             perf-29614.map  [.] 0x00007fd3e7158007
  0.82%   pilerThread-156  libpolyglot.so  [.] org.graalvm.collections.EconomicMapImpl.grow()void
  ......
```
I was curious as well, so I figured I could provide a real-life benchmark: running a webpack build. This was entirely unscientific, and the tests were only run once.
The results were surprising. Here are the relevant files: https://gist.github.com/EdwardDrapkin/d1b380787821462c5677323614f20146
The results wound up:
Node 11:
real 0m3.361s
user 0m4.747s
sys 0m0.396s
Graal native:
real 1m18.097s
user 2m51.988s
sys 0m13.533s
Graal JVM:
real 1m5.169s
user 5m21.155s
sys 0m4.549s
Graal JVM with --jvm.XX:+UseG1GC:
real 1m13.938s
user 6m37.463s
sys 0m4.333s
Hi @EdwardDrapkin
thanks for sharing your benchmark. I am no expert on webpack; I guess modules/pp3/ is the actual thing you pack? You didn't provide that in your gist.
Note that, unlike the peak-performance benchmark weixingsun posted above, yours is heavy on startup: it is a tool executed once. If original Node finishes it in 3 seconds, Graal-Node.js will have a hard time keeping up. Graal-Node.js requires more time to JIT-compile the source code it gets. This makes it slower on workloads like npm, webpack, or similar; anything that runs only for a short time, and only once. However, a factor of >20 as you experience it is more than we usually see.
If I could reproduce your run fully, I'd love to look into it and see if there is anything we can optimize for.
Best, Christian
I can't provide the actual source code we use at work, but it's a fairly straightforward React project. You'd get similar results if you copied any react project in there. I will note that I switched the TS language service in IntelliJ to use GraalVM instead of NodeJS, and while it's exceptionally painful for a good long while, after about an hour it feels faster but AFAIK there's no way to benchmark proprietary IntelliJ plugins.
Create a simple Nuxt.js project, selecting yarn as the default package manager, and run yarn run dev; you simply don't need any benchmark results. Graal is slower by 3 minutes or more for a simple build. For complex projects with over 350 modules, the difference goes up to 10-15 minutes just for the build. This is not in an acceptable range for use. It also produces errors and fails to start.
Is startup performance not going to be considered? Having to run both graaljs and nodejs in parallel is going to be confusing. I thought the point of Graal was to have one tool that does it all and provides the interop?
Hi @hc-codersatlas
we are currently working on substantial startup improvements by AOT-compiling larger parts of the Node.js codebase. This is a significant engineering effort though, so it takes a while.
Best, Christian
Hey, I'm also interested in this.
I've just measured Node.js vs GraalVM's node performance, and the latter is 10-20x slower.
Any possible reason, or optimizations turned off? I think I will be able to provide the sources for benchmarking, or do a proper benchmark (for now I just replaced calls to node with GraalVM's node, without any additional arguments).
graalvm-ee-19.1.1, node.js v8.9.0
Is this for startup or peak performance? Can you share the workload as suggested?
Hi, Thomas. Thanks for the reply.
It's the rough execution time in millis of exactly the same code on node and GraalVM's node (startup time excluded from the measurement). It includes processing of stdin, parsing (mostly string + regexp operations), object instantiation, and calling object methods with some business logic.
I think I will be able to provide the code; I will double-check.
Just run run.sh with node on PATH:
./run.sh
or with GraalVM's node on PATH, e.g.:
PATH=/Users/asmirnov/Documents/dev/src/graalvm-ee-19.1.1/Contents/Home/bin:$PATH ./run.sh
It will clone the required JS code, prepare the data, and run the benchmark; see the actual execution time. Let me know if you need any assistance or find the root cause.
Hi @4ntoine
thanks for your code. I confirm we can execute it and measure performance.
Your benchmark does not account for warm-up. To mitigate that, you can put a loop around the core of your benchmark (lines 153ff in benchmark.js) and measure each iteration independently; however, that might not give exact results due to caching in the code. We are working on some micro-benchmarks to better measure the performance. But it seems we are within 2.5x of original Node if you account for the warmup.
Also, note that running in JVM mode (node --jvm benchmark.js) gives better peak performance than native mode.
We'll get back to you once we know more. Also, improving our warmup performance is high up on our list, so that should get better over the next releases.
Best, Christian
Hey.
Thanks for the update.
> caching in the code
Yup, there is some caching, and I can modify it to avoid the side effects of caching for better benchmarking.
> But it seems we are within 2.5X of original Node if you account for the warmup.
Does this mean you are targeting 2.5x worse performance compared to Node?
Our target is to be at least comparable in speed, or better, for any workload. This is a long-term target, however, and we aren't there yet for Node.js applications.
Running graal/bin/node yarn start in a React project is significantly slower than with stock NodeJS.
It takes around 15 minutes to start compiling, and I didn't wait after that.
Stock node does that in around 2 minutes.
I think it should be documented that node-graalvm is not as optimized at present as it could be, and that startup time is higher and performance is slower for non-long-running processes, so that no one is shocked.
Agreed that we should put the information about startup into the documentation. On peak performance it is not so clear, as there are also workloads where we are faster.
I note that there are cases where GraalVM Node.js performs slower than Node.js after many iterations of the same task, which does not appear to be caused by startup time. I created a repo that illustrates that GraalVM performs slower than Node.js at:
- Regex (1000 iterations of regex-redux task)
- JSONPath queries (1000 iterations using 2 different libraries)
- HTTP GET requests (10,000 iterations using lightweight library)
I hope you will find it useful: https://github.com/Ivan-Kouznetsov/graalvm-perf
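One way to check whether slowness in benchmarks like these is warmup rather than steady-state performance is to time batches of iterations separately, so JIT compilation shows up as declining batch times. This is a hypothetical helper, not part of the linked repo:

```javascript
// Time each batch of iterations separately; on a JIT runtime, early
// batches are typically slower than later, fully compiled ones.
function timeBatches(fn, { batches = 5, iterations = 1000 } = {}) {
  const results = [];
  for (let b = 0; b < batches; b++) {
    const start = process.hrtime.bigint();
    for (let i = 0; i < iterations; i++) fn();
    const elapsedNs = process.hrtime.bigint() - start;
    results.push(Number(elapsedNs) / 1e6); // milliseconds per batch
  }
  return results;
}

// Example: successive batches of a regex-heavy micro-workload.
const times = timeBatches(() => /ab+c/.test('xxabbbcxx'), { batches: 3 });
console.log(times.map(t => t.toFixed(2) + ' ms').join(', '));
```

If the last batch is much faster than the first, the benchmark is dominated by warmup rather than by peak throughput.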
@Ivan-Kouznetsov you need to factor that in correctly: NodeJS will be faster in those cases most of the time, but when you replace the regex with Java's regex, the JSON parser and query element with the Java ones, and the HTTP GET method with Java's, you outperform NodeJS by far.
@thomaswue
> Our target is to be at least comparable in speed, or better, for any workload. This is a long-term target, however, and we aren't there yet for Node.js applications.
Are we there at the moment? Any benchmarks/comparisons available? Thanks
@4ntoine the state is still the same: everything that uses Node.js modules from node-graaljs is slower; if you use only Java or JavaScript, it is faster.
Hi @Ivan-Kouznetsov
thanks for your benchmarks, they provide relevant insight! And they illustrate the fundamental misconception, which is:
> after many iterations of the same task
1000 iterations of something is not "many" in the JIT world. As per your documentation, (original) Node.js needs 0.120s for the full jsonpath-classic-benchmark.js. GraalVM is in the Java world, and there it takes a few hundred milliseconds to even start the JVM, let alone execute the benchmark. Thanks to native-image, we can be faster on GraalVM, but the same basic principle still applies: we need to JIT-compile the code, and that won't fully happen within 120 milliseconds.
I've hacked some proper warmup into your benchmark, like this (e.g. for jsonpath-classic-benchmark.js):
```javascript
const jsonPath = require('./lib/jsonPath');
const n = process.argv[2] || 10000;

function test() {
  const sampleObj = {name: "john", job: {title: "developer", payscale: 3}};
  var len = 0;
  for (let i = 0; i < n; i++) {
    len += jsonPath(sampleObj, "$..name").toString().length;
    len += jsonPath(sampleObj, "$..payscale").toString().length;
    len += jsonPath(sampleObj, "$..age").toString().length;
  }
  return len;
}

var i = 0;
while (true) {
  var start = Date.now();
  console.log(test());
  console.log(++i + " = " + (Date.now() - start) + " ms");
}
```
Basically, I am executing your full benchmark repeatedly, and print out how long each iteration takes:
GraalVM EE 20.3.0:

```
$ node jsonpath-classic-benchmark.js
100000
1 = 2485 ms
100000
2 = 2267 ms
100000
3 = 427 ms
100000
4 = 209 ms
100000
5 = 185 ms
100000
6 = 148 ms
100000
7 = 177 ms
100000
8 = 143 ms
100000
9 = 146 ms
```
compared to Node.js 12.18.0:

```
$ ~/software/node-v12.18.0-linux-x64/bin/node jsonpath-classic-benchmark.js
100000
1 = 253 ms
100000
2 = 221 ms
100000
3 = 217 ms
100000
4 = 197 ms
100000
5 = 199 ms
100000
6 = 208 ms
100000
7 = 207 ms
100000
8 = 197 ms
100000
9 = 213 ms
```
Admittedly, GraalVM's first 2 iterations are horrible. Iterations 3 and 4 are in the ballpark of V8. Starting with iteration 5, GraalVM is actually significantly (around 25%) faster than V8.
There's one more trick up our sleeve. In --jvm mode, the first iterations are even slower, and it takes longer to reach a good score. But after ~20 iterations, we are down to around 60 ms per iteration, meaning GraalVM in JVM mode takes 0.3x the time of V8 per iteration.
On jsonpath-new-benchmark.js, GraalVM and V8 are roughly on par.
On regexp-benchmark.js, our engine is around 3-4x behind. I will complain to our RegExp guy to optimize this pattern :-)
Best, Christian
@wirthi https://github.com/oracle/graaljs/issues/360#issuecomment-1129109834 maybe makes this obsolete, as these performance degradations are now less of a problem than before. Even npm no longer freezes; the string update is a huge one, combined with the new default boot mode.
It would be cool if the JVM cached the generated binary for each class, so that warmup happened only once and not on every program restart.