Slow TypeScript compilation with type checking compared to Node
Using the TypeScript compiler to compile TypeScript code in Graal is noticeably slow - it takes seconds to compile and type check code. I've forked graal-js-jdk11-maven-demo to run the demo using an example that loads all of the needed code (pulled out of the TypeScript NPM package) and programmatically invokes the TypeScript compiler. Even after warmup it's still taking around 2 seconds on my machine to type check and compile an extremely simple function. This commit has the sample and also includes another snippet that does not do any type checking. When this code is executed (just replace COMPILE_TS with COMPILE_TS_NO_TYPE_CHECK) it executes two orders of magnitude faster.
While type checking is expected to incur an additional cost, this runs significantly faster on Node.
One thing to mention specifically about the implementation that might be related - to be able to do the type checking, the compiler needs to load some additional files containing the type declarations for the ECMAScript standard library. Rather than give filesystem access, we load these files into memory as strings (pulled out of libPack.js in the JAR), which might be related to the poor performance.
Hi @choumits,
thanks for posting this and providing an executable example (that simplifies our job a lot!)
I can run your code and agree that its first iterations are pretty slow. You mention this is "after warmup" - well, it is after an initial 15 warmup iterations, which is fine for the normal demo application we use (primes), but too few for your TypeScript example. When letting it run with 0 warmup iterations and 200 measured iterations, this is the resulting performance (measured on my laptop, YMMV):
iteration: 18271 //1
iteration: 9872 //2
iteration: 7169 //3
...
iteration: 2528 //25
...
iteration: 2171 //50
...
iteration: 1468 //100
...
iteration: 1281 //125
...
iteration: 862 //150
...
iteration: 863 //175
...
iteration: 768 //200
You can see that this is a warmup problem - after 200 iterations, performance is much better than at the start; even compared to the first 25 iterations, we still gain a factor of about 3.3 from there. Our compiler takes longer than expected to compile the relevant code, and we are working on improving warmup performance in future versions. In addition, even with perfect warmup, we are not yet on par with V8 on TypeScript peak performance; improving peak performance is another area we are targeting.
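To make such a warmup curve visible in your own embedding, a minimal measurement loop along these lines can help. This is a sketch: `compileOnce` here is a hypothetical placeholder standing in for the actual TypeScript compile call in the demo, and the iteration/print counts are arbitrary.

```java
// Minimal warmup-measurement sketch. compileOnce is a stand-in for
// invoking the TypeScript compiler through the polyglot Context.
public class WarmupHarness {

    // Time a single run of the workload, in milliseconds.
    static long measureIteration(Runnable workload) {
        long start = System.nanoTime();
        workload.run();
        return (System.nanoTime() - start) / 1_000_000;
    }

    public static void main(String[] args) {
        // Placeholder workload; replace with the real compile call.
        Runnable compileOnce = () -> {
            double x = 0;
            for (int i = 0; i < 1_000_000; i++) x += Math.sqrt(i);
            if (x < 0) throw new IllegalStateException("unreachable");
        };

        // Zero warmup iterations and many measured iterations, so the
        // warmup curve itself shows up in the output.
        for (int i = 1; i <= 200; i++) {
            long ms = measureIteration(compileOnce);
            if (i == 1 || i % 25 == 0) {
                System.out.println("iteration: " + ms + " //" + i);
            }
        }
    }
}
```

Comparing the first printed iteration against the last then separates warmup cost from (near-)peak cost, which is the distinction being made above.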
Best, Christian
Hi,
some additional observations to those above:
Nashorn (from JDK 11.0.8) does not get faster than around 2300 ms per iteration on my machine - basically the score GraalVM JavaScript reaches after 15 iterations, or ~3x slower than our peak.
Your measurements (and mine, above) are on GraalVM Community Edition, 20.3.0 (i.e., the setup you get from our current "run on stock JDK" example). As this uses the GraalVM compiler from a JAR, the compiler itself is normal Java code that needs to warm up as well. Our GraalVM builds use a native-image version of the compiler, which warms up much faster. For the optimal scenario, there are three things you can optimize (potentially not all are possible for your deployment scenario):
- use Enterprise Edition instead of Community Edition => better peak, potentially better warmup
- use GraalVM instead of Stock JDK+JARs from Maven => faster warmup of the compiler => better application warmup
- use latest version (21.0.0.2 instead of 20.3.0) => bug fixes, better performance
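As a sketch of what switching the runtime looks like in practice (the paths and version numbers below are illustrative assumptions, not exact commands from the demo):

```shell
# 1. Use a full GraalVM (EE if licensed) instead of stock JDK + compiler JARs,
#    and a recent release (21.0.0.2 at the time of writing).
export JAVA_HOME=/path/to/graalvm-ee-java11-21.0.0.2   # illustrative path
export PATH="$JAVA_HOME/bin:$PATH"
java -version    # should identify itself as GraalVM

# 2. Re-run the demo on that JVM; the Graal compiler is now a pre-built
#    native image and no longer has to warm itself up as Java bytecode.
mvn clean package exec:exec
```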
Running with that setup, my scores are as follows:
iteration: 10082 //iteration 1
iteration: 6028
iteration: 4454
iteration: 3490
iteration: 3191
iteration: 3056
iteration: 2840
...
iteration: 666
iteration: 669
iteration: 668
iteration: 669
iteration: 660 //iteration 200
...
iteration: 510
iteration: 521
iteration: 500
iteration: 525
iteration: 530 //iteration 300
So the initial warmup of the first few iterations is measurably faster; it still takes ~150 iterations to get below one second per iteration, but ultimately the peak is another 15% faster (after 200 iterations). Note that it keeps going down even further; to be fair, the earlier setup should also be measured over 300 iterations, but overall it settles at slightly above 500 ms per iteration on average.
Best, Christian
Thanks for the information, not all of those are going to be possible for me but I'll give what I can a try. Are there any JVM or Graal Context/Engine configuration flags to make compilation more aggressive so warming up the code paths doesn't take nearly as many iterations? It sounds like that is something that's being targeted to improve in the future, but I wonder if there's any specific configuration that I would have access to for manual optimization.
... flags to make compilation more aggressive so warming up the code paths doesn't take nearly as many iterations?
I believe this is not the root cause of the problem here. If you inspect the running threads, you will note that there are actually a number of Graal and Truffle threads compiling with full CPU utilization.
You can have a look at https://www.graalvm.org/graalvm-as-a-platform/language-implementation-framework/Options/#expert-engine-options and might be successful in tuning some flags - most prominently the number of compiler threads, but you might also try tweaking inlining. Configuring the GC and optimizing the amount of heap you provide are also options you could consider.
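For example (illustrative flag values; these are expert options whose names and defaults can change between versions, and the file names below are hypothetical, so treat this as a sketch rather than a recommendation):

```shell
# Standalone js launcher: raise the number of Truffle compiler threads.
js --engine.CompilerThreads=4 compile-ts.js

# Embedded in a JVM application, engine options can be passed as system
# properties with the "polyglot." prefix, alongside ordinary heap tuning:
java -Dpolyglot.engine.CompilerThreads=4 -Xmx4g -jar app.jar
```

The same options can also be set programmatically on the `Context`/`Engine` builder when embedding.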
Overall, though, I don't believe a lot can be gained by this. The most can be gained by improving the compilation queue itself: our teams are working on optimizing the inlining strategy and the order of compilations, and on providing a multi-tier mode (i.e., a fast but not-so-optimized first tier, with relevant code compiled again later with more effort, resulting in better-optimized code), etc.
Best, Christian