
Rerun the benchmark?

perpil opened this issue 3 months ago · 2 comments

Is it possible to rerun the benchmark with the latest release? Since January 2024, the llrt binaries have grown by 2+ MB. I've been doing some cold start benchmarking to compare LLRT (standard SDK) with Node 22. My test, using arm64 in us-east-2, instantiates an STS client and invokes get-caller-identity (a sketch of the handler appears after the table below). It isn't quite the same as a ddb put, but it's similar enough that I'd expect the http and λ times to be close to yours on cold starts. In the table below, http means the time recorded from the client and λ is the time from the invocation logs, like yours; overhead means http − λ. I'm noticing that the cold start overhead is about 50 ms higher for LLRT than for the Node 22 runtime. I also notice that my p50 for http is ~40 ms higher than your benchmark, while λ is only ~17 ms higher. I suspect the increase in binary size is the main contributor, but if you can rerun the benchmarks, that will help narrow down what is causing the deltas.

All times are in ms.

| Test | Memory (MB) | Samples | p0 (http) | p50 (http) | p99 (http) | p50 (overhead) | p0 (λ) | p50 (λ) | p99 (λ) |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| LLRT v.0.7.0-beta | 128 | 140 | 186 | 275 | 357 | 193 | 61 | 81 | 128 |
| Node 22 AwsLite | 512 | 140 | 344 | 409 | 492 | 144 | 208 | 260 | 326 |
| Node 22 v3 SDK w/optimizations | 512 | 140 | 369 | 447 | 543 | 144 | 248 | 300 | 363 |
| Node 22 v3 SDK minified | 512 | 140 | 460 | 547 | 640 | 151 | 320 | 393 | 464 |
| Node 22 v3 SDK from disk | 512 | 140 | 557 | 681 | 788 | 145 | 444 | 535 | 633 |
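
For reference, here is a minimal sketch of the kind of handler this test uses (my reconstruction, not the actual benchmark code; it assumes the standard AWS SDK v3 STS client and an ESM bundle so the call can run at top level):

```js
// Hypothetical reconstruction of the test described above: the STS call runs
// at top level, i.e. during the init phase, before the first invocation.
import { STSClient, GetCallerIdentityCommand } from "@aws-sdk/client-sts";

const sts = new STSClient({});
const identity = await sts.send(new GetCallerIdentityCommand({}));

export const handler = async () => ({
  statusCode: 200,
  body: identity.Account,
});
```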

perpil · Sep 25 '25, 20:09

Hi @perpil. Sure thing, we should rerun the benchmark. You're also right that the overall increase in cold start time is due to binary size, via disk read performance (and decompression) on Lambda.

I also think you should benchmark with similar memory sizes. In your current test, Node has 4x more memory, which also means 4x more CPU (and CPU time), which makes the comparison very biased.

Also, get-caller-identity for STS is quite slow, which means a lot of the duration (for both Node and LLRT) is spent waiting, and that affects the test. For example, say there is 100 ms of latency for the call, and LLRT finishes in 125 ms while Node takes 200 ms. On paper this makes LLRT 60% faster (200/125), but once you subtract the shared 100 ms of waiting, it's actually 4x faster (100/25).
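
To make the arithmetic concrete, here is the same hypothetical example as a few lines of JavaScript (the numbers are the illustrative ones from the paragraph above, not measurements):

```js
const latency = 100;          // ms both runtimes spend waiting on the network
const llrt = 125, node = 200; // hypothetical end-to-end durations in ms

console.log(node / llrt);                         // 1.6 -> "60% faster" on paper
console.log((node - latency) / (llrt - latency)); // 4   -> 4x faster in actual compute
```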

Without a closer look at the benchmark bundle files on Node, I also suspect that the overhead comes from Node shipping pure JS code vs. LLRT shipping the executable plus JS code.

That being said, a lot of the recent size increase in LLRT comes from the WebCrypto APIs, which are implemented in pure Rust. We're working on building variants of LLRT that use statically linked or runtime-loaded OpenSSL instead. This will significantly shrink the binary size and only introduce a cold start penalty if you actually use the WebCrypto APIs, which most use cases don't. Additionally, that increased cold start time is offset by the higher performance of OpenSSL and libcrypto vs. pure Rust crypto implementations. The downside of this approach is that it requires OpenSSL/libcrypto to exist (which it does in Lambda). That's why different versions of LLRT will use different crypto providers; once this feature is completed, the Lambda version will dynamically load OpenSSL symbols at runtime, on demand.
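
For context, the kind of call that would pay that on-demand load under the proposed design is any use of the standard WebCrypto API, for example:

```js
// Standard WebCrypto usage; under the design described above, only code
// paths like this would trigger loading OpenSSL symbols at runtime.
const data = new TextEncoder().encode("hello");
const digest = await crypto.subtle.digest("SHA-256", data);
console.log(Buffer.from(digest).toString("hex"));
```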

richarddavison · Sep 30 '25, 09:09

Thanks @richarddavison, all good points.

> I also think you should benchmark with similar memory sizes. In your current test, Node has 4x more memory, which also means 4x more CPU (and CPU time), which makes the comparison very biased.

I'm mainly trying to compare v.0.7.0-beta to the current llrt benchmark in the readme, which uses 128 MB. I'm making the call to STS in init (vs. in the handler, like your benchmark does), so CPU should be the 1 full vCPU regardless of memory size. Agreed that I should use the same amount of memory for llrt and Node if I wanted an apples-to-apples comparison.

> Also, get-caller-identity for STS is quite slow, which means a lot of the duration (for both Node and LLRT) is spent waiting, and that affects the test. For example, say there is 100 ms of latency for the call, and LLRT finishes in 125 ms while Node takes 200 ms. On paper this makes LLRT 60% faster (200/125), but once you subtract the shared 100 ms of waiting, it's actually 4x faster (100/25).

I don't think get-caller-identity is a particularly heavy operation, but this is a valid point. As I understand it, it shouldn't be much slower than the auth latency of any authenticated AWS call. The maxday website is down at the moment, so I can't compare p50 times between my benchmark and raw llrt, but what I recall is that the API's E2E latency is on the order of 10-15 ms including SSL.
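
One way to sanity-check that number would be a sketch like the following (assuming the same SDK v3 client as in my handler above; performance.now() is available in both runtimes):

```js
import { STSClient, GetCallerIdentityCommand } from "@aws-sdk/client-sts";

const sts = new STSClient({});
await sts.send(new GetCallerIdentityCommand({})); // warm up TLS + credentials

const start = performance.now();
await sts.send(new GetCallerIdentityCommand({}));
console.log(`warm GetCallerIdentity: ${(performance.now() - start).toFixed(1)} ms`);
```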

> Without a closer look at the benchmark bundle files on Node, I also suspect that the overhead comes from Node shipping pure JS code vs. LLRT shipping the executable plus JS code.

I suspect the same. I think I'm seeing a 4.5 MB package size for llrt vs. 77 KB for bundled Node.

perpil · Sep 30 '25, 16:09