ecosystem-tests Add a test for measuring client instantiation time

https://github.com/prisma/prisma/issues/8178

Oct 26 '21 15:10 millsp

What's our approach to benchmarking on CI (and do we have one)? My concern about having this as a regular test is that it's going to be either extremely flaky or not extremely useful. Even in the referenced issue the reproduction took 4.5 seconds on the reporter's machine and 0.4 (actually like 0.36 when I increased the number of digits in .toFixed(1)) seconds on my machine.

Oct 26 '21 17:10 aqrln

cachegrind + valgrind are the way to benchmark things consistently across devices, no matter the CPU or the RAM on the machine. For instance, if you run cachegrind on Prisma 2.26 (on the repro example in the issue linked above):

$ valgrind --tool=cachegrind node src/index.js
==274519== I   refs:      1,954,879,183

If you re-run that, you get similar results:

$ valgrind --tool=cachegrind node src/index.js
==276009== I   refs:      1,950,136,208

And if you run this on the latest prisma 3:

$ valgrind --tool=cachegrind node src/index.js
==277339== I   refs:      994,343,618

And again on prisma 3, but with my CPU at 0.80GHz:

$ valgrind --tool=cachegrind node src/index.js
==277823== I   refs:      990,733,662

Looking at the total number of CPU instructions gives us a good way to measure real performance consistently, on CI or on any other device. One case where I see this failing is on a machine with very limited RAM (lots of garbage collection). As you can see, there's a bit of jitter because of the Node.js runtime. Considering that, we could check that the instruction count never deviates from a baseline by more than 1-2%.
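
A minimal sketch of such a check, assuming the summary line format shown in the runs above. The helper names `parseIRefs`, `relativeDiff` and `withinJitter`, and the 2% default, are hypothetical and only for illustration; this is not an existing test in the repo.

```javascript
// Hypothetical helpers: parse the "I refs" total from cachegrind's summary
// output and compare two runs against a relative jitter threshold.

// Extracts the instruction count from a line such as
// "==274519== I   refs:      1,954,879,183".
function parseIRefs(output) {
  const match = /I\s+refs:\s+([\d,]+)/.exec(output)
  if (!match) throw new Error('no "I refs" line found in cachegrind output')
  return Number(match[1].replace(/,/g, ''))
}

// Relative difference between a measurement and a baseline, as a fraction.
function relativeDiff(baseline, measured) {
  return Math.abs(measured - baseline) / baseline
}

// True when the measurement stays within maxJitter of the baseline.
// The 0.02 default (2%) is an assumed value taken from the "1-2%" idea above.
function withinJitter(baseline, measured, maxJitter = 0.02) {
  return relativeDiff(baseline, measured) <= maxJitter
}

// The first two runs quoted above differ by well under 1%.
const first = parseIRefs('==274519== I   refs:      1,954,879,183')
const second = parseIRefs('==276009== I   refs:      1,950,136,208')
console.log((relativeDiff(first, second) * 100).toFixed(2) + '%')
console.log(withinJitter(first, second))
```

In a real test the baseline would have to be stored somewhere (where and how is left open here), and the threshold tuned against the jitter actually observed on CI.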

https://valgrind.org/docs/manual/cg-manual.html#cg-manual.overview

Folks at SQLite have been using this for years too https://www.sqlite.org/cpu.html

Oct 26 '21 22:10 millsp

This is amazing, and definitely the way to go 🚀 We might even want to profile other metrics in a similar fashion, like the number of IO operations or even syscalls in general (which would have been much more important than the number of instructions in the context of https://github.com/prisma/prisma/issues/8178).

> benchmark things consistently across devices, not matter the CPU

FWIW, I don't think it will necessarily work consistently across devices. Even if we factor out things like different Node.js versions, different compilers Node.js is built with, different libc versions etc, we can't factor out different CPU architectures: the numbers will be different and won't be comparable.
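
One way to guard against comparing numbers across incompatible environments, as a rough sketch: record the environment next to each measurement and only compare counts whose environments match. `environmentKey` and `comparable` are hypothetical names, and this only captures what Node.js exposes directly (`process.arch`, `process.platform`, `process.version`), not the compiler or libc version.

```javascript
// Hypothetical sketch: tag each measurement with the environment it was
// taken in, and refuse to compare measurements from different environments.

// Builds a key from the fields that are assumed to affect instruction counts.
function environmentKey(env) {
  return [env.arch, env.platform, env.nodeVersion].join('|')
}

// Two measurements are only comparable when their environment keys match.
function comparable(a, b) {
  return environmentKey(a) === environmentKey(b)
}

// The environment of the current process, using standard Node.js globals.
const currentEnv = {
  arch: process.arch, // e.g. 'x64' or 'arm64'
  platform: process.platform, // e.g. 'linux'
  nodeVersion: process.version, // e.g. 'v16.13.0'
}
console.log(environmentKey(currentEnv))
```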

That said, those differences may be interesting in themselves. For example, I get a 15% lower number for Prisma 2.26

==698== I   refs:      1,662,717,422

and a 5% lower number for Prisma 3.3

==1090== I   refs:      939,571,062

and... it's really strange? I mean, I would expect the total instruction count to be much higher for a RISC architecture, with lots of loads and stores as separate instructions. One possible explanation is that JavaScript code is much better optimized by V8 on ARM64 than on x86_64, or so it seems.
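
For reference, the percentages above can be reproduced from the raw counts quoted in this thread. A small sketch, where `percentLower` is a hypothetical helper and the reference values are the first run of each version from the earlier comment:

```javascript
// Hypothetical helper: how much lower (in percent) `value` is than `reference`.
function percentLower(reference, value) {
  return (1 - value / reference) * 100
}

// "I refs" totals quoted in the thread: the reference run from the earlier
// comment, and the run reported in this comment on a different architecture.
const runs = [
  { label: 'Prisma 2', reference: 1954879183, value: 1662717422 },
  { label: 'Prisma 3', reference: 994343618, value: 939571062 },
]

for (const { label, reference, value } of runs) {
  // Prints roughly 15% for the first pair and roughly 5.5% for the second.
  console.log(`${label}: ${percentLower(reference, value).toFixed(1)}% lower`)
}
```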

Oct 28 '21 08:10 aqrln

Hmmm, that's interesting. I definitely did not account for the CPU architecture, good catch! Luckily, that's something that is consistent on the default GH Actions.

Oct 28 '21 14:10 millsp