TypeScript SDK performance improvement
This issue gathers all my tests and digs into how to improve the TypeScript SDK's performance.
Init
- Env: compiled a binary from main (bdce95d0abfa9014a35f5e3962a64afce03d5fca)
- Use dagger cli v0.11.0
- Use latest dagger cloud
time dagger init --name=perf --sdk=typescript
Initialized module perf in .
dagger init --name=perf --sdk=typescript 0.98s user 0.73s system 11% cpu 14.829 total
➜ perfs git:(main) ✗ time dagger functions
Name Description
container-echo Returns a container that echoes whatever string argument is provided
grep-dir Returns lines that match a pattern in the files of the provided Directory
dagger functions 1.32s user 0.78s system 10% cpu 20.765 total
➜ perfs git:(main) ✗ time dagger call container-echo --string-arg "dig into perf" stdout
dig into perf
dagger call container-echo --string-arg "dig into perf" stdout 0.92s user 0.69s system 18% cpu 8.599 total
| Operation | TypeScript time | Cloud URL (by sha) | Go time (as ref) |
| --- | --- | --- | --- |
| Init | 14.8s | 1951a90ea8679fa505b67f640c4a7ce7 | 2.6s |
| Functions | 20.8s | 7cf74155070bf75952fa5164e3e9667c | 3.12s |
| Call container-echo | 8.6s | da94c06b08ba15704f815667a440e44d | 2.89s |
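To put numbers on the gap, the slowdown factor per operation can be computed from the wall-clock totals in the `time` output above and the Go reference column. This is only a small sketch: the `Timing` type and the `slowdown` helper are names made up for illustration.

```typescript
// Wall-clock totals in seconds: TypeScript values come from the `time`
// output above, Go values from the reference column of the table.
type Timing = { operation: string; typescript: number; go: number };

const timings: Timing[] = [
  { operation: "init", typescript: 14.829, go: 2.6 },
  { operation: "functions", typescript: 20.765, go: 3.12 },
  { operation: "call container-echo", typescript: 8.599, go: 2.89 },
];

// Hypothetical helper: how many times slower TypeScript is than Go.
function slowdown(t: Timing): number {
  return t.typescript / t.go;
}

for (const t of timings) {
  console.log(`${t.operation}: ${slowdown(t).toFixed(1)}x slower than Go`);
}
```

On these three measurements the factor lands between roughly 3x and 7x, with `functions` being the worst case.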
The Go SDK is clearly much faster than the TypeScript SDK: roughly 3 to 7 times on these operations. Now that we have the context, let's understand why.
Comparing traces
If we compare the TypeScript init trace with the Go init trace, we can observe a huge difference in the initialization. Expanding the traces reveals something interesting.
The Go `withDirectory` operation takes less than a second.
The TypeScript SDK takes 5.2s.
This is where we need to optimize the setup.
Note
We also have an extra step where we download the node image, but its speed depends on the network, so we cannot really optimize it.
By @jedevc in https://github.com/dagger/dagger/pull/7081#issuecomment-2056371841
Just dumping here for potential avenues for exploration (duplicating from discord):
- `npm install --package-lock-only` step seems to take a lot of time - is there a faster package manager we could use here? I think this is likely only generating the `package-lock.json`, so I'm not sure what makes this so expensive.
- Or is this approach even right? It feels like an issue that we wouldn't respect the existing `package-lock.json`, cc @helderco, I know you looked at the python equivalent in https://github.com/dagger/dagger/pull/7064.
- We should actually split out these commands to be separate if possible! There's a lot of stuff being done in groups of commands, so traces only show for that chunk, making it hard to dive deeper.
- Caching doesn't seem to always cache between running init and functions. Not quite sure why - this should be pretty instantaneous.
- I suspect something in TSX is introducing some latency - is there a way to cache TSX at all? Or maybe we could consider not using TSX at all, and instead compile the typescript into javascript? That feels like it would cache much better potentially, and could move some of the costs upfront.
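The point about grouped commands hiding where the time goes can be illustrated outside of Dagger: when each step runs and is timed on its own, the slow one becomes visible. A minimal sketch, where `Step`, `timeSteps`, `sleep`, and the fake step bodies are all hypothetical stand-ins (the real steps would be the individual shell commands of the runtime setup, and the durations below are made up):

```typescript
// A step is a named async unit of work (hypothetical type for illustration).
type Step = { name: string; run: () => Promise<void> };
type StepTiming = { name: string; ms: number };

// Run steps sequentially and record the duration of each one separately,
// instead of timing the whole group as a single opaque chunk.
async function timeSteps(steps: Step[]): Promise<StepTiming[]> {
  const timings: StepTiming[] = [];
  for (const step of steps) {
    const start = Date.now();
    await step.run();
    timings.push({ name: step.name, ms: Date.now() - start });
  }
  return timings;
}

// Stand-in for real work: resolves after `ms` milliseconds.
const sleep = (ms: number) =>
  new Promise<void>((resolve) => setTimeout(resolve, ms));

// Step names mirror the thread; the durations are invented for the demo.
const demoSteps: Step[] = [
  { name: "generate lockfile", run: () => sleep(30) },
  { name: "install sdk", run: () => sleep(10) },
  { name: "install tsx", run: () => sleep(5) },
];

timeSteps(demoSteps).then((timings) => {
  for (const t of timings) console.log(`${t.name}: ${t.ms}ms`);
});
```

With one entry per step, a trace (or a plain log like this one) shows which command dominates, which is what a single grouped shell invocation cannot show.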
In the setup, we have this set of operations that together take more than 3 seconds. What are these?
We also have 2 seconds dedicated to installing tsx.
This is something we could improve by changing the package manager (to pnpm maybe?)
> This is something we could improve by changing the package manager (to pnpm maybe?)
Yeah, I similarly changed Python's default installer by using a faster one:
- https://github.com/dagger/dagger/pull/6884
I did some tests trying to switch to pnpm, but I keep hitting issues with graphql: https://github.com/pnpm/pnpm/issues/1715
I'm trying another strategy first: seeing if I can get rid of these shell scripts and do it with dagger operations instead. Maybe it can have an impact on the cache?
I can see that with cached operations it goes pretty fast, so we can do 2 things:
- Simplify the caching (the longest operation is `npm install ./sdk`, which is something that can easily be cached)
- Reduce the installation time (pre-install the sdk or post-install it, for example)
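One general way to make a step like `npm install ./sdk` easy to cache is to key the cached result on the inputs that determine it, so that an unchanged lockfile always maps to the same cache entry. The sketch below illustrates that idea only and is not how Dagger implements its cache. It assumes the package manager name and the lockfile content are the only inputs; `cacheKey` and the sample lockfile strings are hypothetical.

```typescript
import { createHash } from "node:crypto";

// Derive a deterministic cache key from the inputs of the install step.
// Assumption: package manager + lockfile content fully determine the result.
function cacheKey(packageManager: string, lockfileContent: string): string {
  return createHash("sha256")
    .update(packageManager)
    .update("\0") // separator so ("a", "bc") and ("ab", "c") hash differently
    .update(lockfileContent)
    .digest("hex");
}

// Made-up lockfile contents, only used to exercise the function.
const lockA = JSON.stringify({ name: "perf", lockfileVersion: 3 });
const lockB = JSON.stringify({ name: "perf", lockfileVersion: 3, packages: {} });

console.log(cacheKey("npm", lockA) === cacheKey("npm", lockA)); // true
console.log(cacheKey("npm", lockA) === cacheKey("npm", lockB)); // false
console.log(cacheKey("npm", lockA) === cacheKey("yarn", lockA)); // false
```

The design choice here is that anything not part of the key (timestamps, unrelated source files) cannot invalidate the cached install.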
I ran some tests to reduce the dependency installation time; it seems I can slightly reduce it by installing only production dependencies.
I really want to use pnpm, so I'll continue to dig into the dependency resolution, starting by using yarn, which may also improve the time.
Note: the times are much slower because I changed location and my internet connection is slower.
It seems I can really decrease the time using yarn, almost halving it overall, and I'm not hitting any issue.
Based on this benchmark, yarn is faster on a cold start, so it might be a good first solution.
For the latest update about SDK performance improvement, see: https://github.com/dagger/dagger/pull/7096#issuecomment-2123537679
We discussed a couple of improvements that could be made to the TS SDK to improve runtime performance:
- Store the result of the scan in a file during registration so we don't need to scan again on execution -> just load the scanned file. This would reduce execution time because we would not need to redo the source code compilation, analysis, etc.
- Use a Rust or Go library to get the project AST. It's a path I need to explore, but it might also be a way to improve performance.
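The first idea, persisting the scan result at registration and loading it at execution, can be sketched as follows. This is an illustration under assumptions, not the SDK's code: `ScanResult`, `loadOrScan`, `fakeScan`, and the temporary file path are all hypothetical, and the real scan output is richer than a list of function names.

```typescript
import { existsSync, mkdtempSync, readFileSync, writeFileSync } from "node:fs";
import { tmpdir } from "node:os";
import { join } from "node:path";

// Hypothetical shape of a scan result, reduced to the bare minimum.
type ScanResult = { functions: string[] };

// Return the stored scan if the file exists; otherwise run the (expensive)
// scan once and persist its result for later executions.
function loadOrScan(cachePath: string, scan: () => ScanResult): ScanResult {
  if (existsSync(cachePath)) {
    return JSON.parse(readFileSync(cachePath, "utf8")) as ScanResult;
  }
  const result = scan();
  writeFileSync(cachePath, JSON.stringify(result));
  return result;
}

// Demo: count how often the expensive scan actually runs.
let scanCalls = 0;
const fakeScan = (): ScanResult => {
  scanCalls += 1;
  return { functions: ["container-echo", "grep-dir"] };
};

const cachePath = join(mkdtempSync(join(tmpdir(), "scan-")), "scan.json");
const atRegistration = loadOrScan(cachePath, fakeScan); // scans and writes the file
const atExecution = loadOrScan(cachePath, fakeScan); // only reads the file
console.log(scanCalls, atExecution.functions);
```

A real implementation would also have to invalidate the stored file whenever the module sources change, which this sketch deliberately leaves out.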
Quick update on that one: https://github.com/dagger/dagger/pull/7864 could unlock a possible optimization of the setup by removing the sdk install part. It seems that by using only the lockfile, we can download all the dependencies.
So we would remove half of the download work in the setup, which could potentially lead to a great improvement.
Support for multiple package managers will also help with measuring performance. I'll probably come back to that one soon.