typescript-go
Module resolution caching mechanism slows down parsing phase a lot
I noticed that the CPU is not fully utilized during the parsing phase. Tracing shows that there is a lot of unused CPU time:
It turns out that a lot of blocking is happening in resolver's ResolveModuleName:
Mitigation
I've removed all requests to perDirectoryResolutionCache and nonRelativeNameResolutionCache of the moduleNameCache. Thus, each import is resolved anew.
To avoid spamming the OS with duplicated syscalls, I added a cache (based on sync.Map) for `func (*vfs.Common) Stat` and `func (*vfs.Common) getEntries`.
Here are a few measurements:
Each binary exits immediately after the `compiler.NewProgram()` call, so the binary's execution time is approximately equal to the `compiler.NewProgram` execution time.
- `tsgo-main` - built from the current `main` branch
- `tsgo-without-fs-cache` - without resolver cache, without VFS cache
- `tsgo-fs-cache` - without resolver cache, with VFS cache

I am running this experiment on the microsoft/vscode source code. Go GC is disabled via `GOGC=off`. The full command looks like:

```
GOMAXPROCS=X GOGC=off hyperfine --warmup 2 --runs 10 './tsgo-main' './tsgo-without-fs-cache' './tsgo-fs-cache'
```
GNU/Linux (8 cores, 16 threads)
GOMAXPROCS=16 (1.92x speedup)
| Command | Mean [s] | Min [s] | Max [s] | Relative |
|---|---|---|---|---|
| `tsgo-fs-cache` | 0.555 ± 0.007 | 0.543 | 0.572 | 1.00 |
| `tsgo-main` | 1.063 ± 0.007 | 1.055 | 1.075 | 1.92 ± 0.03 |
| `tsgo-without-fs-cache` | 0.630 ± 0.004 | 0.621 | 0.636 | 1.13 ± 0.02 |
GOMAXPROCS=4 (1.45x speedup)
| Command | Mean [s] | Min [s] | Max [s] | Relative |
|---|---|---|---|---|
| `tsgo-fs-cache` | 0.942 ± 0.012 | 0.929 | 0.963 | 1.00 |
| `tsgo-main` | 1.365 ± 0.024 | 1.338 | 1.413 | 1.45 ± 0.03 |
| `tsgo-without-fs-cache` | 1.188 ± 0.008 | 1.175 | 1.203 | 1.26 ± 0.02 |
Windows (VM - 14 virtual processors)
GOMAXPROCS=14 (1.99x speedup)
| Command | Mean [s] | Min [s] | Max [s] | Relative |
|---|---|---|---|---|
| `tsgo-fs-cache.exe` | 0.931 ± 0.018 | 0.894 | 0.955 | 1.00 |
| `tsgo-main.exe` | 1.854 ± 0.033 | 1.805 | 1.900 | 1.99 ± 0.05 |
| `tsgo-without-fs-cache.exe` | 2.337 ± 0.162 | 2.252 | 2.793 | 2.51 ± 0.18 |
GOMAXPROCS=4 (1.49x speedup)
| Command | Mean [s] | Min [s] | Max [s] | Relative |
|---|---|---|---|---|
| `tsgo-fs-cache.exe` | 1.499 ± 0.012 | 1.486 | 1.526 | 1.00 |
| `tsgo-main.exe` | 2.237 ± 0.048 | 2.156 | 2.278 | 1.49 ± 0.03 |
| `tsgo-without-fs-cache.exe` | 2.502 ± 0.016 | 2.475 | 2.523 | 1.67 ± 0.02 |
The tsgo-fs-cache trace is as follows (the time range is the same as in the first image - 1 sec):
So without module resolution caching, parsing runs up to 2x faster on my machine.
Does it make sense to keep using moduleNameCache, then? Maybe it's worth removing it to reduce the number of blocking calls?
@andrewbranch
We should test on a range of different projects and on lots of different hardware. I’m not sure it will be better on a 2-core CI machine.
It works faster with a small number of procs too:
GOMAXPROCS=1 (1.06x):
GOMAXPROCS=2 (1.27x):
GOMAXPROCS=3 (1.36x):
But yes, I agree that it should be more thoroughly tested. Also I don't know how it would work on macOS.
FWIW, setting GOMAXPROCS is not a totally sound way to see how a process behaves with fewer cores; factors other than how many threads the Go runtime spawns affect this.
(I discovered this the hard way when attempting to performance test tsgo with fewer cores; setting GOMAXPROCS lower made performance scale far worse than going to an actual machine with fewer cores.)
That isn't to say that your scaling is wrong, of course, just something to be aware of.