
Module resolution caching mechanism slows down parsing phase a lot

Open auvred opened this issue 8 months ago • 4 comments

I noticed that the CPU is not fully utilized during the parsing phase. Tracing shows that there is a lot of unused CPU time:

Image

It turns out that a lot of blocking happens in the resolver's ResolveModuleName:

Image
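For intuition, here's a minimal Go sketch (not the actual tsgo code) of the pattern behind this kind of blocking: when many parser goroutines funnel through a shared, mutex-guarded cache, they queue on the lock even when they're resolving unrelated names.

```go
package main

import (
	"fmt"
	"sync"
)

// resolutionCache is a stand-in for a shared module resolution cache.
// Every lookup takes the same mutex, so concurrent resolvers serialize here.
type resolutionCache struct {
	mu      sync.Mutex
	entries map[string]string // module name -> resolved path
}

func (c *resolutionCache) resolve(name string, doResolve func(string) string) string {
	c.mu.Lock() // all goroutines block here, even for unrelated names
	defer c.mu.Unlock()
	if r, ok := c.entries[name]; ok {
		return r
	}
	r := doResolve(name) // the expensive lookup runs while the lock is held
	c.entries[name] = r
	return r
}

func main() {
	cache := &resolutionCache{entries: map[string]string{}}
	var wg sync.WaitGroup
	for i := 0; i < 8; i++ {
		wg.Add(1)
		go func(i int) {
			defer wg.Done()
			name := fmt.Sprintf("module-%d", i)
			cache.resolve(name, func(n string) string {
				return "/resolved/" + n // stand-in for directory walks and stat calls
			})
		}(i)
	}
	wg.Wait()
}
```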

Mitigation

I've removed all lookups into the perDirectoryResolutionCache and nonRelativeNameResolutionCache of the moduleNameCache, so each import is resolved from scratch.

To avoid spamming the OS with duplicate syscalls, I added a cache (based on sync.Map) for func (*vfs.Common) Stat and func (*vfs.Common) getEntries.
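A rough sketch of that VFS-side caching idea, assuming a hypothetical statCached helper layered over os.Stat rather than the actual vfs.Common methods:

```go
package main

import (
	"fmt"
	"io/fs"
	"os"
	"sync"
)

// statResult bundles a Stat outcome so errors get cached too.
type statResult struct {
	info fs.FileInfo
	err  error
}

var statCache sync.Map // normalized path -> statResult

// statCached memoizes Stat results so repeatedly resolving imports under the
// same directories doesn't issue duplicate syscalls.
func statCached(path string) (fs.FileInfo, error) {
	if v, ok := statCache.Load(path); ok {
		r := v.(statResult)
		return r.info, r.err
	}
	info, err := os.Stat(path)
	// LoadOrStore keeps the first result if two goroutines race on the same path.
	v, _ := statCache.LoadOrStore(path, statResult{info: info, err: err})
	r := v.(statResult)
	return r.info, r.err
}

func main() {
	info, err := statCached("go.mod")
	fmt.Println(info, err)
	info, err = statCached("go.mod") // served from the cache, no second syscall
	fmt.Println(info, err)
}
```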

Here are a few measurements:

Each binary exits immediately after the compiler.NewProgram() call, so its execution time is approximately the execution time of compiler.NewProgram.

- tsgo-main - built from the current main branch
- tsgo-without-fs-cache - without the resolver cache, without the VFS cache
- tsgo-fs-cache - without the resolver cache, with the VFS cache

I am running this experiment on microsoft/vscode source code.

Go GC is disabled via GOGC=off.

The full command looks like GOMAXPROCS=X GOGC=off hyperfine --warmup 2 --runs 10 './tsgo-main' './tsgo-without-fs-cache' './tsgo-fs-cache'

GNU/Linux (8 cores, 16 threads)

GOMAXPROCS=16 (1.92x speedup)

| Command | Mean [s] | Min [s] | Max [s] | Relative |
|:---|---:|---:|---:|---:|
| tsgo-fs-cache | 0.555 ± 0.007 | 0.543 | 0.572 | 1.00 |
| tsgo-main | 1.063 ± 0.007 | 1.055 | 1.075 | 1.92 ± 0.03 |
| tsgo-without-fs-cache | 0.630 ± 0.004 | 0.621 | 0.636 | 1.13 ± 0.02 |

GOMAXPROCS=4 (1.45x speedup)

| Command | Mean [s] | Min [s] | Max [s] | Relative |
|:---|---:|---:|---:|---:|
| tsgo-fs-cache | 0.942 ± 0.012 | 0.929 | 0.963 | 1.00 |
| tsgo-main | 1.365 ± 0.024 | 1.338 | 1.413 | 1.45 ± 0.03 |
| tsgo-without-fs-cache | 1.188 ± 0.008 | 1.175 | 1.203 | 1.26 ± 0.02 |

Windows (VM - 14 virtual processors)

GOMAXPROCS=14 (1.99x speedup)

| Command | Mean [s] | Min [s] | Max [s] | Relative |
|:---|---:|---:|---:|---:|
| tsgo-fs-cache.exe | 0.931 ± 0.018 | 0.894 | 0.955 | 1.00 |
| tsgo-main.exe | 1.854 ± 0.033 | 1.805 | 1.900 | 1.99 ± 0.05 |
| tsgo-without-fs-cache.exe | 2.337 ± 0.162 | 2.252 | 2.793 | 2.51 ± 0.18 |

GOMAXPROCS=4 (1.49x speedup)

| Command | Mean [s] | Min [s] | Max [s] | Relative |
|:---|---:|---:|---:|---:|
| tsgo-fs-cache.exe | 1.499 ± 0.012 | 1.486 | 1.526 | 1.00 |
| tsgo-main.exe | 2.237 ± 0.048 | 2.156 | 2.278 | 1.49 ± 0.03 |
| tsgo-without-fs-cache.exe | 2.502 ± 0.016 | 2.475 | 2.523 | 1.67 ± 0.02 |

The tsgo-fs-cache trace is as follows (the time range is the same as in the first image - 1 sec):

Image


So without module resolution caching, parsing runs up to 2x faster on my machine.

Does it make sense to use moduleNameCache then? Maybe it's worth removing it to reduce the number of blocking calls?

auvred avatar Mar 19 '25 13:03 auvred

@andrewbranch

jakebailey avatar Mar 19 '25 14:03 jakebailey

We should test on a range of different projects and on lots of different hardware. I’m not sure it will be better on a 2-core CI machine.

andrewbranch avatar Mar 19 '25 16:03 andrewbranch

It works faster with a small number of procs too.

GOMAXPROCS=1 (1.06x):

Image

GOMAXPROCS=2 (1.27x):

Image

GOMAXPROCS=3 (1.36x):

Image

But yes, I agree that it should be more thoroughly tested. Also I don't know how it would work on macOS.

auvred avatar Mar 19 '25 16:03 auvred

FWIW, setting GOMAXPROCS is not a totally sound way to see how a process behaves with fewer cores; there are other factors at play beyond just how many threads the Go runtime spawns.

(I discovered this the hard way when attempting to performance test tsgo with fewer cores; setting GOMAXPROCS lower made performance scale far worse than going to an actual machine with fewer cores.)
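For illustration (standard library only, not tied to tsgo): GOMAXPROCS caps how many OS threads execute Go code at once, but the process still sees every core, and the OS scheduler, page cache, and memory bandwidth behave as on the full machine.

```go
package main

import (
	"fmt"
	"runtime"
)

func main() {
	// NumCPU reports the cores the OS exposes to the process; GOMAXPROCS(0)
	// queries the current limit without changing it. With GOMAXPROCS=4 on a
	// 16-thread machine these two numbers diverge, but everything outside the
	// Go scheduler still runs on the full hardware.
	fmt.Println("NumCPU:    ", runtime.NumCPU())
	fmt.Println("GOMAXPROCS:", runtime.GOMAXPROCS(0))
}
```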

That isn't to say that your scaling is wrong, of course, just something to be aware of.

jakebailey avatar Mar 19 '25 16:03 jakebailey