typescript-go
Module resolution caching mechanism slows down parsing phase a lot
I noticed that the CPU is not fully utilized during the parsing phase. Tracing shows that there is a lot of unused CPU time:
It turns out that a lot of blocking is happening in resolver's ResolveModuleName:
Mitigation
I've removed all requests to perDirectoryResolutionCache and nonRelativeNameResolutionCache of the moduleNameCache. Thus, each import is resolved anew.
To avoid spamming the OS with duplicated syscalls, I added a cache (based on sync.Map) for `func (*vfs.Common) Stat` and `func (*vfs.Common) getEntries`.
Here are a few measurements:
Each binary exits immediately after the `compiler.NewProgram()` call, so the binary's execution time is approximately equal to the `compiler.NewProgram` execution time.
- `tsgo-main` - built from the current `main` branch
- `tsgo-without-fs-cache` - without resolver cache, without VFS cache
- `tsgo-fs-cache` - without resolver cache, with VFS cache

I am running this experiment on the microsoft/vscode source code. Go GC is disabled via `GOGC=off`. The full command looks like:

```
GOMAXPROCS=X GOGC=off hyperfine --warmup 2 --runs 10 './tsgo-main' './tsgo-without-fs-cache' './tsgo-fs-cache'
```
GNU/Linux (8 cores, 16 threads)
GOMAXPROCS=16 (1.92x speedup)
| Command | Mean [s] | Min [s] | Max [s] | Relative |
|---|---|---|---|---|
| `tsgo-fs-cache` | 0.555 ± 0.007 | 0.543 | 0.572 | 1.00 |
| `tsgo-main` | 1.063 ± 0.007 | 1.055 | 1.075 | 1.92 ± 0.03 |
| `tsgo-without-fs-cache` | 0.630 ± 0.004 | 0.621 | 0.636 | 1.13 ± 0.02 |
GOMAXPROCS=4 (1.45x speedup)
| Command | Mean [s] | Min [s] | Max [s] | Relative |
|---|---|---|---|---|
| `tsgo-fs-cache` | 0.942 ± 0.012 | 0.929 | 0.963 | 1.00 |
| `tsgo-main` | 1.365 ± 0.024 | 1.338 | 1.413 | 1.45 ± 0.03 |
| `tsgo-without-fs-cache` | 1.188 ± 0.008 | 1.175 | 1.203 | 1.26 ± 0.02 |
Windows (VM - 14 virtual processors)
GOMAXPROCS=14 (1.99x speedup)
| Command | Mean [s] | Min [s] | Max [s] | Relative |
|---|---|---|---|---|
| `tsgo-fs-cache.exe` | 0.931 ± 0.018 | 0.894 | 0.955 | 1.00 |
| `tsgo-main.exe` | 1.854 ± 0.033 | 1.805 | 1.900 | 1.99 ± 0.05 |
| `tsgo-without-fs-cache.exe` | 2.337 ± 0.162 | 2.252 | 2.793 | 2.51 ± 0.18 |
GOMAXPROCS=4 (1.49x speedup)
| Command | Mean [s] | Min [s] | Max [s] | Relative |
|---|---|---|---|---|
| `tsgo-fs-cache.exe` | 1.499 ± 0.012 | 1.486 | 1.526 | 1.00 |
| `tsgo-main.exe` | 2.237 ± 0.048 | 2.156 | 2.278 | 1.49 ± 0.03 |
| `tsgo-without-fs-cache.exe` | 2.502 ± 0.016 | 2.475 | 2.523 | 1.67 ± 0.02 |
The tsgo-fs-cache trace is as follows (the time range is the same as in the first image - 1 sec):
So without module resolution caching, parsing runs up to 2x faster on my machine.
Does it make sense to keep using moduleNameCache, then? Maybe it's worth removing it to reduce the number of blocking calls?
@andrewbranch
We should test on a range of different projects and on lots of different hardware. I’m not sure it will be better on a 2-core CI machine.
It works faster with a small number of procs too:
GOMAXPROCS=1 (1.06x):
GOMAXPROCS=2 (1.27x):
GOMAXPROCS=3 (1.36x):
But yes, I agree that it should be more thoroughly tested. Also I don't know how it would work on macOS.
FWIW, setting GOMAXPROCS is not a totally sound way to see how a process behaves with fewer cores; factors other than how many threads the Go runtime spawns affect this.
(I discovered this the hard way when attempting to performance test tsgo with fewer cores; setting GOMAXPROCS lower made performance scale far worse than going to an actual machine with fewer cores.)
That isn't to say that your scaling is wrong, of course, just something to be aware of.