Use Profile-Guided Optimization (PGO) for the compiler itself
Hi!
Since the compiler is being rewritten in Go, I suggest considering Profile-Guided Optimization (PGO) to optimize the TypeScript compiler itself. Go has supported PGO since 1.20, so it's an available option for the project. PGO works especially well for compiler-like workloads - e.g. check my PGO benchmarks for compilers (even if most of them are for non-Go projects, the results should be much the same for Go since it uses the same ideas).
I suggest the following plan:
- Perform PGO benchmarks for the compiler. This will require thinking about the training workload - I think compiling any TypeScript project would be a good option for most use cases.
- Provide some scripts to simplify building the compiler with PGO.
- Integrate a PGO step into the CI pipeline so end users get a PGO-optimized version of the compiler.
I understand that the project is in its early stages and there are probably more important things to finish at the moment. If so, just consider this issue a point of improvement for future versions. I believe that improving compiler performance is valuable enough for end users.
Thank you.
P.S. If you think that Discussions is a better place for such issues - feel free to move it there.
The most recent time I tried PGO in this repo, we actually got slower! Definitely need to retest and file a new issue if that's still true.
Wow, sounds pretty bad! I personally would be interested to see such a training/bench suite! If the training workload is reasonably representative (for benchmarking purposes, at least to start with, it's fine to use the same suite for training and benchmarking) and we can reproducibly catch the slowdown, it's worth reporting to the Go compiler upstream, IMHO.
$ rm -rf built
$ hereby build
$ mv built/local built/local-old
$ ./built/local-old/tsgo -p ~/work/vscode/src --pprofDir=.
$ mv *-cpuprofile.pb.gz ./cmd/tsgo/default.pgo
$ hereby build
$ hyperfine -w=1 './built/local-old/tsgo -p ~/work/vscode/src' './built/local/tsgo -p ~/work/vscode/src'
Benchmark 1: ./built/local-old/tsgo -p /home/jabaile/work/vscode/src
Time (mean ± σ): 8.657 s ± 0.791 s [User: 53.821 s, System: 7.241 s]
Range (min … max): 8.200 s … 10.878 s 10 runs
Warning: Statistical outliers were detected. Consider re-running this benchmark on a quiet system without any interferences from other programs. It might help to use the '--warmup' or '--prepare' options.
Benchmark 2: ./built/local/tsgo -p /home/jabaile/work/vscode/src
Time (mean ± σ): 9.396 s ± 1.398 s [User: 50.923 s, System: 7.156 s]
Range (min … max): 8.137 s … 12.366 s 10 runs
Summary
./built/local-old/tsgo -p /home/jabaile/work/vscode/src ran
1.09 ± 0.19 times faster than ./built/local/tsgo -p /home/jabaile/work/vscode/src
Not exactly scientific; there's so much noise.
For reducing the noise, I suggest you additionally:
- Use CPU pinning (on Linux, that's the taskset -c command). This reduces CPU scheduler noise
- Increase the number of warmups
- Increase the number of test runs
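A rough sketch of what such a run could look like, reusing the paths from the commands above. The core list, warmup count and run count are arbitrary picks of mine, not values tuned for this repo, and the helper names (pinned, run_bench) are made up for illustration. Since tsgo is multi-threaded, the sketch pins to a fixed set of cores rather than a single one.

```shell
#!/bin/sh
# Sketch only: CORES, WARMUPS and RUNS are arbitrary example values.
CORES="${CORES:-2-9}"
WARMUPS="${WARMUPS:-5}"
RUNS="${RUNS:-30}"
PROJECT="${PROJECT:-$HOME/work/vscode/src}"

# Print a command line that runs one tsgo binary pinned to $CORES.
# Note: paths containing spaces would need extra quoting here.
pinned() {
  printf 'taskset -c %s %s -p %s' "$CORES" "$1" "$PROJECT"
}

# Compare the baseline build and the PGO build under identical pinning,
# with more warmups and more timed runs than the hyperfine defaults.
run_bench() {
  hyperfine --warmup "$WARMUPS" --runs "$RUNS" \
    "$(pinned ./built/local-old/tsgo)" \
    "$(pinned ./built/local/tsgo)"
}
```

Sourcing this file and calling run_bench starts the comparison; CORES, WARMUPS, RUNS and PROJECT can be overridden from the environment.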
Currently, it's hard to say whether PGO helps the project or not.
Yeah, I have to run this on my dedicated perf machine. It's currently configured to segment off one physical core for tsc benchmarking, but now we have all of these cores, so I have to figure something else out...