Make swc compile faster.
Describe the feature
Some of the crates compile ridiculously slowly. It would be really nice if we could speed this up a bit.
From cargo build --release --timings:

Additional context
Cross link: https://github.com/web-infra-dev/rspack/issues/2202
Update:
swc_ecma_visit and swc_css_visit are blocking compilation for a whole minute due to heavy usage of macros. See https://github.com/swc-project/swc/blob/main/crates/swc_ecma_visit/src/lib.rs
The culprits are these three crates blocking everything, and they take a minute each to compile:
I think the macro expansion in these crates should be done by a script and the generated code committed instead.
Codegen is performing really badly for the minifiers:
I wonder if this works: https://github.com/dtolnay/watt
Are you using codegen-units=1?
Ah yeah you are using it.
https://github.com/web-infra-dev/rspack/blob/cdf6a52a39f37a8ce2975692b7c46abc729169b6/Cargo.toml#LL12
I think that's the main problem
Changing to the default codegen-units doesn't help (as seen above); it's the macros ;-)
I know. I'm talking about codegen times of transform/minifier crates.
I wonder if the macro implementations are sub-optimal, for example https://users.rust-lang.org/t/5-hours-to-compile-macro-what-can-i-do/36508
Yeah, they are not optimal. It's a known issue.
Assign this to me if you don't have the time, I'll dig deeper.
This is not a focus, at least at the moment. I have time, but I don't want to use my personal time for this.
You can create fake types like Expr(Tokens) or Stmt(Tokens) that store tokens as-is instead of parsing them inside the proc macros. But it should be conditional via cfg, so the output can still be verified by enabling a feature flag.
Currently pmutil stores tokens in a parsed form, but going Tokens => Expr => Tokens is a waste.
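As a rough illustration of that idea (the names and the `verify` feature are hypothetical, not swc's or pmutil's actual API), a macro-side expression type could keep the tokens opaque and only parse them when a verification feature is enabled:

```rust
use proc_macro2::TokenStream;

// Hypothetical wrapper: stores the tokens of an expression as-is instead of
// eagerly parsing them into a syn::Expr inside the proc macro.
pub struct Expr(TokenStream);

impl Expr {
    pub fn new(tokens: TokenStream) -> syn::Result<Self> {
        // Only pay for parsing when the (hypothetical) `verify` feature is
        // enabled, e.g. in CI, to check that the tokens form a valid expression.
        #[cfg(feature = "verify")]
        {
            let _checked: syn::Expr = syn::parse2(tokens.clone())?;
        }
        Ok(Expr(tokens))
    }

    // Re-emit the stored tokens unchanged (Tokens => Tokens, skipping the
    // Tokens => Expr => Tokens round trip).
    pub fn into_tokens(self) -> TokenStream {
        self.0
    }
}
```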
Summoned via https://twitter.com/boshen_c/status/1635842195113787392, I took a look at this.
First, I'll consider the high codegen times for some of the crates.
I have an Intel i9-7940X which has 14 physical cores and 28 virtual cores. My compile times for swc were:
- cargo check: 1m00s
- cargo build: 1m19s
- cargo build --release: 2m19s
So, a pretty small jump going from check to build, but a big jump going from build to build --release.
--timings shows that release builds are spending a huge amount of time in codegen, as mentioned above. Here is some of the --timings graph for a debug build:

And here is the same part for an opt build:

Note that the purple (codegen) part is massively bigger in the release build. I've never seen so much purple in a --timings graph. And it only happens in the swc* crates at the bottom of the crate graph. Ones higher up have a much higher blue-to-purple ratio, like I'd expect.
swc_node_bundler is a good example, taking 1.25s in a debug build and 43.13s for a release build, which is a gigantic difference. I tried downloading just that crate from crates.io and compiling it using the rustc-perf benchmark harness. I got reasonably similar results: 0.8s for debug and 33.0s for release.
I then tried profiling the compiler with samply while doing debug and release builds of swc_node_bundler. Here is the thread timeline for a debug build:

and for an opt build:

rustc is the front-end thread, the other threads are doing codegen. There are multiple codegen threads running in parallel, which suggests that the codegen-units=1 theory from above is incorrect. (Besides, how does rspack even relate to swc?)
For the debug build we have four "opt" threads, which are WorkItem::Optimize units within the compiler. For the release build we have three "opt" threads and eleven "LTO" threads, which are WorkItem::LTO units within the compiler. The "LTO" threads don't start until the "opt" threads finish. The top-most "LTO" thread accounts for 26 seconds of the runtime, running by itself for much of that time, so that seems to be much of the problem.
I don't know much about these "LTO" threads, and why one of them would be so slow. It definitely seems odd. The crate has only 767 lines of Rust code in it, and it looks like very normal, reasonable code. My current theory is that one of the crates that swc_node_bundler depends on is doing something unusual that is causing lots of swc* crates to be so slow to codegen. swc_ecma_ast and swc_ecma_visit look like the ones all the slow-to-compile crates have in common. Interestingly, those are two of the three crates with problematic macros that @Boshen mentioned above.
Ok, there ends part 1 of my analysis.
@nnethercote Thank you so much for looking into this.
how does rspack even relate to swc
The rspack project depends on almost all of swc.
I've never seen so much purple in a --timings graph.
I thought it was normal for codegen to take this much time; apparently it's not. So now we have two problems at hand: macros and codegen.
Oh... Interesting. I took the ratio graph for granted because swc is my first big Rust project, but apparently it's not common.
I know the solution for the proc macro part, and AFAIK the long codegen is caused by the visitors. I profiled it a long time ago, although I didn't use the rustc-perf tester.
Now for the crates using macros.
swc_ecma_visit-0.86.1
- Has 10,018 lines of code, but 7,355 of that is generated in target/debug/build/swc_atoms-fa283f5fd94de3ff/out/js_word.rs, mostly for a perfect hash function.
- cargo expand's output is 84,893 lines of code. That's a lot! Much of that is lots of very large types with many fold and visit operations defined on them.
swc_ecma_ast-0.100.1
- 14,621 lines of code, but again, 7,355 of that is in target/debug/build/swc_atoms-fa283f5fd94de3ff/out/js_word.rs.
- cargo expand's output is 120,008 lines of code. That's even more! Much of that is serializing/deserializing code generated by serde, which is known to produce verbose code.
I looked at samply profiles of check builds for both of these. The profiles looked pretty normal, which suggests that the code isn't particularly unusual, but just that there's a lot of it.
I think it will take project-specific understanding to improve things. Looking at the output of cargo expand could be helpful. There is so much code there. Is all of it necessary? Could it be made shorter? And maybe reducing the amount of code in those modules might help with the codegen times in later modules.
Interesting. I took the ratio graph for granted because swc is my first big Rust project, but apparently it's not common.
That's right. If you look at the earlier swc crates, and the non-swc crates, you can see that the blue and purple lengths are usually fairly similar. (Likewise with all the crates in debug builds.) Sometimes the purple part might be 4 or 5 times longer, which isn't unusual. But ratios like 10, 20, 30 are unusual.
About macros:
- swc_ecma_ast:
  Macros create an enormous amount of code, and I think it can be reduced a bit, but not by a large margin. My main trick for reducing the amount of code will be extracting common code to swc_common or swc_visit.
  It includes:
  - custom derive for serde
  - many implementations of From<T>
  - derive of rkyv::Archive
  - derive of many built-in traits
- swc_ecma_visit:
  The proc macro generates two kinds of visitors. The first is general visitors used by swc itself, and the second is path-aware visitors used by rspack and turbopack (see the sketch below).
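For readers unfamiliar with the expanded output, here is a minimal hand-written sketch of the shape of the general visitor; the type and method names are illustrative, not swc's actual definitions. The path-aware variant additionally threads a path argument (cf. swc_visit::AstNodePath in the llvm-lines output further down) through every call, which adds another full set of methods.

```rust
// Tiny stand-in AST, just enough to make the sketch compile.
pub struct Ident { pub sym: String }
pub struct BinExpr { pub left: Box<Expr>, pub right: Box<Expr> }
pub enum Expr { Ident(Ident), Bin(BinExpr) }

// One visit_* method per AST node type, each with a default body that walks
// the node's children. With hundreds of node types (plus fold and mutable
// variants), the expanded code easily reaches tens of thousands of lines.
pub trait Visit {
    fn visit_ident(&mut self, _node: &Ident) {}

    fn visit_bin_expr(&mut self, node: &BinExpr) {
        self.visit_expr(&node.left);
        self.visit_expr(&node.right);
    }

    fn visit_expr(&mut self, node: &Expr) {
        match node {
            Expr::Ident(i) => self.visit_ident(i),
            Expr::Bin(b) => self.visit_bin_expr(b),
        }
    }
}
```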
Btw, can #[inline] or generics generated by the proc macro cause such issues?
I'll work on this
cargo expand's output is 120,008 lines of code. That's even more! Much of that is serializing/deserializing code generated by serde, which is known to produce verbose code.
Parcel (and probably also rspack) currently doesn't use serde/rkyv for swc ASTs, so maybe compile time could be improved here by putting that behind a cargo feature.
I'm working on it with #7138.
Can you try [email protected]?
I made serde support for the AST optional and off by default, so the compile time should be improved.
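For anyone curious how such an opt-in derive typically looks, here is a minimal sketch; the `serde-impl` feature name and the `Ident` type are only illustrative, not swc's actual definitions:

```rust
// Gating serde derives behind a cargo feature means downstream users who
// never serialize the AST skip all of the generated (de)serialization code.
#[cfg(feature = "serde-impl")]
use serde::{Deserialize, Serialize};

#[cfg_attr(feature = "serde-impl", derive(Serialize, Deserialize))]
#[derive(Debug, Clone)]
pub struct Ident {
    pub sym: String,
    pub optional: bool,
}
```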
Rspack is currently stuck on an older version due to #7085, I'll report back the improvements once we upgrade to the latest version when that's fixed on our end.
Not sure if I did something wrong, but the compile time didn't get better for Parcel on my machine:
- https://github.com/parcel-bundler/parcel/commit/f2042885cd30a77ea81445847b62a3d72dd0e5ef (before your change). Build takes 4m 15s.
- https://github.com/parcel-bundler/parcel/commit/b180a3075a812f288999fccd473247523f31dca3 (after your change, but still using swc_ecmascript). Build takes 4m 15s.
- https://github.com/parcel-bundler/parcel/commit/f16be9786cae3777aabed7c5855c0c0f215608c6 (after your change, using swc_core). Build takes 4m 30s.
I ran rm -rf target && RUSTC_WRAPPER= yarn workspace @parcel/transformer-js build-release in the root of the repo.
Oh... interesting.
I ran cargo build --timings and cargo build --timings --release from the repository root of vercel/turbo.
(Also, I ran cargo clean each time, and I did nothing while compiling to reduce noise)
Debug build:
New: Finished dev [unoptimized + debuginfo] target(s) in 2m 15s
Prev: Finished dev [unoptimized + debuginfo] target(s) in 2m 56s
This was the result for turbopack, but it is before the AST change.
I found cargo-llvm-lines from matklad's blog post.
This can be used to guide your refactoring, e.g.
cargo llvm-lines -p swc_ecma_parser | head -20
Lines Copies Function name
----- ------ -------------
368372 4504 (TOTAL)
11760 (3.2%, 3.2%) 35 (0.8%, 0.8%) alloc::raw_vec::RawVec<T,A>::grow_amortized
11707 (3.2%, 6.4%) 13 (0.3%, 1.1%) swc_ecma_parser::parser::class_and_fn::<impl swc_ecma_parser::parser::Parser<I>>::parse_fn_args_body::{{closure}}
9803 (2.7%, 9.0%) 1 (0.0%, 1.1%) swc_ecma_parser::parser::stmt::module_item::<impl swc_ecma_parser::parser::Parser<I>>::parse_export
7383 (2.0%, 11.0%) 15 (0.3%, 1.4%) swc_ecma_parser::parser::typescript::<impl swc_ecma_parser::parser::Parser<I>>::try_parse_ts
5477 (1.5%, 12.5%) 1 (0.0%, 1.4%) swc_ecma_parser::parser::stmt::module_item::<impl swc_ecma_parser::parser::Parser<I>>::parse_import
In some other place:
89536 (11.0%, 11.0%) 1399 (5.3%, 5.3%) swc_visit::AstNodePath<N>::with
37854 (4.7%, 15.7%) 701 (2.6%, 7.9%) swc_visit::AstKindPath<K>::with
In cases where generics cannot be removed, https://matklad.github.io/2021/09/04/fast-rust-builds.html#Keeping-Instantiations-In-Check explains the "inner" technique.
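For reference, a minimal sketch of that technique (the function names are made up for illustration): the generic outer function converts its argument and immediately delegates to a non-generic inner function, so the large body is monomorphized and optimized only once instead of once per type parameter.

```rust
use std::path::{Path, PathBuf};

// Generic shell: tiny, cheap to instantiate once per argument type.
pub fn read_config(path: impl AsRef<Path>) -> std::io::Result<String> {
    // Non-generic inner function: all the real work lives here and is
    // compiled a single time.
    fn inner(path: &Path) -> std::io::Result<String> {
        std::fs::read_to_string(path)
    }
    inner(path.as_ref())
}

fn main() {
    // Call sites can still pass &str, String, PathBuf, etc.
    let _ = read_config("swc.config.json");
    let _ = read_config(PathBuf::from("swc.config.json"));
}
```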
When I tried compiling flash (which depends on swc), the swc_ecma_transforms_compat crate brought in by swc caused the build to crash because the Linux kernel decided to kill the build process (likely because it was using far too many resources), and honestly I agree with it: the above-mentioned crate's build was using almost 100% of the CPU and more than half of my RAM, which is obviously not reasonable.
Hopefully more attention will be brought to this issue, as it can potentially break builds, not only slow them down (this should be most noticeable on CI/CD services, where containers are commonly limited to a fairly low amount of resources, and sometimes even run time).
EDIT: Not sure what caused you to block me; it's not like I was going to spam or call this project's developers names. If my speech was disruptive, then sorry. I rephrased most parts of the message.
I'll tackle this again in the near future. I want to make the ES AST/parser extensible and have a related idea, but it will make compilation slower.
I think one of the biggest problems is the lack of parallelism, so I wrote https://github.com/swc-project/swc/discussions/7911
It's an RFC, and I want to hear opinions about such a CLI tool.
https://github.com/swc-project/swc/pull/8110 should improve compile time a bit.
Also, this is an experiment for turbopack, but depending directly on crates makes compilation faster. https://github.com/vercel/turbo/pull/5879
I'm going to create a CLI tool to manage dependencies just as if you were using swc_core, while depending on the crates directly.
@Boshen @mischnic Can you profile it again, but with swc_core and only with the features you use?