Make swc compile faster.

Open • Boshen opened this issue 2 years ago • 33 comments

Describe the feature

Some of the crates compile ridiculously slowly. It would be really nice if we could speed them up a bit.

From cargo build --release --timings:

[Image: cargo build --release --timings report]

Additional context

Cross link: https://github.com/web-infra-dev/rspack/issues/2202


Update:

swc_ecma_visit and swc_css_visit are blocking compilation for a whole minute due to heavy usage of macros. See https://github.com/swc-project/swc/blob/main/crates/swc_ecma_visit/src/lib.rs

Boshen, Mar 13 '23 04:03

The culprits are these three crates, which block everything and take a minute each to compile:

[Image: --timings graph showing the three blocking crates]

I think the macros inside these crates should be expanded by a script ahead of time, with the generated code then copied into the source.
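For illustration, a minimal sketch of the "script" approach, assuming the visitor trait's shape can be derived from a list of node type names: a small Rust generator emits the source text once, and that output is committed instead of being re-expanded by a proc macro on every build. generate_visit_trait, to_snake_case and the walk_* helpers referenced in the generated text are hypothetical names, not swc's API.

```rust
/// Convert a PascalCase type name to snake_case, e.g. "BinExpr" -> "bin_expr".
fn to_snake_case(name: &str) -> String {
    let mut out = String::with_capacity(name.len() + 4);
    for (i, ch) in name.chars().enumerate() {
        if ch.is_ascii_uppercase() {
            // Separate words with '_' except before the first character.
            if i > 0 {
                out.push('_');
            }
            out.push(ch.to_ascii_lowercase());
        } else {
            out.push(ch);
        }
    }
    out
}

/// Emit Rust source for a visitor trait with one default method per node
/// type. Each default body delegates to a hypothetical `walk_*` helper.
pub fn generate_visit_trait(node_types: &[&str]) -> String {
    let mut src = String::from("pub trait Visit {\n");
    for ty in node_types {
        let method = to_snake_case(ty);
        // `{{` and `}}` are literal braces in the generated code.
        src.push_str(&format!(
            "    fn visit_{method}(&mut self, n: &{ty}) {{\n        walk_{method}(self, n)\n    }}\n"
        ));
    }
    src.push_str("}\n");
    src
}
```

A tiny helper binary (an xtask-style setup is one assumed option) could write this string to a checked-in file with std::fs::write, and CI could fail if regenerating the file changes it.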

Boshen, Mar 14 '23 07:03

Codegen is performing really badly for the minifiers:

[Image: --timings graph of codegen for the minifier crates]

Boshen, Mar 14 '23 07:03

I wonder if this works: https://github.com/dtolnay/watt

Boshen, Mar 14 '23 07:03

Are you using codegen-units=1?

kdy1, Mar 14 '23 07:03

Ah yeah you are using it.

https://github.com/web-infra-dev/rspack/blob/cdf6a52a39f37a8ce2975692b7c46abc729169b6/Cargo.toml#LL12

I think that's the main problem.

kdy1, Mar 14 '23 07:03

[Image: --timings graph with default codegen-units]

Changing to default codegen-units doesn't help (as seen from above), it's the macros ;-)

Boshen, Mar 14 '23 08:03

I know. I'm talking about codegen times of transform/minifier crates.

kdy1, Mar 14 '23 08:03

I wonder if the macro implementations are sub-optimal, for example https://users.rust-lang.org/t/5-hours-to-compile-macro-what-can-i-do/36508

Boshen, Mar 14 '23 08:03

Yeah, they are not optimal. It's a known issue.

kdy1, Mar 14 '23 08:03

Assign this to me if you don't have the time, I'll dig deeper.

Boshen, Mar 14 '23 09:03

This is not a focus, at least at the moment. I have time, but I don't want to use my personal time for this.

You can create fake types like Expr(Tokens) or Stmt(Tokens) that store tokens as-is instead of parsing them in the proc macros. But this should be conditional using cfg, so the code can still be verified by enabling a feature flag.

Currently pmutil stores tokens in a parsed form, but Tokens => Expr => Tokens is a waste.
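A rough sketch of that idea, assuming a plain String stands in for proc_macro2::TokenStream so it compiles without dependencies. Expr, from_tokens, into_tokens and the verify-tokens feature are hypothetical names, not pmutil's API. By default the tokens pass straight through; with the feature enabled, a cheap delimiter check stands in for the real parse that would validate the input.

```rust
// Assumption: a plain String stands in for proc_macro2::TokenStream so this
// sketch has no dependencies. All names here are hypothetical.
pub type Tokens = String;

/// A "fake" expression node that stores macro input tokens unparsed.
#[derive(Debug, Clone, PartialEq)]
pub struct Expr(Tokens);

impl Expr {
    /// Default build: wrap the tokens without parsing them, avoiding the
    /// Tokens => Expr => Tokens round trip.
    #[cfg(not(feature = "verify-tokens"))]
    pub fn from_tokens(tokens: Tokens) -> Self {
        Expr(tokens)
    }

    /// With the hypothetical `verify-tokens` feature, validate first. A
    /// delimiter-balance check stands in for a real parser here.
    #[cfg(feature = "verify-tokens")]
    pub fn from_tokens(tokens: Tokens) -> Self {
        assert!(delimiters_balanced(&tokens), "malformed tokens: {tokens}");
        Expr(tokens)
    }

    /// Emit the stored tokens unchanged.
    pub fn into_tokens(self) -> Tokens {
        self.0
    }
}

/// Returns true if (), [] and {} are properly nested in `s`.
#[cfg(feature = "verify-tokens")]
fn delimiters_balanced(s: &str) -> bool {
    let mut stack = Vec::new();
    for c in s.chars() {
        let expected_open = match c {
            '(' | '[' | '{' => {
                stack.push(c);
                continue;
            }
            ')' => '(',
            ']' => '[',
            '}' => '{',
            _ => continue,
        };
        if stack.pop() != Some(expected_open) {
            return false;
        }
    }
    stack.is_empty()
}
```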

kdy1, Mar 14 '23 12:03

Summoned via https://twitter.com/boshen_c/status/1635842195113787392, I took a look at this.

First, I'll consider the high codegen times for some of the crates.

I have an Intel i9-7940X which has 14 physical cores and 28 virtual cores. My compile times for swc were:

  • cargo check: 1m00s
  • cargo build: 1m19s
  • cargo build --release: 2m19s

So, a pretty small jump going from check to build, but a big jump going from build to build --release.

--timings shows that release builds are spending a huge amount of time in codegen, as mentioned above. Here is some of the --timings graph for a debug build:

[Image: --timings graph, debug build]

And here is the same part for an opt build:

[Image: --timings graph, release build]

Note that the purple (codegen) part is massively bigger in the release build. I've never seen so much purple in a --timings graph. And it only happens in the swc* crates at the bottom of the crate graph. Ones higher up have a much higher blue-to-purple ratio, like I'd expect.

swc_node_bundler is a good example, taking 1.25s in a debug build and 43.13s for a release build, which is a gigantic difference. I tried downloading just that crate from crates.io and compiling it using the rustc-perf benchmark harness. I got reasonably similar results: 0.8s for debug and 33.0s for release.

I then tried profiling the compiler with samply while doing debug and release builds of swc_node_bundler. Here is the thread timeline for a debug build:

[Image: samply thread timeline, debug build]

and for an opt build:

[Image: samply thread timeline, release build]

rustc is the front-end thread, the other threads are doing codegen. There are multiple codegen threads running in parallel, which suggests that the codegen-units=1 theory from above is incorrect. (Besides, how does rspack even relate to swc?)

For the debug build we have four "opt" threads, which are WorkItem::Optimize units within the compiler. For the release build we have three "opt" threads and eleven "LTO" threads, which are WorkItem::LTO units within the compiler. The "LTO" threads don't start until the "opt" threads finish. The top-most "LTO" thread accounts for 26 seconds of the runtime, running by itself for much of that time, so that seems to be much of the problem.

I don't know much about these "LTO" threads, and why one of them would be so slow. It definitely seems odd. The crate has only 767 lines of Rust code in it, and it looks like very normal, reasonable code. My current theory is that one of the crates that swc_node_bundler depends on is doing something unusual that is causing lots of swc* crates to be so slow to codegen. swc_ecma_ast and swc_ecma_visit look like the ones all the slow-to-compile crates have in common. Interestingly, those are two of the three crates with problematic macros that @Boshen mentioned above.

Ok, there ends part 1 of my analysis.

nnethercote, Mar 20 '23 04:03

@nnethercote Thank you so much for looking into this.

how does rspack even relate to swc

The rspack project depends on almost all of swc.

I've never seen so much purple in a --timings graph.

I thought it was normal for codegen to take this much time; apparently it's not 😞 So now we have two problems at hand: macros and codegen.

Boshen, Mar 20 '23 05:03

Oh... Interesting. I took the ratio in the graph for granted because swc is my first big Rust project, but apparently it's not common 🤣

I know the solution for the proc macro part, and AFAIK the long codegen times are caused by the visitors. I profiled it a long time ago, although I didn't use the rustc-perf harness.

kdy1, Mar 20 '23 05:03

Now for the crates using macros.

swc_ecma_visit-0.86.1

  • Has 10,018 lines of code, but 7,355 of that is generated in target/debug/build/swc_atoms-fa283f5fd94de3ff/out/js_word.rs, mostly for a perfect hash function.
  • cargo expand's output is 84,893 lines of code. That's a lot! Much of that is lots of very large types with many fold and visit operations defined on them.

swc_ecma_ast-0.100.1

  • 14,621 lines of code, but again, 7,355 of that is in target/debug/build/swc_atoms-fa283f5fd94de3ff/out/js_word.rs
  • cargo expand's output is 120,008 lines of code. That's even more! Much of that is serializing/deserializing code generated by serde, which is known to produce verbose code.

I looked at samply profiles of check builds for both of these. The profiles looked pretty normal, which suggests that the code isn't particularly unusual, but just that there's a lot of it.

I think it will take project-specific understanding to improve things. Looking at the output of cargo expand could be helpful. There is so much code there. Is all of it necessary? Could it be made shorter? And maybe reducing the amount of code in those modules might help with the codegen times in later modules.

nnethercote, Mar 20 '23 05:03

Interesting. I took the ratio graph granted because swc is my first big rust project, but it was not common rofl

That's right. If you look at the earlier swc crates, and the non-swc crates, you can see that the blue and purple lengths are usually fairly similar. (Likewise with all the crates in debug builds.) Sometimes the purple part might be 4 or 5 times longer, which isn't unusual. But ratios like 10, 20, 30 are unusual.

nnethercote, Mar 20 '23 05:03

About macros:

  • swc_ecma_ast:

Macros create an enormous amount of code, and I think it can be reduced a bit, but not by a large margin. My main trick for reducing the amount of code will be extracting common code into swc_common or swc_visit.

It includes

  • custom derive for serde

  • many implementations of From<T>

  • derive of rkyv::Archive

  • derive of many built-in traits

  • swc_ecma_visit

The proc-macro generates two kinds of visitors. The first is the general visitors used by swc itself, and the second is the path-aware visitors used by rspack and turbopack.
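To show why generating two kinds of visitors multiplies the code, here is a hand-written sketch of what such a macro might emit for a single node type. The mini-AST and the Visit/VisitWithPath traits are simplified assumptions, not swc_ecma_visit's real signatures; a real AST repeats this pattern for every node type.

```rust
// Hypothetical mini-AST; swc's real types and traits are far larger.
#[derive(Debug)]
pub enum Expr {
    Num(i64),
    Bin(Box<Expr>, Box<Expr>),
}

/// General visitor: one overridable method per node type whose default
/// walks into the children.
pub trait Visit {
    fn visit_expr(&mut self, e: &Expr) {
        visit_expr_children(self, e);
    }
}

pub fn visit_expr_children<V: Visit + ?Sized>(v: &mut V, e: &Expr) {
    if let Expr::Bin(l, r) = e {
        v.visit_expr(l);
        v.visit_expr(r);
    }
}

/// Path-aware visitor: the same walk again, but every method also receives
/// the chain of (parent, field) labels leading to the current node.
pub trait VisitWithPath {
    fn visit_expr(&mut self, e: &Expr, path: &mut Vec<&'static str>) {
        visit_expr_children_with_path(self, e, path);
    }
}

pub fn visit_expr_children_with_path<V: VisitWithPath + ?Sized>(
    v: &mut V,
    e: &Expr,
    path: &mut Vec<&'static str>,
) {
    if let Expr::Bin(l, r) = e {
        path.push("Bin.left");
        v.visit_expr(l, path);
        path.pop();
        path.push("Bin.right");
        v.visit_expr(r, path);
        path.pop();
    }
}

/// Counts number literals using the general visitor.
pub struct NumCounter(pub usize);

impl Visit for NumCounter {
    fn visit_expr(&mut self, e: &Expr) {
        if let Expr::Num(_) = e {
            self.0 += 1;
        }
        visit_expr_children(self, e);
    }
}

/// Records the deepest path length at which a number literal appears.
pub struct MaxDepth(pub usize);

impl VisitWithPath for MaxDepth {
    fn visit_expr(&mut self, e: &Expr, path: &mut Vec<&'static str>) {
        if let Expr::Num(_) = e {
            self.0 = self.0.max(path.len());
        }
        visit_expr_children_with_path(self, e, path);
    }
}
```

Every node type gets both a trait method and a children-walking function per visitor kind, and each generic walker is instantiated again for every concrete visitor type that uses it.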

Btw, can the #[inline] attributes and generics generated by the proc-macro cause such issues?

kdy1, Mar 20 '23 06:03

I'll work on this

kdy1, Mar 23 '23 02:03

cargo expand's output is 120,008 lines of code. That's even more! Much of that is serializing/deserializing code generated by serde, which is known to produce verbose code.

Parcel (and probably also rspack) currently doesn't use serde/rkyv for swc ASTs, so maybe compile time could be improved here by putting that behind a cargo feature.
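A minimal sketch of that gating, assuming a feature named serde-impl (the name is illustrative; the actual setup in #7138 may differ). In Cargo.toml, serde would be an optional dependency enabled by that feature. When the feature is off, cfg_attr drops the attributes entirely, so none of serde's generated code is compiled.

```rust
// Sketch: serde support for an AST node gated behind a cargo feature.
// When `serde-impl` is off, `cfg_attr` removes these attributes before the
// derive paths are resolved, so this compiles even without serde at all.
#[cfg_attr(feature = "serde-impl", derive(serde::Serialize, serde::Deserialize))]
#[cfg_attr(feature = "serde-impl", serde(rename_all = "camelCase"))]
#[derive(Debug, Clone, PartialEq)]
pub struct Ident {
    pub sym: String,
    // Field-level serde attributes need the same gate, or they fail to
    // compile when the derive is absent.
    #[cfg_attr(feature = "serde-impl", serde(default))]
    pub optional: bool,
}
```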

mischnic, Mar 24 '23 10:03

I'm working on it in #7138.

kdy1, Mar 24 '23 12:03

Can you try [email protected]? I made serde support for the AST optional and off by default, so compile time should improve.

kdy1, Mar 27 '23 05:03

Rspack is currently stuck on an older version due to #7085, I'll report back the improvements once we upgrade to the latest version when that's fixed on our end.

Boshen, Mar 27 '23 05:03

Not sure if I did something wrong, but the compile time didn't get better for Parcel on my machine:

  • https://github.com/parcel-bundler/parcel/commit/f2042885cd30a77ea81445847b62a3d72dd0e5ef (before your change). Build takes 4m 15s.
  • https://github.com/parcel-bundler/parcel/commit/b180a3075a812f288999fccd473247523f31dca3 (after your change, but still using swc_ecmascript). Build takes 4m 15s.
  • https://github.com/parcel-bundler/parcel/commit/f16be9786cae3777aabed7c5855c0c0f215608c6 (after your change, using swc_core). Build takes 4m 30s.

I ran rm -rf target && RUSTC_WRAPPER= yarn workspace @parcel/transformer-js build-release in the root of the repo.

mischnic, Mar 27 '23 13:03

Oh... interesting.

I ran cargo build --timings and cargo build --timings --release from the repository root of vercel/turbo.
(Also, I ran cargo clean each time, and I did nothing while compiling to reduce noise)

Debug build:

New: Finished dev [unoptimized + debuginfo] target(s) in 2m 15s
Prev: Finished dev [unoptimized + debuginfo] target(s) in 2m 56s

This was the result for turbopack, but it's from before the AST change.

kdy1, Mar 27 '23 13:03

I found cargo-llvm-lines from matklad's blog post.

This can be used to guide your refactoring, e.g.

cargo llvm-lines -p swc_ecma_parser | head -20

  Lines                 Copies              Function name
  -----                 ------              -------------
  368372                4504                (TOTAL)
   11760 (3.2%,  3.2%)    35 (0.8%,  0.8%)  alloc::raw_vec::RawVec<T,A>::grow_amortized
   11707 (3.2%,  6.4%)    13 (0.3%,  1.1%)  swc_ecma_parser::parser::class_and_fn::<impl swc_ecma_parser::parser::Parser<I>>::parse_fn_args_body::{{closure}}
    9803 (2.7%,  9.0%)     1 (0.0%,  1.1%)  swc_ecma_parser::parser::stmt::module_item::<impl swc_ecma_parser::parser::Parser<I>>::parse_export
    7383 (2.0%, 11.0%)    15 (0.3%,  1.4%)  swc_ecma_parser::parser::typescript::<impl swc_ecma_parser::parser::Parser<I>>::try_parse_ts
    5477 (1.5%, 12.5%)     1 (0.0%,  1.4%)  swc_ecma_parser::parser::stmt::module_item::<impl swc_ecma_parser::parser::Parser<I>>::parse_import

Elsewhere:

  89536 (11.0%, 11.0%)  1399 (5.3%,  5.3%)  swc_visit::AstNodePath<N>::with
  37854 (4.7%, 15.7%)    701 (2.6%,  7.9%)  swc_visit::AstKindPath<K>::with

In cases where generics cannot be removed, https://matklad.github.io/2021/09/04/fast-rust-builds.html#Keeping-Instantiations-In-Check explains the "inner" technique.
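The "inner" technique keeps only a thin generic shell and moves the real work into a non-generic function, so that work is compiled once instead of once per instantiation. A minimal sketch applying it to a hypothetical path stack loosely modeled on the AstNodePath<N>::with entries above (NodePath, with and with_inner are illustrative names, not swc's API): the closure type is erased to &mut dyn FnMut, so the push/pop logic exists in a single copy however many closures call with.

```rust
/// Hypothetical, simplified stand-in for a path-tracking type such as
/// AstNodePath; it only records the kinds of the enclosing nodes.
#[derive(Debug, Default)]
pub struct NodePath {
    kinds: Vec<&'static str>,
}

impl NodePath {
    /// Thin generic shell: per (F, R) instantiation, only this wrapper and
    /// its small closure are compiled again.
    pub fn with<F, R>(&mut self, kind: &'static str, f: F) -> R
    where
        F: FnOnce(&mut NodePath) -> R,
    {
        let mut f = Some(f);
        let mut out = None;
        // Erase the closure type so `with_inner` is not generic.
        self.with_inner(kind, &mut |path: &mut NodePath| {
            let f = f.take().expect("closure is called exactly once");
            out = Some(f(path));
        });
        out.expect("with_inner always calls the closure")
    }

    /// Non-generic "inner" function: the push/pop logic is compiled once.
    fn with_inner(&mut self, kind: &'static str, f: &mut dyn FnMut(&mut NodePath)) {
        self.kinds.push(kind);
        f(self);
        self.kinds.pop();
    }

    /// Kinds of the nodes currently on the path, outermost first.
    pub fn kinds(&self) -> &[&'static str] {
        &self.kinds
    }
}
```

The trade-off is an indirect call through the trait object, which also stops the optimizer from inlining the closure into with_inner; whether that matters depends on how hot the path is.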

Boshen, Mar 29 '23 08:03

When I tried compiling flash (which depends on swc), the swc_ecma_transforms_compat crate pulled in by swc crashed the build because the Linux kernel killed the build process (likely because it was using too many resources). Honestly, I agree with the kernel: that crate's build was using almost 100% of the CPU and more than half of the RAM, which is obviously excessive. Hopefully this issue will get more attention, since it can break builds, not only slow them down. This should be most noticeable on CI/CD services, where containers are commonly limited to a fairly small amount of resources, and sometimes of run time.

EDIT: Not sure what caused you to block me; it's not like I was going to spam or call the developers of this project names. If my wording was disruptive, then sorry. I've rephrased most parts of the message.

elenakrittik, Jun 02 '23 08:06

I'll tackle this again in the near future. I want to make the ES AST/parser extensible and have a related idea, but it will make compilation slower.

kdy1, Aug 17 '23 04:08

I think one of the biggest problems is the lack of parallelism, and I wrote https://github.com/swc-project/swc/discussions/7911

It's an RFC, and I'd like to hear opinions about such a CLI tool.

kdy1, Sep 03 '23 09:09

https://github.com/swc-project/swc/pull/8110 should improve compile time a bit.

Also, in an experiment for turbopack, depending directly on the individual crates made compilation faster: https://github.com/vercel/turbo/pull/5879

I'm going to create a CLI tool that manages dependencies just as if you were using swc_core, while actually depending directly on the individual crates.

kdy1, Oct 12 '23 09:10

@Boshen @mischnic Can you profile it again, but with swc_core and only with the features you use?

kdy1, Oct 16 '23 08:10