
[Bug] fred is slow to compile

Open rukai opened this issue 1 year ago • 3 comments

I've observed that fred is the slowest dependency to build on our project, and we have a lot of dependencies.

The time to build fred on my machine is currently:

  • 26s for debug
  • 38s for release

So if you've got any ideas on how to improve fred's build time that would be great.

rukai avatar Jan 25 '24 06:01 rukai

Hey @rukai, I've noticed this too, but I don't think there's an easy answer.

From what I've seen, async-trait and deeply nested async functions, specifically with loops mixed in, often just take a long time to compile. I'm experimenting with AFIT (async fn in traits) at the moment and hopefully that helps, but we'll see.
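For readers unfamiliar with the difference, here is a minimal sketch of the two method shapes being compared. `BoxedKeys`, `NativeKeys`, and `Mock` are hypothetical names, not fred's API, and the boxed signature is only roughly what the async-trait macro generates. The tiny `block_on` is a stand-in executor so the sketch runs without any dependencies. It makes no claim about which form compiles faster.

```rust
use std::future::Future;
use std::pin::{pin, Pin};
use std::sync::Arc;
use std::task::{Context, Poll, Wake, Waker};

// Roughly the shape async-trait expands a method into: a boxed,
// type-erased future. Hypothetical trait, not part of fred.
trait BoxedKeys {
    fn get<'a>(
        &'a self,
        key: &'a str,
    ) -> Pin<Box<dyn Future<Output = Option<String>> + Send + 'a>>;
}

// The same method written with native async fn in traits (Rust 1.75+).
trait NativeKeys {
    async fn get(&self, key: &str) -> Option<String>;
}

// Hypothetical client used only for this illustration.
struct Mock;

impl BoxedKeys for Mock {
    fn get<'a>(
        &'a self,
        key: &'a str,
    ) -> Pin<Box<dyn Future<Output = Option<String>> + Send + 'a>> {
        Box::pin(async move { Some(format!("boxed:{key}")) })
    }
}

impl NativeKeys for Mock {
    async fn get(&self, key: &str) -> Option<String> {
        Some(format!("native:{key}"))
    }
}

// Waker that does nothing when woken.
struct NoopWake;

impl Wake for NoopWake {
    fn wake(self: Arc<Self>) {}
}

// Busy-polling executor, only suitable for futures that complete
// without waiting on external events (as in this sketch).
fn block_on<F: Future>(fut: F) -> F::Output {
    let waker = Waker::from(Arc::new(NoopWake));
    let mut cx = Context::from_waker(&waker);
    let mut fut = pin!(fut);
    loop {
        if let Poll::Ready(out) = fut.as_mut().poll(&mut cx) {
            return out;
        }
    }
}

fn main() {
    let client = Mock;
    // Both traits define `get`, so the calls are fully qualified.
    println!("{:?}", block_on(BoxedKeys::get(&client, "foo")));
    println!("{:?}", block_on(NativeKeys::get(&client, "foo")));
}
```

The boxed form allocates and type-erases the future on every call, while the native form returns an anonymous future type chosen by the implementation.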

The transactions logic was moved behind a feature flag (FF) for this reason - in my analysis two nested loops in that code path generated more lines of IR than the rest of the public interface combined. I figured disabling that would have a meaningful impact on compile times, but unfortunately it did not. If anybody knows of a better way to analyze compilation times please let me know.

The only thing I can think of here is to put each of the public interface traits behind its own FF. However, there are already 25 FFs, and there are 28 of these interface traits (let's call it 25 since a few are small), so we'd be looking at around 50 FFs then. That just seemed like a bridge too far for me. It would also likely be invasive for callers and I personally wouldn't consider this to be enough to warrant a breaking change or major release on its own.
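To make the per-interface gating idea concrete, here is a minimal sketch. The trait names and the `keys-interface` / `hashes-interface` feature names are made up for illustration and are not fred's actual flags. In a real crate each feature would also need an entry under `[features]` in Cargo.toml.

```rust
// Always compiled: a small base trait (hypothetical).
trait ClientLike {
    fn id(&self) -> &str;
}

// Compiled only when the (hypothetical) `keys-interface` feature is on.
#[cfg(feature = "keys-interface")]
trait KeysInterface: ClientLike {
    fn get(&self, key: &str) -> Option<String>;
}

// Compiled only when the (hypothetical) `hashes-interface` feature is on.
#[cfg(feature = "hashes-interface")]
trait HashesInterface: ClientLike {
    fn hget(&self, key: &str, field: &str) -> Option<String>;
}

struct Client {
    id: String,
}

impl ClientLike for Client {
    fn id(&self) -> &str {
        &self.id
    }
}

// The impls are gated the same way, so a disabled interface adds
// nothing for the compiler to type-check or generate code for.
#[cfg(feature = "keys-interface")]
impl KeysInterface for Client {
    fn get(&self, _key: &str) -> Option<String> {
        None
    }
}

#[cfg(feature = "hashes-interface")]
impl HashesInterface for Client {
    fn hget(&self, _key: &str, _field: &str) -> Option<String> {
        None
    }
}

/// Lists which optional interfaces were compiled into this build.
fn enabled_interfaces() -> Vec<&'static str> {
    let mut out = vec!["core"];
    if cfg!(feature = "keys-interface") {
        out.push("keys");
    }
    if cfg!(feature = "hashes-interface") {
        out.push("hashes");
    }
    out
}

fn main() {
    let client = Client { id: "client-1".to_string() };
    println!("{} has interfaces: {:?}", client.id(), enabled_interfaces());
}
```

Callers would then opt in per interface, which is where the invasiveness concern comes from: code that calls a gated method stops compiling until the matching feature is enabled.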

Unfortunately at the moment I don't have a great answer here, but hopefully AFIT has some impact.

aembke avatar Jan 25 '24 15:01 aembke

I tried compiling with -Z self-profile from https://fasterthanli.me/articles/why-is-my-rust-build-so-slow, and there is a lot of time in evaluate_obligation:

$ summarize summarize fred-0111090.mm_profdata | head -n 20
+-------------------------------------------------------------------------+-----------+-----------------+----------+------------+
| Item                                                                    | Self time | % of total time | Time     | Item count |
+-------------------------------------------------------------------------+-----------+-----------------+----------+------------+
| evaluate_obligation                                                     | 13.85s    | 41.102          | 13.88s   | 76520      |
+-------------------------------------------------------------------------+-----------+-----------------+----------+------------+
| LLVM_module_codegen_emit_obj                                            | 5.35s     | 15.878          | 5.35s    | 256        |
+-------------------------------------------------------------------------+-----------+-----------------+----------+------------+
| self_profile_alloc_query_strings                                        | 4.78s     | 14.185          | 4.79s    | 1          |
+-------------------------------------------------------------------------+-----------+-----------------+----------+------------+
| typeck                                                                  | 1.27s     | 3.772           | 1.56s    | 3290       |
+-------------------------------------------------------------------------+-----------+-----------------+----------+------------+
| LLVM_passes                                                             | 1.07s     | 3.165           | 1.07s    | 1          |
+-------------------------------------------------------------------------+-----------+-----------------+----------+------------+
| mir_borrowck                                                            | 907.62ms  | 2.693           | 17.22s   | 3290       |
+-------------------------------------------------------------------------+-----------+-----------------+----------+------------+
| codegen_module                                                          | 789.36ms  | 2.342           | 1.08s    | 256        |
+-------------------------------------------------------------------------+-----------+-----------------+----------+------------+
| type_op_prove_predicate                                                 | 781.08ms  | 2.317           | 14.30s   | 19412      |
+-------------------------------------------------------------------------+-----------+-----------------+----------+------------+
| expand_proc_macro                                                       | 446.22ms  | 1.324           | 446.22ms | 33         |

so it does seem plausible that it's https://github.com/rust-lang/rust/issues/87012.

I wonder if it's worth trying to break fred up into multiple crates - both because that might mean the compilation can run in parallel and go faster, and because cargo build --timings might show which sub-crate is particularly slow and help narrow down the problem (perhaps to the point where a rustc issue can usefully progress it)? Even if the crate break-up isn't actually released, the latter point means that that change might make it easier to debug.

rkday avatar Jan 25 '24 21:01 rkday

That's an interesting idea. The vast majority of the code involves implementing the interface for the ~28 traits mentioned above, and I could see each of those being their own sub crate.

Building those components in parallel is one thing, but how would folks feel about going one step further and gating most of those by a FF? (Considering that would bring the total FF count to ~50). These changes don't necessarily need to happen at the same time, but I'm curious if anybody has an opinion on dealing with that many feature flags.

aembke avatar Jan 26 '24 02:01 aembke

As you noted, AFIT might help, as there is a lot of async_trait usage in the codebase.

But we have no idea when full AFIT is actually coming, so if you can measure a good win from gating by FF, then I can say that I would personally benefit from it. The crates.io limit is 300 features, which you would still be well within: https://blog.rust-lang.org/2023/10/26/broken-badges-and-23k-keywords.html

rukai avatar Jan 29 '24 00:01 rukai

Another valuable metric for me is the time it takes to compile a binary with LTO = fat for the purposes of iterative integration level benchmarking. Running cargo build --release --example basic in the fred repo root with LTO = fat takes 15s.

Deleting all fred code from the example results in a build time of 4s. Further deleting the tokio::main and using a regular empty main results in a build time of 2s. So it seems like fred contributes roughly 11 seconds (15s - 4s) to the LTO build while tokio contributes 2s.

This is a rarer use case, so don't stress about it, but I thought it was worth mentioning.

rukai avatar Jan 29 '24 00:01 rukai

9.0.0 includes a bunch of stuff that might help here, most notably RPITIT and a bunch of new feature flags. Let me know if you have any thoughts: https://github.com/aembke/fred.rs/pull/234
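For context, RPITIT (return-position `impl Trait` in traits) lets a trait method return an unboxed future while still requiring it to be `Send`. Below is a minimal sketch of that pattern. `KeysInterface` here is a hypothetical stand-in and `MockClient` is invented for the example, so this is not the 9.0.0 implementation. The `block_on` helper is a throwaway executor to keep the sketch dependency-free.

```rust
use std::collections::HashMap;
use std::future::Future;
use std::pin::pin;
use std::sync::Arc;
use std::task::{Context, Poll, Wake, Waker};

// Hypothetical trait: the method returns an unboxed future and the
// trait itself promises that the future is `Send`.
trait KeysInterface {
    fn get(&self, key: &str) -> impl Future<Output = Option<String>> + Send;
}

// Invented in-memory client used only for this illustration.
struct MockClient {
    data: HashMap<String, String>,
}

impl KeysInterface for MockClient {
    fn get(&self, key: &str) -> impl Future<Output = Option<String>> + Send {
        // Look the value up eagerly so the future owns its result.
        let value = self.data.get(key).cloned();
        async move { value }
    }
}

// Compiles only if the argument is `Send`, as a multi-threaded
// runtime's spawn function would require.
fn assert_send<T: Send>(value: T) -> T {
    value
}

// Waker that does nothing when woken.
struct NoopWake;

impl Wake for NoopWake {
    fn wake(self: Arc<Self>) {}
}

// Busy-polling executor, only suitable for futures that complete
// without waiting on external events (as in this sketch).
fn block_on<F: Future>(fut: F) -> F::Output {
    let waker = Waker::from(Arc::new(NoopWake));
    let mut cx = Context::from_waker(&waker);
    let mut fut = pin!(fut);
    loop {
        if let Poll::Ready(out) = fut.as_mut().poll(&mut cx) {
            return out;
        }
    }
}

fn main() {
    let mut data = HashMap::new();
    data.insert("foo".to_string(), "bar".to_string());
    let client = MockClient { data };
    let value = block_on(assert_send(client.get("foo")));
    println!("{value:?}");
}
```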

aembke avatar Apr 06 '24 17:04 aembke

When compiling my project in a debug build, fred 8.0.6 was the most expensive dependency, taking 25s: [image]

Now, fred 9.0.1 is the 6th most expensive, taking 8s: [image]

A very nice improvement.

I've observed no difference in iterative build times, LTO or otherwise. But that's fine by me; improving iterative build times seems a lot harder.

Thank you for your efforts in this area, I'll go ahead and close this issue.

rukai avatar Apr 17 '24 22:04 rukai