ruff Splitting up the `ruff` crate for faster compile times

trafficstars

@squiddy commented on #1547 with:

Slightly related: Any thoughts on splitting up the code into multiple crates? From my limited understanding this is one way to go for faster compiles. Multiple crates can be compiled in parallel. I did some profiling the other day and a release build takes quite some time because everything is one crate currently.

Crates can only be compiled in parallel if they don't depend on each other. Matklad has a nice blog post "Fast Rust Builds".

I just opened the PR #1816 to split off the CLI into a separate ruff_cli crate, since ruff_cli depends on ruff this doesn't really change the total build time (especially because ruff_cli can be compiled very quickly ... except the bin generation).

These graph were generated via cargo build --release --timings, see the timing before the split and the timing after the split.

I think the question is if/how we could further split up the ruff library. Currently the rule implementations depend on ast::Checker and ast::Checker depends on the rule implementations. I am not sure what we can do about that.

With my PR #1816 cargo build -p ruff for the first time now builds ~100 dependencies fewer than cargo build for the first time (since it doesn't build the CLI), so this might still be neat for people who just want to contribute a lint and test that it works.

Jan 12 '23 16:01 not-my-profile

I think another benefit of this could be if we want to have certain features not always included for the pip install. For example, #1628 wants to add spell checking which adds the typos dependency. Since we are adding a dependency and code that will only be used in one place, I feel like this is a great feature to keep in a separate crate.

Jan 12 '23 16:01 colin99d

We can easily introduce an optional dependency on typos via a feature flag. This doesn't require a separate crate and I don't see any benefit to using a separate crate as opposed to a feature flag for something like that.

Jan 12 '23 16:01 not-my-profile

Dev compile times are really getting to me 😅 Have they regressed at all lately? Perhaps we haven't been tracking closely enough to answer that.

Jan 20 '23 23:01 charliermarsh

One benefit of separate crates is that it avoids re-compiling unchanged code because Rust skips crates where neither its dependencies nor modules have changed.

Jan 21 '23 13:01 MichaReiser

As an initial proposal to tear down, would something like this work?

One crate per linter (like eradicate).
A crate for core, linter-agnostic definitions (like Violation, Diagnostic, etc.). This could include a trait to outline the Checker API, which the linter crates could depend on (but not its implementation).
A crate that depends on all of those, and includes Checker -- so it ties together the linters, and the linter-agnostic definitions.
A couple one-off crates that can exist as standalone libraries, like the current source crate that contains Generator, Locator, etc.
(Not sure what to do with Settings and the per-linter settings structs.)

Feb 03 '23 21:02 charliermarsh

We can easily introduce an optional dependency on typos via a feature flag. This doesn't require a separate crate and I don't see any benefit to using a separate crate as opposed to a feature flag for something like that.

This was my first thought. Has anyone started profiling some of these ideas? I don't think splitting ruff into more crates and introducing features are mutually exclusive, and it's probably a much bigger undertaking to do the former.

Feb 04 '23 01:02 cnpryer

Dev compile times are really getting to me

cargo test -p ruff is much faster than cargo test since it doesn't compile the ruff_cli. We could consider dropping ruff_cli from default-members in Cargo.toml to speed up cargo test when adding rule implementations ... but then cargo run wouldn't work anymore and you'd have to use cargo run -p ruff_cli (and the benefit would only apply when you just run cargo test but not cargo run but I still think it could be worth it).

One crate per linter

I don't think we want to structure our code by linter at all. I think we rather want to group our AST rules by which AST node they target.

In general I think we should firstly improve the structure of our code within the ruff crate before thinking about how we can split it up.

Has anyone started profiling some of these ideas?

Reducing such dependencies won't do much for our dev compile time since these dependencies only need to be compiled once.

Feb 04 '23 06:02 not-my-profile

I think we rather want to group our AST rules by which AST node they target.

I can think of reasons we wouldn't want to do this, but it feels unproductive to debate it right now. Most important is getting the codebase into a state at which we can start breaking it down :) which requires at least (1) changing the project structure (e.g., #2088 or similar) and (2) disentangling the mutual dependencies between rule implementations and Checker et al, plus other work I'm sure.

Feb 04 '23 13:02 charliermarsh

Will how we build this also determine how we will let people "hook" into ruff and build their own modules into our linter, like flake8 has? Also, once you guys decide on a plan feel free to assign me some refactoring work!

Feb 04 '23 13:02 colin99d

One thing that seems somewhat straightforward to achieve is to split out the ast, source_code, cst and docstring modules into a ruff_python_syntax crate (assuming I understood their purpose correctly and it make sense to group them in such crates).

Open questions

source_code depends on vendor: Move vendor to ruff_python_syntax or introduce new ruff_vendor crate?
stylist depends on leading_quote helper -> Move/extract
A few ast helpers depend on Checker. Move to ruff and/or implement as extenion traits on Checker.

I've probably overlooked a few dependencies but it seems less hard than others (e.g. extracting settings is.... challenging)

Feb 22 '23 15:02 MichaReiser

I am starting to hack on a few of these problems in https://github.com/charliermarsh/ruff/pull/3298, mostly to see what breaks and what problems we run into. (Not ready for review but feedback welcome, etc.)

Mar 02 '23 04:03 charliermarsh

ruff ruff copied to clipboard

Splitting up the `ruff` crate for faster compile times

Open questions

ruff
ruff copied to clipboard