rustfmt
rustfmt copied to clipboard
Parallel formatting to increase speed
Multithreaded parallelism
Closes https://github.com/rust-lang/rustfmt/issues/6091
This is non-trivial because when internals can run threaded, eagerly printing cannot happen, messages have to be passed back through buffers. Even that would be non-trivial if there wasn't color-term writing happening making things more complicated.
The change became pretty big, even though the only substantive change
is in print.rs and bin/main.rs, because the printer has to be propagated
deep into the library, about half of the changes are adding a function argument, and reformatting
caused by those lines becoming too long. I'm sorry about that.
Changes
Output facilities
In practice there are four printing facilities used:
- Regular
stdout, pretty easy, replaceprintlnwith printing into a buffer. - Regular
stderr, same as above in all ways that matter. - Term stdout, this happens in the
diff-printing part of the code. - Term stderr, this is done by
rustc_errorand the most complex to integrate.
Additionally, these four facilities can't be separated, since they have to preserve order between each other.
Rename StdoutEmitter
Confusing naming, it doesn't output into stdout, but into the
provided buffer.
Pros
This change brings a substantial speedup, especially for large projects, the speedup becoming less and less but not none or negative for smaller projects.
If instant measures as the average reaction time of 250ms, this brings down tokio from the not-instant ~260ms, to the instant 86ms.
Giving space for projects almost 3x tokio's size to have an "instant" formatting experience.
Results locally on tokio with diferent parallelism:
1: Around 260ms, within noise of the master branch.
2: Around 160ms, 39% less time spent than the master branch.
4: Around 110ms, 58% less time spent than the master branch.
32: Around 85ms, 67% less time spent than the master branch.
The dramatic effect with just 2 cores means that this change doesn't just have an effect if you've got a threadripper, I don't have the data, but I would imagine that most users have 2 or more cores and would benefit.
Drawbacks
There are some drawbacks to consider
Late printing memory increase
All messages that would have gone into stdout or stderr and been flushed eagerly now go
into an in-mem buffer. This causes a memory-overhead which scales with project size
(number of files). For a large project (went from 1700ms to about 500ms with this change in cargo fmt --all -time), this is a difference of 117M on the master version, and 170M on this version (~+45%).
Rustc termcolor vendoring
Another problem is the way that rustc vendors termcolor. It vendors some types but no printing facilities,
and the vendored code isn't interoperable with the published termcolor library, which means that
either the printing facilities need to be reinvented (no), or some compatibility-code needs to be introduced (yes).
Hopefully there's some better solution to this, but it's liveable at the moment see the rustc_compat* functions in print.rs.
Accidentally messing with outputs will cause an output jumble
This change would make any direct interaction with stdout and stderr in the
formatting codepath a bug. Printing deep inside a library is usually inadvisable, but now that would be upgraded to a bug.
Some testing
Apart from the usual tests, I wanted to make sure that the rustc_errors printing is the same, pretty easy, just remove a semicolon:
And diffing:
Although there are tests for those already.
Lastly I did a test modifying a bunch of different files in a project and then ran cargo fmt --all --check,
the order within-file is preserved, but the order in which diffs are printed is "random".
It's intended, and the reason for that is because output is printed by file in completion-order. It could await the join handles in order, and get the same order as the singe-threaded implementation, but then if there are more files than available parallelism, full thread utilization won't happen, which in practice measurably impacts performance on large projects.
It's possible to do a half-measure and wait in handle-order for the remaining files after all tasks have been scheduled, this would preserve ordering in projects with fewer files than available parallelism, but that doesn't seem very impactful to me, since available parallelism likely differs a lot.
Thanks for the PR. Going to put this on hold until the team has had a chance to discuss this change https://github.com/rust-lang/rustfmt/issues/6091#issuecomment-1964539411.