cargo icon indicating copy to clipboard operation
cargo copied to clipboard

Clean package perf improvements

Open osiewicz opened this issue 2 months ago • 3 comments

What does this PR try to resolve?

I've noticed that cargo clean -p execution time scales poorly with size of target directory; in my case (~250GB target directory on M1 Mac) running cargo clean -p takes circa 35 seconds. Notably, it's the file listing that takes that time, not deleting the package itself. That is, when running cargo clean -p SOME_PACKAGE twice in a row, both executions take roughly the same time.

I've tracked it down to the fact that we seem quite happy to use glob::glob function, which iterates over contents of target dir. It also was a bit sub-optimal when it came to doing that, for which I've already filled a PR in https://github.com/rust-lang/glob/pull/144 - that PR alone takes down cleaning time down to ~14 seconds. While it is a good improvement for a relatively straightforward change, this PR tries to take it even further. With glob PR applied + changes from this PR, my test case goes down to ~6 seconds. I'm pretty sure that we could squeeze this further, but I'd rather do so in a follow-up PR.

Notably, this PR doesn't help with just super-large target directories. cargo clean -p serde on cargo repo (with ~7Gb target directory size) went down from ~380ms to ~100ms for me. Not too shabby.

How should we test and review this PR?

I've mostly tested it manually, running cargo clean against multiple different repos.

Additional information

TODO:

  • [x] c770700 is not quite correct; we need to consider that it changes how progress reporting works; as is, we're gonna report all progress relatively quickly and stall at the end (when we're actually iterating over directories, globbing, removing files, that kind of jazz). I'll address that.

osiewicz avatar Apr 28 '24 11:04 osiewicz

Thanks for the pull request, and welcome! The Rust team is excited to review your changes, and you should hear from @ehuss (or someone else) some time within the next two weeks.

Please see the contribution instructions for more information. Namely, in order to ensure the minimum review times lag, PR authors and assigned reviewers should ensure that the review label (S-waiting-on-review and S-waiting-on-author) stays updated, invoking these commands when appropriate:

  • @rustbot author: the review is finished, PR author should check the comments and take action accordingly
  • @rustbot review: the author is ready for a review, this PR will be queued again in the reviewer's queue

rustbot avatar Apr 28 '24 11:04 rustbot

#5931 will require re-working the target directory to be organized around package names. Depending on the complexity this entails, I wonder if its better to wait on that. Granted, my bias is that I never use cargo clean -p and instead do cargo clean.

epage avatar Apr 29 '24 15:04 epage

Ah, I see; thanks for pointing that issue out. That's definitely another good argument for not squeezing every last bit out of cargo clean -p if it's gonna be reworked in near-ish term. The only pending item I want to address is that TODO for progress reporting; I expect this PR to be ready for review tomorrow-ish.

osiewicz avatar Apr 29 '24 15:04 osiewicz

@bors r+

weihanglo avatar May 02 '24 13:05 weihanglo

:pushpin: Commit 9c1139ec72b7fbc9bfc8678a7c50276a533c7765 has been approved by weihanglo

It is now in the queue for this repository.

bors avatar May 02 '24 13:05 bors

:hourglass: Testing commit 9c1139ec72b7fbc9bfc8678a7c50276a533c7765 with merge 27d3e3d603e75d96c9a4bd94d651fc8445c63536...

bors avatar May 02 '24 13:05 bors

:sunny: Test successful - checks-actions Approved by: weihanglo Pushing 27d3e3d603e75d96c9a4bd94d651fc8445c63536 to master...

bors avatar May 02 '24 14:05 bors