Have an option to make Cargo attempt to clean up after itself.
Cargo leaves build artifacts around forever, without making any attempt to ever clean them up. This is a feature but also a somewhat inconvenient one since your ./target folder grows forever, and this can make life harder for large projects and CI systems and such. You either delete everything and rebuild everything all the time, wasting a lot of time, or you deal with potentially very large folders. This also makes life harder for the idea of cargo eventually being able to share build artifacts between projects, as described in this reddit thread.
I propose to add a simple, conservative command line flag/config option for Cargo to attempt to clean up files that have not been used in any builds for a while. Something like --clean-unused 30d to remove any files that haven't been used in any builds that have run in the last 30 days. Hopefully this will provide a simple and useful tool for solving this sort of problem in many cases, while not resulting in unnecessary rebuilding of things. It also should pave the way for experimenting with more sophisticated decision-making about which build artifacts to keep and which to delete in the future.
Related: #5026
As I said on that thread I would love to see this experimented with out of tree. There are a lot of good heuristics that may work well. When we see which are widely useful we can talk about bringing them in.
If out-of-tree development is not possible due to limitations of Cargo (for example, does Cargo leave a timestamp of the most recent time it used an artifact?), then let's see about getting it to be possible.
I'd love a tool based on last used date. I'd love a tool that compared the version of the compiler used to the installed ones by rustup. I'd love to point it at a parent folder and have it search for all the places I hid a rust project.
Actually I'm now wondering if this could be done as a cargo plugin? I confess I know nothing about how cargo's internals are structured.
I would think a cargo plugin would work, if the needed information can be gained from cargo. I tried fiddling around a bit and found that:
cargo check --message-format=json (release flag for target/release)
Seems to output which build artifacts are used currently, and the fresh flag specifies whether any compiling was needed.
I could not find any timestamp from cargo when it last used an artifact, but perhaps the tool itself could maintain a list of used artifacts and timestamps which it updates each time it is run. Or just clean all unused artifacts, if maintaining this file is too problematic. Maybe it can just be stored in the target directory?
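The idea of the tool keeping its own list inside the target directory could look roughly like the sketch below. The file name `.last-used`, the tab-separated line format, and the function names are all hypothetical, chosen only for illustration; nothing here is something Cargo itself reads or writes.

```rust
use std::collections::BTreeMap;
use std::fs;
use std::io;
use std::path::Path;

/// Hypothetical ledger file name inside the target directory (an assumption).
const LEDGER: &str = ".last-used";

/// Reads `artifact name -> last-used time (seconds since the Unix epoch)`.
/// A missing ledger is treated as empty; malformed lines are skipped.
fn load_ledger(target_dir: &Path) -> io::Result<BTreeMap<String, u64>> {
    let text = match fs::read_to_string(target_dir.join(LEDGER)) {
        Ok(text) => text,
        Err(e) if e.kind() == io::ErrorKind::NotFound => return Ok(BTreeMap::new()),
        Err(e) => return Err(e),
    };
    Ok(text
        .lines()
        .filter_map(|line| {
            let (secs, name) = line.split_once('\t')?;
            Some((name.to_string(), secs.parse().ok()?))
        })
        .collect())
}

/// Marks every artifact in `used` as used at `now_secs` and writes the ledger back.
/// Entries for artifacts not in `used` keep their older timestamp.
fn record_build(target_dir: &Path, used: &[&str], now_secs: u64) -> io::Result<()> {
    let mut ledger = load_ledger(target_dir)?;
    for name in used {
        ledger.insert((*name).to_string(), now_secs);
    }
    let text: String = ledger
        .iter()
        .map(|(name, secs)| format!("{}\t{}\n", secs, name))
        .collect();
    fs::write(target_dir.join(LEDGER), text)
}
```

The tool would call `record_build` after each run with the artifacts the build reported, and anything whose recorded time is too old becomes a candidate for removal. The obvious weakness is that builds run without the tool never update the ledger.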
Might hack on something for this during this week if there is any interest in it :)
@alexcrichton Is there a good way to determine the last time a file in /target/.../deps/ was used, if not how hard would it be to add a timestamp file/field?
@holmgr I would definitely be interested in seeing an initial hack!
I don't think there's a great way to figure it out unfortunately other than running a full build and just seeing what was used (and considering everything else as unused). Supporting this in a first-class fashion I think will be a nontrivial undertaking!
@alexcrichton I agree that "first-class" support will be hard, hence my encouragement of "out of tree" "dirty hack" experimentation.
Maybe I am missing something important, but at some point cargo has to make the list of things it needs to be on disk (or to build if they are not on disk). How hard would it be for cargo to take that list and, for each item on it, write an "x.timestamp" file? If all the cargos (that we care about) did that, then an out-of-tree tool could come along and delete files associated with timestamps that are older than 30 days. For initial testing purposes we could decide that the tool is only compatible with locally built cargos from someone's fork, if we are not willing to merge due to the feature freeze.
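To make the "x.timestamp" idea concrete, here is a minimal sketch of both halves: the part Cargo would do (record a use) and the part an out-of-tree tool would do (delete what has expired). The `<artifact>.timestamp` naming, the seconds-since-epoch file content, and the function names are assumptions made for this example, not anything Cargo implements.

```rust
use std::fs;
use std::io;
use std::path::{Path, PathBuf};
use std::time::{Duration, SystemTime, UNIX_EPOCH};

/// Hypothetical naming scheme: `<artifact>.timestamp` sits next to the artifact.
fn timestamp_path(artifact: &Path) -> PathBuf {
    let mut name = artifact.as_os_str().to_os_string();
    name.push(".timestamp");
    PathBuf::from(name)
}

/// What the build tool would do: record `used_at` as seconds since the Unix epoch.
fn record_use(artifact: &Path, used_at: SystemTime) -> io::Result<()> {
    let secs = used_at
        .duration_since(UNIX_EPOCH)
        .map_err(|e| io::Error::new(io::ErrorKind::InvalidInput, e))?
        .as_secs();
    fs::write(timestamp_path(artifact), secs.to_string())
}

/// What the cleanup tool would do: delete every artifact directly inside `dir`
/// whose recorded use is older than `max_age`, together with its timestamp file.
/// Returns the removed artifact paths in sorted order.
fn sweep(dir: &Path, max_age: Duration, now: SystemTime) -> io::Result<Vec<PathBuf>> {
    // Collect the stamp files first so nothing is deleted while iterating the directory.
    let mut stamps = Vec::new();
    for entry in fs::read_dir(dir)? {
        let path = entry?.path();
        if path.extension().and_then(|e| e.to_str()) == Some("timestamp") {
            stamps.push(path);
        }
    }
    stamps.sort();

    let mut removed = Vec::new();
    for stamp in stamps {
        // Skip malformed stamps rather than guessing at their meaning.
        let secs: u64 = match fs::read_to_string(&stamp)?.trim().parse() {
            Ok(secs) => secs,
            Err(_) => continue,
        };
        let used_at = UNIX_EPOCH + Duration::from_secs(secs);
        // A stamp from the future counts as recently used.
        let expired = now
            .duration_since(used_at)
            .map(|age| age > max_age)
            .unwrap_or(false);
        if expired {
            // Dropping the `.timestamp` extension yields the artifact path.
            let artifact = stamp.with_extension("");
            if artifact.exists() {
                fs::remove_file(&artifact)?;
            }
            fs::remove_file(&stamp)?;
            removed.push(artifact);
        }
    }
    Ok(removed)
}
```

Storing the time inside the file, rather than relying on the stamp file's own modification time, keeps the sketch independent of file system timestamp behavior; a real implementation might well prefer the mtime to avoid the extra read.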
Oh small things here and there probably aren't that hard I think? We could certainly try to patch the situation in the meantime!
So I have been doing a bit of hacking on this issue to see what is possible, or not (https://github.com/holmgr/cargo-sweep).
Currently I do a full build using cargo build --message-format=json to get the artifacts used, I extract the hashes and then remove all files in target which contain them (timestamps not yet considered). This does not however seem to perform a "full" clean, since there are files in /target/.../build/ and /target/.../deps/ which are not mentioned in the build output. This made me wonder, is it only /target/.../deps/ that we are interested in cleaning up?
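For readers wondering what "extract the hashes" might involve, here is a rough sketch. It assumes artifact file names end in `-<16 hex digits>` before the extension (for example `libfoo-0123456789abcdef.rlib`); that pattern and all function names below are assumptions for illustration and are not taken from the cargo-sweep source.

```rust
use std::collections::HashSet;
use std::path::Path;

/// Returns the hash embedded in an artifact file name such as
/// `libfoo-0123456789abcdef.rlib`, or `None` if the name has no such part.
/// Assumption: the hash is the 16 hex digits that follow the last `-`.
fn artifact_hash(path: &Path) -> Option<String> {
    let name = path.file_name()?.to_str()?;
    let (_, tail) = name.rsplit_once('-')?;
    // Drop any extensions that follow the hash, e.g. `.rlib` or `.d`.
    let candidate = tail.split('.').next()?;
    let is_hash = candidate.len() == 16 && candidate.chars().all(|c| c.is_ascii_hexdigit());
    if is_hash {
        Some(candidate.to_string())
    } else {
        None
    }
}

/// Collects the hashes of every artifact path reported by the build.
fn used_hashes<'a>(reported: impl IntoIterator<Item = &'a str>) -> HashSet<String> {
    reported
        .into_iter()
        .filter_map(|p| artifact_hash(Path::new(p)))
        .collect()
}

/// True if `path` carries a hash that also appears in `used`.
/// Files without a recognizable hash never match.
fn matches_used_hash(path: &Path, used: &HashSet<String>) -> bool {
    artifact_hash(path).map_or(false, |hash| used.contains(&hash))
}
```

Files with no recognizable hash, which may include some of the unmentioned files under build/, fall outside this matching entirely and would need separate handling.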
Since this type of solution is likely not the way we want to do this (i.e letting the tool maintain the timestamp files) I have been doing some digging around to see what else is possible.
One simple approach is to use fs::metadata to get access times for the files we are interested in, avoiding the need to generate separate timestamp files. This however would be sensitive to moving the target directory etc., I guess. But this would not introduce any false positives so maybe it is the best solution.
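A minimal sketch of the access-time approach, using only `std::fs::metadata` and `Metadata::accessed` from the standard library. The function name and the choice to look only at files directly inside one directory are assumptions for the example; it also only lists candidates and deletes nothing.

```rust
use std::fs;
use std::io;
use std::path::{Path, PathBuf};
use std::time::{Duration, SystemTime};

/// Lists files directly inside `dir` whose last access time is more than
/// `max_age` before `now`. Nothing is deleted; the caller decides what to do.
fn files_not_accessed_within(
    dir: &Path,
    max_age: Duration,
    now: SystemTime,
) -> io::Result<Vec<PathBuf>> {
    let mut stale = Vec::new();
    for entry in fs::read_dir(dir)? {
        let path = entry?.path();
        let meta = fs::metadata(&path)?;
        if !meta.is_file() {
            continue;
        }
        // `accessed()` returns an error on platforms without access-time support.
        let accessed = meta.accessed()?;
        // `duration_since` fails if the file was accessed after `now`;
        // such files count as recently used.
        if let Ok(age) = now.duration_since(accessed) {
            if age > max_age {
                stale.push(path);
            }
        }
    }
    stale.sort();
    Ok(stale)
}
```

Passing `now` in explicitly, instead of calling `SystemTime::now()` inside the function, makes the cutoff easy to test. The result is only as trustworthy as the file system's access-time bookkeeping.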
Finally, the most complex solution is to let cargo output the timestamp files as we discussed above. I have been searching through the source a bit, but not yet found a good place to place that code (partially because I am unsure of exactly which files we are interested in cleaning).
Another update :)
So I went with the second approach outlined in my comment above which seems to be the most straightforward solution, and it seems to be working pretty well. I also added a recursive flag so it can clean all Cargo projects below a given path like @Eh2406 suggested.
It is now published on crates.io and source is here. Hopefully this solves at least the basic needs for cleaning. Any feedback on the code, or possible extensions is very appreciated but it should probably be done as an issue on cargo-sweep rather than here.
@pietroalbini suggested a command that would clean up files needed to build a dependency but not needed to use that dependency: cargo clean-everything-except-the-minimum-needed-not-to-rebuild-deps. @joshtriplett suggested that we move those files to a .cache folder or something similar to mark that they can be deleted by any gc that wants. I was wondering if cargo can just delete them when it is done with them.
The big question is what are "they"? What are the minimum files required to use a dependency?
cc some other discussion going on at https://github.com/rust-lang/cargo/issues/5885#issuecomment-445015842
I would love a feature as described above. Is there any update since 2018 on ways to achieve this?
Since this type of solution is likely not the way we want to do this (i.e letting the tool maintain the timestamp files) I have been doing some digging around to see what else is possible. One simple approach is to use fs::metadata to get access times for the files we are interested in, avoiding the need to generate separate timestamp files. This however would be sensitive to moving the target directory etc i guess. But this would not introduce any false positives so maybe it is the best solution.
Keeping file access times up to date is expensive, because each file access triggers a separate I/O call just to update the atime. So on a lot of build hosts (at least in decent organizations), all file systems are mounted with the noatime option.
A garbage collector is being worked on. The tracking issue is #12633. It is first starting with global resources and then we'll be looking at the target directory.