cargo-cache
clean-unref: documentation & feat suggestion (recursivity)
Hi!
About the documentation
```
cargo cache -h
...
clean-unref    remove crates that are not referenced in a Cargo.toml from the cache
```
When I read this, I guessed that it looks around for `Cargo.toml` files to list all the dependencies, and removes data in the cache that is not used by any of them. The README mentions:
- purge cache entries not used to build a specified crate (`cargo cache clean-unref`)
Then I understand that it does this only for the crate I am in. When I run the command, from my home directory, I get:
```
% cargo cache clean-unref
...
Failed to find Cargo.toml manifest in /home/***** or downwards.
```
So I think the help description for this subcommand could be improved (for instance, by copying the wording from the README).
The suggestion
This feature, as I currently understand it, removes unused crates (but not binaries) that are not listed in the Cargo.toml of the crate I am currently in, therefore ignoring all the other crates. I think it can be good for CI use, but for personal use it could be interesting to have a command like `clean-unref-user` which works as follows (Rust-like pseudocode):
- `let used =` iterate over all `Cargo.toml` files in the user directory recursively (can take some time) `.map(|file| file.get_list_of_dependencies()).flatten();`
- Get the list of the cached repositories.
- `for dependency in used { remove dependency from the list of repositories }` (skip if already removed or missing)
- The remaining repositories are not used, so they can be deleted.
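The steps above could be sketched in plain Rust roughly like this. This is a minimal sketch with hypothetical hard-coded dependency lists; `prune_list` is a made-up helper name for illustration, not an existing cargo-cache API:

```rust
use std::collections::HashSet;

// Given the dependency lists of every project found under the user's
// home directory, and the list of cached crates, return the cached
// crates that no project references (i.e. the ones safe to delete).
fn prune_list(per_project_deps: &[Vec<&str>], cached: &[&str]) -> Vec<String> {
    // Union of the dependencies of all projects.
    let used: HashSet<&str> = per_project_deps.iter().flatten().copied().collect();
    // Everything cached that no project uses can be deleted.
    cached
        .iter()
        .filter(|dep| !used.contains(*dep))
        .map(|dep| dep.to_string())
        .collect()
}

fn main() {
    let projects = vec![vec!["serde", "rand"], vec!["serde", "walkdir"]];
    let cached = ["serde", "rand", "walkdir", "old-unused-crate"];
    let removable = prune_list(&projects, &cached);
    println!("{removable:?}"); // prints ["old-unused-crate"]
}
```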
`clean-unref` is supposed to run inside a Rust project (git repo or crate or whatever), so yeah, I need to improve the docs and perhaps the error a bit, thanks for pointing that out!
Yes, `cargo-cache` only touches crate sources of any sort; I don't want it to do anything with `cargo install`ed binaries, I think that is out of scope.
Your suggestion is another `clean-unref` which walks the current directory (`cwd`) tree and runs `clean-unref` not on a single crate but on all the Rust projects that were found, as if they were a single project. Did I understand that correctly?
Yes, this is the idea: not keeping only the cache for the dependencies of a single project, but for the union of the dependencies of all the projects of the user.
Also, I was thinking to do it user-wide so as not to miss dependencies, so instead of starting from `cwd`, start from the user home directory. If `cwd` were the start point by default and the user ran `cargo cache clean-unref-user` (or another given name) while in a crate, it would be the equivalent of `cargo cache clean-unref` and would delete caches for the dependencies of all the other projects of the user (unless they are also dependencies of the current project), which is probably not what the user wanted to do. Maybe the subcommand could take an optional argument for the start point, so the user could still start from `cwd` by adding `.` as an argument to the subcommand.
Side note: for reference, I would like to highlight why this is different from a simple shell script which would run `cargo cache clean-unref` in each found project. Doing so would leave in the cache only the dependencies that appear in every project of the user, so probably nothing. The expected behavior is to retain the union of the dependencies, not the intersection.
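To make the side note concrete, here is a small std-only Rust sketch showing the pitfall; `prune_per_project` is a hypothetical helper standing in for "run `clean-unref` once per project":

```rust
use std::collections::HashSet;

// Conceptually, a single clean-unref run keeps only the current
// project's dependencies. Applying that once per project deletes
// everything outside each project in turn, so only the intersection
// of all dependency sets survives.
fn prune_per_project<'a>(
    mut cache: HashSet<&'a str>,
    projects: &[HashSet<&'a str>],
) -> HashSet<&'a str> {
    for project in projects {
        cache.retain(|dep| project.contains(dep));
    }
    cache
}

fn main() {
    let a = HashSet::from(["serde", "rand"]);
    let b = HashSet::from(["serde", "walkdir"]);
    let cache = HashSet::from(["serde", "rand", "walkdir"]);
    // prints {"serde"} -- rand and walkdir were wrongly purged
    println!("{:?}", prune_per_project(cache, &[a, b]));
}
```

Keeping the union instead means collecting every project's dependencies first, and pruning the cache once at the end.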
Right, so we could have something like `cargo-cache clean-unref -r <path>` / `--recursive <path>`, where we can pass a directory, or just `.` for the cwd.
Some notes on how that could be implemented:
- Use walkdir on `path`, crawl all the directories searching for `Cargo.toml`s, and collect them into a vec.
- rayon `par_iter()` over the vec (I think we still don't have a better walkdir -> par_iter conversion today) and try to open the respective Rust projects with cargo-metadata.
- If we fail (for example if there is just some Cargo.toml of a test of a crate that is not a valid crate source), silently ignore it.
- Make cargo-metadata query each project for a list of dependencies (the logic already exists and just needs to be refactored a bit, I hope).
- Note: it might be that cargo-metadata extracts/downloads all the required dependencies in this step, so the `par_iter` may not pay off a lot actually...? Need to check that (the `.cargo` cache might be locked for one crate at a time for this).
- Collect all the dependencies into one huge list and then prune all other entries from the cache (as we already do today, but only for a single crate).
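The manifest-collection step above could look roughly like this. This is a std-only stand-in for illustration: a hand-rolled recursive scan instead of the walkdir crate, and no rayon or cargo-metadata involved; `find_manifests` is a made-up name:

```rust
use std::fs;
use std::path::{Path, PathBuf};

// Recursively collect every Cargo.toml under `root`. Unreadable
// directories are skipped silently, mirroring the "silently ignore"
// step for invalid crate sources.
fn find_manifests(root: &Path, found: &mut Vec<PathBuf>) {
    let Ok(entries) = fs::read_dir(root) else { return };
    for entry in entries.flatten() {
        let path = entry.path();
        if path.is_dir() {
            find_manifests(&path, found);
        } else if path.file_name().is_some_and(|n| n == "Cargo.toml") {
            found.push(path);
        }
    }
}

fn main() {
    // Build a tiny throwaway tree under the OS temp dir to demonstrate.
    let root = std::env::temp_dir().join("clean_unref_demo");
    fs::create_dir_all(root.join("proj_a")).unwrap();
    fs::create_dir_all(root.join("proj_b/nested")).unwrap();
    fs::write(root.join("proj_a/Cargo.toml"), "[package]").unwrap();
    fs::write(root.join("proj_b/nested/Cargo.toml"), "[package]").unwrap();

    let mut manifests = Vec::new();
    find_manifests(&root, &mut manifests);
    println!("found {} manifests", manifests.len());
    fs::remove_dir_all(&root).unwrap();
}
```

The real implementation would hand the collected vec to rayon and cargo-metadata as described above.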
I am not sure keeping it in the same subcommand is good for the user, because if the user forgets the `-r`, then almost everything will have to be downloaded again. `-r` is usually found on commands that are more destructive with it than without it; here it is the opposite: omitting `-r` is more destructive than adding it.