cargo-cache
alt reg: add support to only clean up a single registry
(per-registry restriction of cleanup commands)
I am interested in working on this feature. Is there any information, such as any preliminary design ideas or specific requirements that you have before I attempt it?
Hi!
So, the stuff about the registry indices is inside src/cache.
I tried to split everything up so that there is a "supercache" (this is similar to the ~/.cargo/registry/cache directory) and a "subcache" which represents a single registry, such as ~/.cargo/registry/cache/github.com-1ecc6299db9ec823 (the default crates.io registry) or an alternative registry (~/.cargo/registry/cache/dl.cloudsmith.io-0aaaba6a806079f8).
You can find the supercache and subcache traits here: https://github.com/matthiaskrgr/cargo-cache/blob/master/src/cache/caches.rs
All the subcache structs should have a name field which is derived from the registry directory name. IIRC it will be "github.com" for the default crates.io registry and "dl.cloudsmith.io" for my cloudsmith test registry.
To remove only from a specified registry, I would add a new --registry= cmdline flag to specify the registry to clean up. (This could also be a comma-separated list of registries, like --registry="github.com,cloudsmith".)
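As a rough illustration of the comma-separated variant, here is a minimal sketch of how the raw flag value could be split into registry names. The function name parse_registry_list is hypothetical and not part of cargo-cache; it only assumes the value arrives as a plain string (for example from clap).

```rust
/// Split the raw value of the proposed `--registry` flag into registry names.
/// NOTE: `parse_registry_list` is a hypothetical helper, not existing cargo-cache code.
fn parse_registry_list(raw: &str) -> Vec<String> {
    raw.split(',')
        // tolerate spaces around the commas, e.g. "github.com, cloudsmith"
        .map(str::trim)
        // ignore empty entries caused by stray or trailing commas
        .filter(|name| !name.is_empty())
        .map(str::to_owned)
        .collect()
}
```

Calling parse_registry_list("github.com,cloudsmith") would yield the two names "github.com" and "cloudsmith"; an empty string yields an empty list, which the caller could treat as "no restriction".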
Then --autoclean, --autoclean-expensive, --keep-duplicate-crates, --remove-dir, --remove-if-younger/older-than as well as the clean-unref and the trim subcommands need to be tweaked to only touch the specified registries if the --registry flag was passed.
The commandline/clap interface is mostly defined here: https://github.com/matthiaskrgr/cargo-cache/blob/master/src/cli.rs (most of the conditional logic happens in main.rs, though).
The trickiest part will be to find all the places where stuff is being removed (git grep remove_) and add some kind of filter that keeps certain registries alive.
For example, --autoclean is implemented here: https://github.com/matthiaskrgr/cargo-cache/blob/master/src/main.rs#L386
Currently it just blindly removes ~/.cargo/registry/src and ~/.cargo/git/checkouts.
We can ignore the .git repo, but we need to restrict it here to only remove the specified registry.
The cargo_cache is just a struct holding directory paths; we would actually need the Cache here, which is the registry_sources_caches variable.
We need to bring the SubCache trait into scope and then we can access the registries one by one via registry_sources_caches.caches().iter().for_each(|x| println!("{}", x.name()));
(The .caches() returns a vec of Paths, each path is the directory of a subcache aka a registry.)
This will print the names of each registry that is found.
You can add .filter() to only match wanted registries and then pass these along for removal:
Concept diff:
diff --git a/src/main.rs b/src/main.rs
index 1329460..9c1f7da 100644
--- a/src/main.rs
+++ b/src/main.rs
@@ -74,6 +74,7 @@ cfg_if::cfg_if! {
// use
use crate::cache::caches::{Cache, RegistrySuperCache};
+ use crate::cache::caches::RegistrySubCache;
use clap::value_t;
use std::process;
use std::time::SystemTime;
@@ -384,26 +385,28 @@ fn main() {
}
if config.is_present("autoclean") || config.is_present("autoclean-expensive") {
- // clean the registry sources and git checkouts
- let reg_srcs = &cargo_cache.registry_sources;
+ // clean the registry sources and git checkoutsname
+ // let reg_srcs = &cargo_cache.registry_sources;
let git_checkouts = &cargo_cache.git_checkouts;
+ // collect the registries to remove,
+ let mut paths_to_be_removed: Vec<&std::path::PathBuf> = registry_sources_caches
+ .caches()
+ .iter()
+ /*.filter( only match wanted registries )*/
+ .map(|subcache| subcache.path())
+ .collect();
+
+ paths_to_be_removed.push(git_checkouts);
+
// depending on the size of the cache and the system (SSD, HDD...) this can take a few seconds.
println!("\nClearing cache...");
- for dir in &[reg_srcs, git_checkouts] {
- let size = cumulative_dir_size(dir);
- if dir.is_dir() {
- remove_file(
- dir,
- config.is_present("dry-run"),
- &mut size_changed,
- None,
- &DryRunMessage::Default,
- Some(size.dir_size),
- );
- }
+ for dir in paths_to_be_removed {
+ println!("removing: {:?}", dir.display());
+ /* remove */
}
+ return;
}
if config.is_present("keep-duplicate-crates") {
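To make the commented-out .filter() step in the concept diff a bit more concrete, here is a standalone sketch of the filtering idea. The SubCache struct below is a simplified, hypothetical stand-in (just a name and a path) and not the real cargo-cache type; treating an empty list as "no --registry flag passed" is also an assumption.

```rust
use std::path::{Path, PathBuf};

/// Hypothetical stand-in for a registry subcache; NOT the real cargo-cache type.
struct SubCache {
    name: String,
    path: PathBuf,
}

/// Collect the paths of the subcaches whose name appears in `wanted`.
/// Assumption: an empty `wanted` list means no registry restriction was
/// requested, so every subcache is selected (the current behaviour).
fn paths_to_remove<'a>(caches: &'a [SubCache], wanted: &[String]) -> Vec<&'a Path> {
    caches
        .iter()
        .filter(|cache| wanted.is_empty() || wanted.iter().any(|w| w == &cache.name))
        .map(|cache| cache.path.as_path())
        .collect()
}
```

In the real code the filter would run over registry_sources_caches.caches() and compare against x.name(); the git checkouts path would still be pushed separately, as in the diff.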
It would also make sense to have a command which lists all registries and their names (cargo cache --list-registries?) at some point.
Oh, and currently the function that tries to figure out the name of a cache/registry removes the hash from the git repo (git grep "fn get_cache_name"). In order to disambiguate between different registries on the same host, it should probably be modified to not throw away the hash.
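One possible way to keep the hash around is to split the directory name into a host part and a hash part instead of discarding the suffix. This is only a sketch: split_registry_dir is a hypothetical helper, not the existing get_cache_name, and it assumes the directory name has the form host-hash with a hexadecimal hash after the last dash, as in the examples above.

```rust
/// Split a registry directory name such as "github.com-1ecc6299db9ec823"
/// into (host, hash). Returns None if the name does not look like "<host>-<hex>".
/// NOTE: hypothetical helper; the "<host>-<hex hash>" layout is an assumption
/// based on the directory names mentioned in this thread.
fn split_registry_dir(dir_name: &str) -> Option<(&str, &str)> {
    // split at the LAST dash, since host names may contain dashes themselves
    let (host, hash) = dir_name.rsplit_once('-')?;
    let hash_is_hex = !hash.is_empty() && hash.chars().all(|c| c.is_ascii_hexdigit());
    if host.is_empty() || !hash_is_hex {
        return None;
    }
    Some((host, hash))
}
```

With both parts available, the subcache name could stay "github.com" for display while the full "host-hash" string is used to tell apart two registries on the same host.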
I hope this helps and is not too confusing, feel free to bombard me with questions. :sweat_smile: :D
No, it's not confusing at all! You've done all the thinking for me!
I'll work on this and make a PR in a couple of days.
If I have any questions while I'm working on it, I'll ask them here until I make the PR.