
alt reg: add support to only clean up a single registry

Open matthiaskrgr opened this issue 5 years ago • 3 comments

(per-registry restriction of cleanup commands)

matthiaskrgr avatar Jun 10 '19 18:06 matthiaskrgr

I am interested in working on this feature. Is there any information, such as any preliminary design ideas or specific requirements that you have before I attempt it?

Spaceface16518 avatar Sep 30 '20 19:09 Spaceface16518

Hi! So, the stuff about the registry indices is inside src/cache. I tried to split everything up so that there is a "supercache" (similar to the ~/.cargo/registry/cache directory) and a "subcache", which represents a single registry, such as ~/.cargo/registry/cache/github.com-1ecc6299db9ec823 (the default crates.io registry) or an alternative registry (~/.cargo/registry/cache/dl.cloudsmith.io-0aaaba6a806079f8).

You can find the supercache and subcache traits here: https://github.com/matthiaskrgr/cargo-cache/blob/master/src/cache/caches.rs

All the subcache structs should have a name field which is derived from the registry directory name. IIRC it will be "github.com" for the default crates.io registry and "dl.cloudsmith.io" for my cloudsmith test registry.

To remove only from a specified registry, I would add a new --registry= cmdline flag that names the registry to clean. (This could also be a comma-separated list of registries, like --registry="github.com,cloudsmith".) Then --autoclean, --autoclean-expensive, --keep-duplicate-crates, --remove-dir, --remove-if-younger-than and --remove-if-older-than, as well as the clean-unref and trim subcommands, need to be tweaked to touch only the specified registries if the --registry flag was passed.
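
A minimal sketch of how the value of such a flag could be parsed and matched against registry names. Everything here is hypothetical: the function names are made up for illustration, and matching by substring (so that "cloudsmith" selects "dl.cloudsmith.io") is an assumption based on the example value above, not a decided design.

```rust
/// Parse the value of the proposed `--registry` flag,
/// e.g. "github.com,cloudsmith", into trimmed, non-empty selectors.
/// (Hypothetical helper, not part of cargo-cache.)
fn parse_registry_filter(raw: &str) -> Vec<String> {
    raw.split(',')
        .map(str::trim)
        .filter(|s| !s.is_empty())
        .map(str::to_owned)
        .collect()
}

/// Decide whether a cleanup command should touch the registry `name`.
/// `None` means the flag was not passed, so every registry is selected.
/// Assumption: a selector matches if it is a substring of the name.
fn registry_selected(name: &str, filter: Option<&[String]>) -> bool {
    match filter {
        None => true,
        Some(wanted) => wanted.iter().any(|w| name.contains(w.as_str())),
    }
}
```

Each of the flags and subcommands listed above could then ask registry_selected() before touching a subcache, which keeps the current behaviour when --registry is absent.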

The commandline/clap interface is mostly defined in https://github.com/matthiaskrgr/cargo-cache/blob/master/src/cli.rs; most of the conditional logic happens in main.rs, though.

The trickiest part will be to find all the places where stuff is being removed (git grep remove_) and add some kind of filter that keeps certain registries alive.

For example --autoclean is implemented here: https://github.com/matthiaskrgr/cargo-cache/blob/master/src/main.rs#L386

Currently it just blindly removes ~/.cargo/registry/src and ~/.cargo/git/checkouts. We can ignore the .git repo, but we need to restrict it here to only remove the specified registry.

The cargo_cache is just a struct holding directory paths; what we actually need here is the Cache, which is the registry_sources_caches variable. We need to bring the SubCache trait into scope, and then we can access the registries one by one via registry_sources_caches.caches().iter().for_each(|x| println!("{}", x.name())); (.caches() returns a vec of subcaches, each of which corresponds to the directory of one registry). This will print the name of each registry that is found. You can add .filter() to only match the wanted registries and then pass these along for removal. Concept diff:

diff --git a/src/main.rs b/src/main.rs
index 1329460..9c1f7da 100644
--- a/src/main.rs
+++ b/src/main.rs
@@ -74,6 +74,7 @@ cfg_if::cfg_if! {

         // use
         use crate::cache::caches::{Cache, RegistrySuperCache};
+        use crate::cache::caches::RegistrySubCache;
         use clap::value_t;
         use std::process;
         use std::time::SystemTime;
@@ -384,26 +385,28 @@ fn main() {
     }

     if config.is_present("autoclean") || config.is_present("autoclean-expensive") {
-        // clean the registry sources and git checkouts
-        let reg_srcs = &cargo_cache.registry_sources;
+        // clean the registry sources and git checkoutsname
+        //  let reg_srcs  = &cargo_cache.registry_sources;
         let git_checkouts = &cargo_cache.git_checkouts;

+        // collect the registries to remove,
+        let mut paths_to_be_removed: Vec<&std::path::PathBuf> = registry_sources_caches
+            .caches()
+            .iter()
+            /*.filter( only match wanted registries )*/
+            .map(|subcache| subcache.path())
+            .collect();
+
+        paths_to_be_removed.push(git_checkouts);
+
         // depending on the size of the cache and the system (SSD, HDD...) this can take a few seconds.
         println!("\nClearing cache...");

-        for dir in &[reg_srcs, git_checkouts] {
-            let size = cumulative_dir_size(dir);
-            if dir.is_dir() {
-                remove_file(
-                    dir,
-                    config.is_present("dry-run"),
-                    &mut size_changed,
-                    None,
-                    &DryRunMessage::Default,
-                    Some(size.dir_size),
-                );
-            }
+        for dir in paths_to_be_removed {
+            println!("removing: {:?}", dir.display());
+            /* remove */
         }
+        return;
     }

     if config.is_present("keep-duplicate-crates") {
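
The commented-out .filter() step in the concept diff above could look roughly like the following sketch. The SubCache struct here is a simplified stand-in invented for this example, not the crate's real RegistrySubCache type, and matching on the exact name is an assumption.

```rust
use std::path::PathBuf;

/// Hypothetical stand-in for one registry's source cache.
/// The real subcache types live in src/cache and will differ.
struct SubCache {
    name: String,
    path: PathBuf,
}

impl SubCache {
    fn name(&self) -> &str {
        &self.name
    }
    fn path(&self) -> &PathBuf {
        &self.path
    }
}

/// Collect the directories that autoclean would remove.
/// `wanted == None` means no registry filter was given.
fn paths_to_remove<'a>(
    subcaches: &'a [SubCache],
    git_checkouts: &'a PathBuf,
    wanted: Option<&[&str]>,
) -> Vec<&'a PathBuf> {
    let mut paths: Vec<&PathBuf> = subcaches
        .iter()
        // keep only the registries whose name was requested (exact match assumed)
        .filter(|sc| wanted.map_or(true, |w| w.contains(&sc.name())))
        .map(|sc| sc.path())
        .collect();
    // as in the concept diff, the git checkouts are always appended
    paths.push(git_checkouts);
    paths
}
```

Whether the git checkouts should still be cleaned when a registry filter is active is left open here; the sketch simply mirrors the diff.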

It would also make sense to have a command which lists all registries and their names at some point (cargo cache --list-registries?). Oh, and currently the function that tries to figure out the name of a cache/registry removes the hash from the git repo (git grep "fn get_cache_name"). In order to disambiguate between different registries on the same host, it should probably be modified to not throw away the hash.
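
As a rough illustration of keeping the hash, a registry directory name could be split into host and hash instead of discarding the hash. This is only a sketch: split_registry_dir_name is a made-up helper, not the existing get_cache_name, and it assumes directory names of the form host-hash as in the examples earlier in this thread.

```rust
/// Split a registry directory name such as "github.com-1ecc6299db9ec823"
/// into (host, hash). Splitting happens at the LAST '-' because host
/// names may themselves contain dashes.
/// Returns None if there is no separator or one side is empty.
/// (Hypothetical helper for illustration only.)
fn split_registry_dir_name(dir_name: &str) -> Option<(&str, &str)> {
    let (host, hash) = dir_name.rsplit_once('-')?;
    if host.is_empty() || hash.is_empty() {
        return None;
    }
    Some((host, hash))
}
```

With both parts available, two registries on the same host would still get distinct names, and a --list-registries style command could print host and hash side by side.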

I hope this helps and is not too confusing, feel free to bombard me with questions. :sweat_smile: :D

matthiaskrgr avatar Oct 01 '20 19:10 matthiaskrgr

No, it's not confusing at all! You've done all the thinking for me!

I'll work on this and make a PR in a couple of days.

If I have any questions while I'm working on it, I'll ask them here until I make the PR.

Spaceface16518 avatar Oct 06 '20 15:10 Spaceface16518