modules: add mod clean -modcache or similar

Open myitcv opened this issue 1 year ago • 7 comments

In the spirit of go clean -modcache: https://go.dev/ref/mod#go-clean-modcache

More detail to follow

myitcv avatar May 08 '24 11:05 myitcv

My 2c: the only reason Go ended up with flags like -cache and -modcache is that go clean used to be for cleaning build artifacts from packages. Nowadays, almost no one uses it for that purpose anymore:

Clean removes object files from package source directories.
The go command builds most objects in a temporary directory,
so go clean is mainly concerned with object files left by other
tools or by manual invocations of go build.

I don't think we will ever have a need for such a "clean package build files" command. Even if evaluating a CUE package or building its e.g. wasm artifacts were to produce some sort of build output, that would be in the cache, not in the very same package directory.

For that reason, I think we should go for something like cue clean modcache, i.e. an argument instead of a flag. Given that all of our caches live under the same directory, we could also provide cue clean allcaches to delete all of it.

It's also worth noting that "modcache" is an artifact from Go's GOMODCACHE env var, which we do not have, so we can likely come up with a better name like go clean modulecache or, as a hierarchy, go clean cache/modules.

mvdan avatar May 10 '24 14:05 mvdan

or, as a hierarchy, go clean cache/modules.

note that this prompted me to file https://github.com/cue-lang/cue/issues/3139.

mvdan avatar May 11 '24 09:05 mvdan

With #3139 implemented, I think we should support the following commands to clean ${CUE_CACHE_DIR}:

  • cue clean all - clear all caches
  • cue clean mod - clear all module caches
  • cue clean mod/download - clear all downloaded modules
  • cue clean mod/extract - clear all extracted module archives

Basically, either all to clear everything under ${CUE_CACHE_DIR}, or an argument for one of the known cache directories underneath.

(nevermind the fact that in an earlier comment I wrote go clean...)

mvdan avatar Jun 01 '24 14:06 mvdan

With #3139 implemented, I think we should support the following commands to clean ${CUE_CACHE_DIR}:

  • cue clean all - clear everything
  • cue clean mod - clear all module caches
  • cue clean mod/download - clear all downloaded modules
  • cue clean mod/extract - clear all extracted module archives

Basically, either all to clear everything under ${CUE_CACHE_DIR}, or an argument for one of the known cache directories underneath.

(nevermind the fact that in an earlier comment I wrote go clean...)

According to https://github.com/cue-lang/cue/issues/3139, this should have been part of v0.9.0, right? But the cue clean all command does not seem to be known to cue v0.9.2 (or am I missing something?).

ysmaoui avatar Jul 30 '24 09:07 ysmaoui

@ysmaoui this issue is still open - cue clean has not been added as a command yet.

mvdan avatar Jul 30 '24 09:07 mvdan

another remark:

cue-cache

I am trying to clean up the cache manually (by deleting the contents of the cache folder), but it seems that all files of the modules stored in the cache have permissions that block the deletion.

is this a known issue?

ysmaoui avatar Jul 30 '24 09:07 ysmaoui

Yes, the files are marked read-only to discourage users from directly modifying them. This is why we want to add a cue clean command - to make it easy to clear the cache entries without needing rm -rf.

mvdan avatar Jul 30 '24 09:07 mvdan

I'm going to +1 this with some details, because the module cache has been a thorn in my side for multiple reasons.

On macOS, directories and files where the effective uid and gid do not have +w will stubbornly refuse to be removed even with rm -rf; you have to chmod -R 0700 the entire cache directory first before you can remove them. I understand why the cache is marked as read-only, but I also think going out of the way to remove the write bit on every file and directory is really poor UX.

While a clean modcache subcommand would certainly be useful, when cue is used as a library, developers have to write similar functionality themselves. Sure, you can walk the tree, chmod everything in code, and then do a recursive delete, but just as the experience for an end user of the CLI is poor, the DX of the library suffers. This is especially true for some use cases I've been playing with, where I may be creating transient caches because I'm expecting custom registry configs to be provided (and assuming that the same module path refers to the same module across different configs would be wrong at best, dangerous at worst).

Looking at half a dozen folders I have under ~/Library/Cache not a single application, except cue, has gone out of its way to remove the write bit from the file owner. At this point, I think the better solution is not trying so hard to remove the footgun from the user, instead of creating a dedicated subcommand to work around a deliberate design decision that even web browsers do not make.

snuxoll avatar Apr 17 '25 01:04 snuxoll

@snuxoll we borrowed this idea from Go - out of curiosity, do you see issues on macOS with the Go cache directories as well?

mvdan avatar Apr 17 '25 08:04 mvdan

@mvdan Yes, actually

➜  cuelang.org rm -rf [email protected] 
rm: [email protected]/cmd/cuepls/internal/test/integration/base/base_test.go: Permission denied
rm: [email protected]/cmd/cuepls/internal/test/integration/base: Permission denied
rm: [email protected]/cmd/cuepls/internal/test/integration: Permission denied
rm: [email protected]/cmd/cuepls/internal/test: Permission denied
rm: [email protected]/cmd/cuepls/internal: Permission denied
rm: [email protected]/cmd/cuepls/main.go: Permission denied
rm: [email protected]/cmd/cuepls: Permission denied
rm: [email protected]/cmd/cue/cmd/custom.go: Permission denied
rm: [email protected]/cmd/cue/cmd/completion.go: Permission denied

To add, go somewhat gets away with it because:

  1. There is an implicit assumption that modules are stored in a VCS, with some default rules around popular software forges and heuristics to try and guess how to get a module, with the <meta> tag to handle vanity URLs.
  2. The public go module proxy ensures that unless you've gone out of your way to configure GOPRIVATE, versions of publicly available modules are guaranteed to be more or less immutable and the underlying source can vanish without a trace and code still using it will continue to build and work.

CUE opting for OCI registries fits the requirements of the language much better IMHO, but outside of the central registry there is nothing that guarantees an import path will consistently map to a specific instance of a module. Quite frankly, I'd make the argument that trying to have a centralized cache on disk is almost the wrong choice, or that at a minimum they should be addressed not only by the import path but the registry prefix they were pulled from, but it's not like that problem doesn't exist in other ecosystems that do not have the tight mapping between import paths and the actual package that Go does (there's nothing stopping one putting a package in a private Maven repository with the same coordinates as one on central and causing chaos in the office, for example).

snuxoll avatar Apr 17 '25 08:04 snuxoll

It's true that different registries may serve different source for the same module path and version, but in practice that should be rare and discouraged, given that module import paths are a global namespace by design. Commands like cue mod mirror exist to copy modules from one registry to another, for example.

I think it would be too inefficient for the vast majority of users to have a per-registry module cache. I suspect one of the most common situations will be a work laptop using a mix of registry.cue.works and registry.mycorp.com, where the corporate registry could be mirroring some public modules for the sake of redundancy or network firewalls.

In any case, if we wanted to protect users against the accidental mixing of different registries that disagree on the sources of modules, we could always introduce trust-on-first-use checksums, similar to Go's go.sum file.
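
To make the trust-on-first-use idea concrete, here is a minimal Go sketch. Everything in it is an assumption for illustration: the sumStore type, the "path@version" key format, and the choice of SHA-256 are hypothetical and do not describe an existing cue feature or the go.sum format.

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"fmt"
)

// sumStore maps "path@version" to the hex SHA-256 of the module
// content that was seen first. Hypothetical in-memory stand-in for a
// checksum file kept next to the module.
type sumStore map[string]string

// verify records the checksum on first use and rejects any later
// content for the same path and version that hashes differently.
func (s sumStore) verify(modPath, version string, content []byte) error {
	key := modPath + "@" + version
	sum := sha256.Sum256(content)
	got := hex.EncodeToString(sum[:])
	want, seen := s[key]
	if !seen {
		s[key] = got // first sighting: trust and remember it
		return nil
	}
	if want != got {
		return fmt.Errorf("checksum mismatch for %s: recorded %s, got %s", key, want, got)
	}
	return nil
}

func main() {
	sums := sumStore{}
	// First sighting: the checksum is recorded and trusted.
	fmt.Println(sums.verify("example.com/mod", "v0.1.0", []byte("a: 1\n")))
	// Same content, for example served by a mirror: accepted.
	fmt.Println(sums.verify("example.com/mod", "v0.1.0", []byte("a: 1\n")))
	// Different content for the same path and version: rejected.
	fmt.Println(sums.verify("example.com/mod", "v0.1.0", []byte("a: 2\n")))
}
```

With a check like this, two registries that serve different sources for the same module path and version would produce an error instead of silently sharing one cache entry.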

mvdan avatar Apr 17 '25 08:04 mvdan

@mvdan I'm in full agreement that part of the design is more or less Good Enough(TM), although something akin to go.sum (just use image digests?) would be a fantastic addition - I'm just stating that, while import paths are a "global" namespace, there are large differences in the guarantees provided by the design of Go package resolution vs what CUE does, combined with the many use cases where you may be loading CUE modules provided by a user vs where you might do such things with Go.

And that's where I feel simply making the module cache easier to clean up, whether via rm -r or os.RemoveAll, by not removing the write bit for the owner pretty much deals with any of the potential edge cases where such issues may occur: just create a new cache directory and delete it when you're done. It's not a hill I'd die on, but walking the tree twice because not every operating system allows the owner of an inode to delete it without a write bit is a bit silly.

snuxoll avatar Apr 17 '25 09:04 snuxoll