
Mage Cache Size Management

Open natefinch opened this issue 3 years ago • 6 comments

Someone at work was complaining that their .magefile directory was getting bloated with old binaries... which is a completely valid complaint. We should add autocleanup to delete old binaries in the magefile directory.

Something like

MAGEFILE_MAX_CACHE_AGE - number of days. Default... 10? Binaries in the mage cachedir that are older than this would get deleted whenever mage runs.

MAGEFILE_AUTOCLEANUP - default true. If true, run autocleanup whenever mage runs (see max cache age).

MAGEFILE_MAX_CACHE_SIZE - number of bytes. Default... 200 MB? If the mage cache is larger than this, delete the oldest binaries in the directory until it drops below this limit.

MAGE_NOCACHE - default false. If true, never cache binaries (i.e. delete them after running). This would mean you always rebuild the binaries, which can be slower.

natefinch, Oct 23 '20

While promoting BDD at work, I wondered: would we benefit from BDD descriptions of new features or change requests here as well? That way, along with the WHAT, we could also capture the WHO (it is for) and the WHY (we bother). I would like to see scenarios for the four env vars above, because in my case I came up with something like this:

Story
As a developer
I want to ensure mage keeps a reasonably small and predictable disk space footprint
So that my system does not run out of disk space while using mage

Scenario - mage cache directory
Given my project uses mage
When I compile the project
Then a ~/.magefile cache directory is created if it does not exist

Scenario - cache binaries for a project
Given my project uses mage
When I compile the project
Then a compiled mage version is created in the ~/.magefile directory
And the compiled mage version's name is a hash (created how?)

Scenario - delete previous cached binaries for a project
Given my project uses mage
When I compile the project
Then previously compiled mage versions are deleted from the ~/.magefile directory

I'm trying to figure out in which scenarios someone would want to keep more than the last binary per project. Maybe I'm missing something, but for me just having MAGE_NOCACHE would work (defaulting to false), i.e. keep only the last binary and delete any old ones by default. This would require mage to be able to find out which binary belongs to which project.

If I had two projects:

a
└── mage.go

b
└── mage.go

Would mage generate two binaries in the cache? For simplicity, let's call them a1 and b1. If I modify project a, I would probably want mage to delete the previous a1 binary and keep a newly generated a2. Meanwhile, b1 would still be there because I haven't touched project b. So unless I've misunderstood something, I wouldn't want the cache to depend on age or size.

mirogta, Oct 25 '20

So, the whole reason mage caches the binary is that rebuilding it every time can be a little slow (only slow in the sense of wanting instant reaction time when you're running a bunch of CLI commands over and over).

Now, the way I decided to connect the stored binary to the code is to hash all the magefiles together and use the hash as the name. So if the magefile code changes, mage uses a different binary name, and since it doesn't know what the old name was, it can't just remove the old binary. As a result, all the old binaries just sit there.
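To illustrate why the old binary gets orphaned, here is a small Go sketch of content-hash naming. It is not mage's actual hashing code; the `binaryName` function, the choice of SHA-1, and the in-memory map of filename to contents are assumptions for illustration.

```go
package main

import (
	"crypto/sha1"
	"fmt"
	"sort"
)

// binaryName derives a cache filename from the magefile contents.
// files maps filename -> contents (a hypothetical in-memory input).
func binaryName(files map[string][]byte) string {
	names := make([]string, 0, len(files))
	for name := range files {
		names = append(names, name)
	}
	sort.Strings(names) // map iteration order is random; sort for a stable hash
	h := sha1.New()
	for _, name := range names {
		// Prefix each entry with its name and length so that different
		// splits of the same bytes across files cannot collide.
		fmt.Fprintf(h, "%s\x00%d\x00", name, len(files[name]))
		h.Write(files[name])
	}
	return fmt.Sprintf("%x", h.Sum(nil))
}

func main() {
	v1 := binaryName(map[string][]byte{"magefile.go": []byte("package main // v1")})
	v2 := binaryName(map[string][]byte{"magefile.go": []byte("package main // v2")})
	// An edit yields a new name; nothing records v1, so its binary is orphaned.
	fmt.Println(v1 != v2) // true
}
```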

We could, of course, change this, and store them based on the path they're in, and then we'd be able to overwrite the old one when we make a new one. That's probably a good optimization.

As for why you'd want to limit the cache by age or size: you might use a magefile from a project just once and never need it again, but its binary would sit in your cache forever.

natefinch, Oct 25 '20

Cool, got it.

Following this through: if I ran mage only once, it would create the binary in the cache. If I then changed my source code but not the magefile code, the cached binary would be reused. Cool.

But is it the case that if I never ran mage again, the cached binary would stay in the cache forever regardless of the mage settings, because mage wouldn't run? Scenario, when contributing to an open source project: I make changes, a magefile cache binary gets created, I commit and push, raise a PR, and delete the project's source code, but the .magefile cache stays on my system forever?

Should the cleanup then be handled via a separate process (cron?) so it wouldn't depend on me running mage?

I can imagine a scenario where, whenever a new file is added to the cache, a crontab record is also added to clean it up (if one isn't in the crontab yet), and if as a result the last binary is removed from the cache, the crontab record is removed too, to keep the system clean.

Of course, crontab or any other implementation is system-dependent, and bah, on Windows (a Windows Service to the rescue? Not trivial). And it complicates things... and we don't like complications.

mirogta, Oct 25 '20

If mage were to touch binaries when running them, then automatic cleanup based on modification time would be pretty simple. It wouldn't address the "ran mage once and never again for any project" case, but in that case you probably don't have much cache to clean up :)

Even if it didn't update the mtimes and just deleted any cached binaries over a certain age (other than the one it just ran), the Go build cache would keep the penalty for the extra deletes relatively small.

mgabeler-lee-6rs, Jul 22 '21

Mage could use a project-local directory added to .gitignore.

.magefile
├── sum.json
└── mage

sum.json

{
  "binary": "h1:KSri/1RMQOZLbw7AHqgcBycp8pgJnQMYYT8QZRqZ1Ao=",
  "source": "h1:vmdkHvce7UzX6xkyf4cca8WlwdQ5RQr8fzta+xl7BOM="
}
  1. Hash the binary
  2. Hash the source files

Compare both to that in sum.json, and rebuild if there are any differences.

You could use a hash function similar to the one Go uses for module checksums (dirhash.Hash1):

https://github.com/golang/mod/blob/ce943fd02449f621243c9ea6e64098e84752b92b/sumdb/dirhash/hash.go

You could probably make mage even faster by not hashing everything and instead focusing on some lighter details, such as:

  • using MD5 or SHA1 for hashing (for higher performance)
  • using several of the methods of os.FileInfo
    • Name (use absolute path)
    • ModTime
    • Size

This would probably be super super fast, and solve the caching problem as well.

https://cs.stackexchange.com/questions/19042/fast-hashing-combination-of-different-techniques-to-identify-changes-in-a-file

ghostsquad, Jun 30 '22

I added a PR using the approach recommended by @mgabeler-lee-6rs. I didn't follow @ghostsquad's proposal because I think the current approach is performant enough for this use case, and that proposal would imply a much deeper set of changes. I also didn't follow @mirogta's suggestion of a cron-based process because it adds a lot of complexity: you have to interact with different operating systems and the way each one handles scheduled tasks. It would also create a weird side effect on the system that I, as a user, wouldn't expect.

Any suggestion or comment to the PR is welcome :)

jespino, Aug 05 '22