Have garbage collection leave store paths until they've been dead for a given amount of time

Open tejing1 opened this issue 2 years ago • 19 comments

Is your feature request related to a problem? Please describe. It's hard to decide how often to schedule garbage collection.

If you set it too long, too much builds up in your nix store.

If you set it too short, you lose caching on transient shells too often and find yourself downloading the same flake commits constantly.

In fact, no matter how long you set it, it sometimes deletes what you were just using 5 minutes ago.

Describe the solution you'd like I'd like nix to keep track of how much time has passed since a store path "died" (lost all gcroots keeping it in the store), and be capable of garbage collecting only store paths for which that amount of time is over a user supplied value. Ideally this would not be an approximation based on polling, but exact, and kept up to date by the nix daemon, at least in multi-user setups. However, if it is based on polling, transient commands (such as nix run) should also record the momentary "liveness" directly even if the polling window doesn't hit them.

With that, you could set nix to garbage collect daily, but only delete objects that have been consistently dead for more than, say, 2 weeks. Dev shells and other transient phenomena that you use often would never be garbage collected, but outdated objects would still disappear in a reasonable amount of time.

It would be somewhat unintuitive for old generations to actually still be in the nix store for quite a while after their profile symlinks disappear, but this could also be fixed by having 2 classes of gcroots: one that incurs the wait period before deletion, and one that does not. It's debatable whether that's worthwhile, however.

Describe alternatives you've considered #2793 is more easily implemented, but wouldn't match what users really want from a cache expiry system as well.

Additional context Would help significantly with #4250. I brought it up earlier here: https://discourse.nixos.org/t/what-would-you-like-to-see-improved-in-nix-cli-experience/24012/15

Priorities

Add :+1: to issues you find important.

tejing1 avatar Jan 09 '23 17:01 tejing1

I've really wanted an LRU option for GC that would age things out of the store.

kjeremy avatar Jan 09 '23 22:01 kjeremy

We were just talking about this today.

Ideally this would not be an approximation based on polling

That is doable for leaf dead objects, but not so easy for dead objects that are referenced by other dead objects. But I think this is OK. Polling as part of "auto gc" should be fine.

Ericson2314 avatar Feb 13 '23 23:02 Ericson2314

I've been playing with some ideas around this over at https://github.com/risicle/nix-heuristic-gc

risicle avatar Mar 28 '23 21:03 risicle

I was just thinking about this again, and I realized it's actually pretty simple to implement. I had been thinking in terms of tracking the "time since death" of store paths, which sounds hard, but you only actually need to track the "time since death" of gcroots.

Here's an outline of how it could work:

  • Add 2 directories, /nix/var/nix/rootexpiry/current and /nix/var/nix/rootexpiry/old
    • rootexpiry/old contains symlinks to store paths
    • The symlinks in rootexpiry/old are named after the time at which the roots pointing to those store paths disappeared. (represented as seconds since the epoch, say)
    • rootexpiry/current contains symlinks that point to gcroots (sort of like how gcroots/auto has symlinks pointing to result symlinks out in the filesystem)
    • The symlinks in rootexpiry/current are named after the store paths the gcroots point to. (so that we still have that information when the gcroot itself disappears or changes)
  • The nix daemon watches (with inotify on linux, say) the rootexpiry/current directory, as well as the targets of all the symlinks inside.
    • When one of the symlink targets disappears or changes target, the nix daemon deletes the symlink in rootexpiry/current and creates a corresponding entry in rootexpiry/old named after the current time. (actually, to avoid race conditions, do it in the other order)
    • In single-user installs, nix commands need to check and update rootexpiry/current whenever they happen to get run, since there's no daemon to watch it. This means the apparent time of expiry is pushed forward to the next nix command run, at worst, and is exactly correct, still, so long as nix commands are used to manipulate gcroots.
  • Transient commands such as nix run register their own entries in rootexpiry/old automatically.
  • Garbage collection can delete symlinks in rootexpiry/old whose names encode a time older than the timeframe specified by the user and then treat any remaining entries as gcroots, keeping them present when it does collection. It also needs to treat the store paths encoded in the names of symlinks in rootexpiry/current as gcroots, to avoid any race conditions.
  • To better capture the timing information of all store objects, nix would also need to add entries in rootexpiry/old for the build dependencies whenever it builds a derivation (as opposed to substituting it), successfully or not. This ensures that build dependencies that are not runtime dependencies stick around for a sufficient amount of time.
  • Old generations of profiles can be exempt from tracking in rootexpiry/current, making them also exempt from the wait time after their deletion.

tejing1 avatar Oct 29 '23 16:10 tejing1

Leaving until dead for a while would be nice, but I would also be pretty happy with just a minimum age before deletion. I can imagine nix-collect-garbage --min-path-age 30d that only deletes paths that were added more than 30d ago. /nix/var/nix/db/db.sqlite already has ValidPaths.registrationTime which should be enough to implement this. Just skip deleting paths that are newer than the configured age.

This would give me a pretty great setup: run nix-collect-garbage --delete-older-than 30d --min-path-age 30d on a schedule to avoid collecting too much old data over time, and configure min-free + max-free to do a more eager collection when space is limited.

Maybe as a bonus --max-freed and max-free could order deletions by the registrationTime as well.
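A rough sketch of how that gate and ordering could behave, assuming each dead path carries its registrationTime and size. `DeadPath` and `pick_deletions` are hypothetical names, `--min-path-age` is only the proposed flag, and the sketch ignores references between dead paths (a real collector has to delete referrers first).

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class DeadPath:
    path: str
    registration_time: int  # seconds since the epoch, as in ValidPaths.registrationTime
    nar_size: int           # bytes


def pick_deletions(
    dead: List[DeadPath],
    now: int,
    min_age_seconds: int,
    max_freed_bytes: Optional[int] = None,
) -> List[DeadPath]:
    # Gate: skip anything registered more recently than the minimum age.
    eligible = [p for p in dead if now - p.registration_time > min_age_seconds]
    # Bonus ordering: oldest registrations go first.
    eligible.sort(key=lambda p: p.registration_time)
    chosen, freed = [], 0
    for p in eligible:
        # Stop once the byte budget has been reached (assumed semantics).
        if max_freed_bytes is not None and freed >= max_freed_bytes:
            break
        chosen.append(p)
        freed += p.nar_size
    return chosen
```

For instance, with a 30d minimum age a path registered yesterday is never chosen, no matter how much space the run is asked to free.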

kevincox avatar Aug 06 '24 00:08 kevincox

@kevincox It would be really nice if there was a "ValidPaths.lastReferencedTime" which is updated whenever a path is part of a "nix build" or "nix copy". I don't particularly care when a package was registered in store, I care when it was last needed.

Lillecarl avatar Oct 24 '25 23:10 Lillecarl

Yeah, I'm just worried that ends up a very hard to understand set of requirements. It is moving partway to "last access" but not fully. But it is probably better than nothing. And we could potentially add more "bump triggers" in the future as it makes sense.

kevincox avatar Oct 24 '25 23:10 kevincox

Just chiming in to also voice my support for such a feature (either atime or store registration time based). Additionally, if this is OK with the maintainers I would be willing to implement / prototype such a feature myself, and PR it upstream once done - please let me know whether this is a feature where the maintainers would be fine with accepting an external contribution. :)

Popax21 avatar Nov 28 '25 14:11 Popax21

External contributions have never been disallowed. Send a PR if you want.

eclairevoyant avatar Nov 28 '25 15:11 eclairevoyant

External contributions have never been disallowed. Send a PR if you want.

I'm aware; however, I still tend to ask whether a feature would actually align with a project's intended direction before I spend too much time implementing/polishing it, especially if it's my first time contributing. So if there are any caveats / catches that would prevent such a feature from being merged, or any other reasons why this hasn't been implemented up until now I would like to know before I waste both my and the maintainer's time.

Popax21 avatar Nov 28 '25 16:11 Popax21

any other reasons why this hasn't been implemented up until now

With nix generally the reason is that no one found it important enough for their own usecase. It's the problem of "good enough" stalling future improvements, especially when other larger bugs and features get more attention.

If this was actually outside project scope, this would've been closed years ago, as the nix team is aware of this FR.

Still, I'd expect some level of feedback and possibly bikeshedding once a PR is created (since a working example is easier to give feedback on).

eclairevoyant avatar Nov 28 '25 17:11 eclairevoyant

Thanks for elaborating. I was mostly just trying to check whether there might have been a previous implementation attempt that got stuck because e.g. the problem space is more complex than it appears from the outside - I'm obviously not demanding that anyone other than me should have worked/will work on this as well.

Regardless, with this out of the way I will move onto prototyping/implementing this feature myself now, and I will PR my results once I'm done. As you correctly pointed out it's easier to have any potentially required follow up discussions/bikeshedding once there's some baseline code to work off of.

Popax21 avatar Nov 28 '25 18:11 Popax21

I found it important enough for my own usecase but not important enough to engage with the community.

https://github.com/Lillecarl/lix/commit/9ac72bbd0c7802ca83a907d1fec135f31aab6d24

It's dirty and runs an "expensive" query that updates paths, dependencies and build-time dependencies whenever(ish) something references a path over the daemon protocol. It's intended for nix-csi, where I turned Nix into an LRU cache with this, so I've pretty much only tested nix copy and nix build. The query should probably be run from more opcodes and also on local operations (which would require plugging the query in at a lot of places, since we can't reuse something like "isvalidpath": GC would then update the regtime of the entire store).

"Good enough": I don't know C++ (wrote some 15 years ago) and it took me two days to figure this out from 0, so it's definitely up for grabs. I'm hesitant to put in the effort because of the mentioned bikeshedding and stagnancy; it's easier to carry patches. I commend anyone who puts in the effort to upstream things. If someone does, I'll help out with reviewing to my capacity.

Lillecarl avatar Nov 29 '25 03:11 Lillecarl

Thanks for your insight!

It's dirty and runs an "expensive" query that updates paths, dependencies and build-time dependencies whenever(ish) something references a path over the daemon protocol.

That's one way of implementing something like this that I've also considered; however, for practicality's sake my prototype will only focus on time sources which are easily accessible without any DB modifications (i.e. atime / store registration time). Currently I envision the feature as an --older-than CLI option you can pass to nix-collect-garbage / nix store gc, which by default uses store registration time, but can be switched over to atime using another flag like e.g. --use-atime (which will error if atime isn't available). Having the DB track the last access time and using that instead of the store registration time would definitely be cleaner, but as said, for now I'm focusing on getting an MVP prototype made before too much effort gets sunk into a concrete implementation.

Popax21 avatar Nov 29 '25 10:11 Popax21

@Popax21 https://github.com/risicle/nix-heuristic-gc for atime GC :)

Using registrationTime as a "GC gate" is "free". You can pipe nix-store --gc --print-dead to nix path-info --json then extract and sort on registrationTime then run nix store delete :) Not saying it can't go into Nix proper but it doesn't really "need" to be in to POC it, easily done with external tools :)
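A minimal sketch of the "extract and sort" step in that pipeline, assuming the JSON text from nix path-info --json has already been captured as a string. The shape of that JSON is an assumption here: as far as I know some Nix versions emit a list of objects with a "path" key and others an object keyed by store path, so the sketch accepts both. `paths_registered_before` is a hypothetical helper name.

```python
import json
from typing import List


def paths_registered_before(path_info_json: str, cutoff: int) -> List[str]:
    """Return store paths with registrationTime <= cutoff, oldest first."""
    data = json.loads(path_info_json)
    if isinstance(data, dict):
        # Assumed newer shape: {"/nix/store/...": {...} or null}
        records = [dict(info or {}, path=path) for path, info in data.items()]
    else:
        # Assumed older shape: [{"path": "/nix/store/...", ...}, ...]
        records = data
    dated = [
        (r["registrationTime"], r["path"])
        for r in records
        if r.get("registrationTime") is not None  # skip invalid or unknown paths
    ]
    dated.sort()
    return [path for registered, path in dated if registered <= cutoff]
```

The returned list is what would then be handed to nix store delete; choosing `cutoff` as "now minus 30 days" gives the registration-time gate discussed earlier in the thread.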

Lillecarl avatar Nov 29 '25 13:11 Lillecarl

I'm aware; however, that first project is limited to collecting a fixed amount of garbage each invocation, and the second option would still require some external tool to handle plumbing all this information around (and it would probably be less efficient than a native GC gate). However, if this was a feature that upstream Nix wasn't interested in this is exactly what I would be settling on for my own needs; evidently tho there seems to be at least some interest in the feature, which is why I'll give upstreaming a "proper" implementation a go.

Popax21 avatar Nov 29 '25 13:11 Popax21

that first project is limited to collecting a fixed amount of garbage each invocation

nix-heuristic-gc 999TB ;)

FWIW have just added the ability to restrict collection to non/only invalid/substitutable/.drv paths

I've had a few extra thoughts about use of atime for this, but haven't gone anywhere near implementing them. As noted in the help for the --inherit-atime option, simply looking at the atime for a path at time of activation misses a lot of information - a path that, itself, may appear to not have been needed for ages according to the atimes, may have been needed (but not itself accessed) quite recently, though its depending path may have now been removed. --inherit-atime attempts to address this but can't "remember" this atime inheritance information between invocations. You'd need to keep some sort of "atime observatory" database to address this properly. Unless of course you use nix's root powers to manually propagate "fake" atimes from children to parents, presumably at the time of collecting the child. I haven't gone any deeper into this because I'm already at the edge of trying to solve problems nobody else really cares about.
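To make the inheritance idea concrete, here is a small sketch in which a path's effective atime is the newest atime among itself and everything that (transitively) depends on it. This is an illustration of the idea only, not how nix-heuristic-gc implements --inherit-atime; `effective_atimes` and the dictionary layout are hypothetical.

```python
from typing import Dict, Set


def effective_atimes(
    atime: Dict[str, int],
    references: Dict[str, Set[str]],
) -> Dict[str, int]:
    """references[p] is the set of paths that p depends on."""
    effective = dict(atime)
    changed = True
    # Push newer atimes from dependents down to their dependencies until
    # nothing changes any more (a simple fixed-point iteration).
    while changed:
        changed = False
        for path, deps in references.items():
            for dep in deps:
                if dep in effective and effective.get(path, 0) > effective[dep]:
                    effective[dep] = effective[path]
                    changed = True
    return effective


observed = {"app": 100, "lib": 10, "libc": 5}
graph = {"app": {"lib"}, "lib": {"libc"}}
inherited = effective_atimes(observed, graph)
```

Here `lib` and `libc` end up with atime 100 because `app` was used recently. Persisting a mapping like `inherited` between invocations is roughly what the "atime observatory" database would have to do, since the information is lost once `app` itself is removed.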

(Also remember you've got to ignore the atime of directories because you'll update them yourself just by walking the tree)

risicle avatar Nov 29 '25 15:11 risicle

nix-heuristic-gc 999TB ;)

That would AFAICT still collect all garbage, since IIRC there's no way to enforce a minimum age for collection.

Unless of course you use nix's root powers to manually propagate "fake" atimes from children to parents, presumably at the time of collecting the child.

That is my plan; Nix's GC already pulls liveness info from a lot of places (like e.g. the cmdlines of currently running processes). It should hopefully not be too complicated to tap into this logic to keep the entire runtime closure of any too young paths alive.
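A rough sketch of that rule, assuming the set of dead paths, their ages, and the reference graph are already known. The names `closure` and `collectable` are hypothetical and this is not the actual Nix GC code, only the set arithmetic behind "keep the runtime closure of too-young paths alive".

```python
from typing import Dict, Iterable, Set


def closure(starts: Iterable[str], references: Dict[str, Set[str]]) -> Set[str]:
    """All paths reachable from `starts` by following references."""
    seen: Set[str] = set()
    stack = list(starts)
    while stack:
        path = stack.pop()
        if path in seen:
            continue
        seen.add(path)
        stack.extend(references.get(path, ()))
    return seen


def collectable(
    dead: Set[str],
    age_seconds: Dict[str, int],
    min_age_seconds: int,
    references: Dict[str, Set[str]],
) -> Set[str]:
    # Dead paths younger than the minimum age act as temporary roots.
    young = {p for p in dead if age_seconds[p] < min_age_seconds}
    protected = closure(young, references)
    # Everything a young path needs at runtime survives along with it.
    return dead - protected
```

Without the closure step an old dependency of a fresh dev shell would be deleted and the shell left broken, which is why age alone is not a sufficient gate.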

(Also remember you've got to ignore the atime of directories because you'll update them yourself just by walking the tree)

That's what O_NOATIME is for :)
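For illustration, a minimal sketch of listing a directory without touching its atime. It assumes Linux, where `os.O_NOATIME` is available, and falls back to a plain open when the flag is missing or the kernel refuses it (it is only permitted for the file's owner or a suitably privileged process). `list_dir_noatime` is a hypothetical helper name.

```python
import os
import tempfile
from typing import List


def list_dir_noatime(path: str) -> List[str]:
    flags = os.O_RDONLY | getattr(os, "O_DIRECTORY", 0)
    noatime = getattr(os, "O_NOATIME", 0)  # 0 where the flag does not exist
    try:
        fd = os.open(path, flags | noatime)
    except PermissionError:
        # Not the owner and not privileged: accept the atime update instead.
        fd = os.open(path, flags)
    try:
        # os.listdir accepts an open directory descriptor on Linux.
        return sorted(os.listdir(fd))
    finally:
        os.close(fd)


with tempfile.TemporaryDirectory() as demo_dir:
    open(os.path.join(demo_dir, "a"), "w").close()
    demo_listing = list_dir_noatime(demo_dir)
```

Whether the atime would have been updated at all also depends on mount options such as relatime or noatime, so the flag matters most on stores mounted with strict atime semantics.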

Popax21 avatar Nov 29 '25 18:11 Popax21

nix-heuristic-gc 999TB ;)

That would AFAICT still collect all garbage, since IIRC there's no way to enforce a minimum age for collection.

Correct, no explicit atime filter support.

That's what O_NOATIME is for :)

Ah the powers of root.

risicle avatar Nov 29 '25 19:11 risicle