[RFC 0152] `local-overlay` store

Open Ericson2314 opened this issue 1 year ago • 46 comments

Add a new local-overlay store implementation to Nix. This will be a local store that is layered upon another local filesystem store (local store or daemon). This allows locally extending a shared store that is periodically updated with additional store objects.
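As a rough illustration of the layering described above, here is a toy Python model. It is only a sketch: the class and method names (`ToyStore`, `ToyOverlayStore`, `is_valid_path`, `add`) are hypothetical and are not the Nix implementation, and the real store tracks far more than a set of names.

```python
class ToyStore:
    """A toy store: nothing more than a set of store object names (hypothetical model)."""

    def __init__(self, paths=()):
        self.paths = set(paths)

    def is_valid_path(self, path):
        return path in self.paths

    def add(self, path):
        self.paths.add(path)


class ToyOverlayStore:
    """Reads consult the upper layer first, then the lower; writes only touch the upper."""

    def __init__(self, lower):
        self.lower = lower
        self.upper = ToyStore()

    def is_valid_path(self, path):
        return self.upper.is_valid_path(path) or self.lower.is_valid_path(path)

    def add(self, path):
        # Assumption: objects the lower store already provides are not duplicated.
        if not self.lower.is_valid_path(path):
            self.upper.add(path)


shared = ToyStore({"aaa-glibc", "bbb-bash"})  # the shared lower store
overlay = ToyOverlayStore(shared)
overlay.add("ccc-my-app")  # lands in the upper layer only
overlay.add("bbb-bash")    # already in the lower store, so nothing is written
```

The point of the sketch is that the shared store is never written to by its consumers; each consumer extends it only through its own upper layer.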

This work is sponsored by Replit :sparkles:

Ericson2314 avatar Jun 14 '23 14:06 Ericson2314

This pull request has been mentioned on NixOS Discourse. There might be relevant details there:

https://discourse.nixos.org/t/super-colliding-nix-stores/28462/17

nixos-discourse avatar Jun 17 '23 14:06 nixos-discourse

What if, instead of requiring that the lower store grow monotonically, the overlay store maintained GC roots in the lower for any lower paths it references? I haven't thought this through to the level of detail in the spec, but it strikes me as an alternative worth considering; not being able to run garbage collection on the lower store would certainly be a barrier for personal use (perhaps not a central goal for this RFC) and possibly for org use as well—storage is cheap but it doesn't round down to free in every context.
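To make the idea concrete, here is a minimal Python sketch of what "GC roots in the lower store" would mean. Everything in it is an assumption for illustration: `live_paths`, the `references` mapping, and the sample names are hypothetical and not taken from the RFC or from Nix.

```python
def live_paths(references, roots):
    """Return the closure of `roots` under `references` (path -> set of referenced paths)."""
    live, stack = set(), list(roots)
    while stack:
        path = stack.pop()
        if path not in live:
            live.add(path)
            stack.extend(references.get(path, ()))
    return live


# Hypothetical lower store contents and their references.
lower_refs = {"bash": {"glibc"}, "glibc": set(), "old-gcc": {"glibc"}}
# Roots that an overlay store would register for the lower paths it depends on.
overlay_roots = {"bash"}

# Only paths unreachable from every registered root may be collected.
garbage = set(lower_refs) - live_paths(lower_refs, overlay_roots)
```

In this toy example only `old-gcc` is collectable, because `glibc` stays reachable through the root on `bash`.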

rhendric avatar Jun 17 '23 16:06 rhendric

I work for Replit (who sponsored work on this RFC) and helped out with the creation of this RFC. I am happy to serve as a Shepherd on this RFC, but also happy to cheer from the sidelines if people (or the Steering Committee) see this as a conflict of interest.

ryantm avatar Jun 20 '23 20:06 ryantm

I'm generally excited about this. I have a slightly different target than Replit has. I'm shipping large firmware images (multiple GB each) with a nix/store that is immutable and signed (via dm-verity and Secure Boot). The lower overlay ships with a document that provides a way to rebuild the db with everything that was put in the store, and at runtime the db is rebuilt. Configuration is then evaluated and a new system is built in the upper store.

This would solve the initial import of the DB which is still not that fast (although probably way faster than importing 16TB worth of derivations!). For our use-case, I still believe we'll trash the upper layer at each boot because it's easier to reason about.

baloo avatar Jun 21 '23 02:06 baloo

@rhendric That is useful for some things, but probably not for the use-case of large numbers of consumers all sharing the same underlying store. It is pretty important that the underlying store be truly read-only in that case, including any GC roots.

Ericson2314 avatar Jun 27 '23 11:06 Ericson2314

@baloo We have separately thought about those sorts of issues, including a persistent upper store that then "pivots" onto a new lower store when one does an upgrade of NixOS (and I suppose GC of the old generations). I think the pivoting feature is a nice future work item.

Ericson2314 avatar Jun 27 '23 11:06 Ericson2314

This RFC is now open for shepherd nominations!

kevincox avatar Jun 28 '23 13:06 kevincox

I nominate @roberth

Ericson2314 avatar Jun 28 '23 14:06 Ericson2314

That is useful for some things, but probably not the use-case of large numbers of consumers all sharing the same underlying store

Now this makes me wonder if local-overlay sounds a bit more local than this use-case (from a naming point of view)…

7c6f434c avatar Jun 29 '23 07:06 7c6f434c

I suppose I would deprecate all the "local" and "remote" names as being misleading. I just picked local-overlay on a moment's notice to match the existing pattern.

Ericson2314 avatar Jul 03 '23 03:07 Ericson2314

This pull request has been mentioned on NixOS Discourse. There might be relevant details there:

https://discourse.nixos.org/t/tweag-nix-dev-update-51/30870/1

nixos-discourse avatar Jul 24 '23 07:07 nixos-discourse

This RFC has not acquired enough shepherds. This typically shows lack of interest from the community. In order to progress a full shepherd team is required. Consider trying to raise interest by posting in Discourse, talking in Matrix or reaching out to people that you know.

If not enough shepherds can be found in the next month we will close this RFC until we can find enough interested participants. The PR can be reopened at any time if more shepherd nominations are made.

See more info on the Nix RFC process here

lheckemann avatar Jul 26 '23 13:07 lheckemann

@baloo, @rickynils, @zhaofengli, @arianvp, @edolstra would any of you be open to shepherding this along?

I don't see much controversy or drama here and, in my opinion, the document is in good shape too, so overall I expect the shepherd work to be low-commitment.

If you've never done it before, you can look at https://github.com/NixOS/rfcs/blob/master/rfcs/0036-rfc-process-team-amendment.md#shepherd-team for more information about being a Shepherd.

ryantm avatar Jul 26 '23 14:07 ryantm

I can shepherd

arianvp avatar Aug 01 '23 08:08 arianvp

I also poked the #nixos-systemd channel, as some people there are looking at appliance images / immutable nix-store partitions, which seems to have a lot of overlap with this RFC.

arianvp avatar Aug 03 '23 18:08 arianvp

I am nominating myself as a shepherd. To be completely clear, my availability for August is limited as I will be on vacation. I will have time again in September, though I may have other responsibilities, like Release Manager, that take higher priority. If you will have me, that's fine. :)

RaitoBezarius avatar Aug 03 '23 22:08 RaitoBezarius

Self-nominate

tomberek avatar Aug 06 '23 04:08 tomberek

This pull request has been mentioned on NixOS Discourse. There might be relevant details there:

https://discourse.nixos.org/t/nixos-with-shared-nix-store-among-compute-nodes/33235/2

nixos-discourse avatar Sep 20 '23 14:09 nixos-discourse

Initial meeting took place on 2023-09-24

Notes

[ericson]: RyanTM discussed review+merge of PR in current form as experimental feature

[RaitoBezarius]: go with FUSE proposal - [ericson]: interaction with ACL proposal? - https://cl.tvl.fyi/c/depot/+/9449/4/tvix/tests/README.md

[ericson]: delete has similar problem as whiteout/tombstones, I do agree FUSE is probably necessary to "truly" solve that.

[RaitoBezarius]: sqlite is no longer applicable [arian]: also store nar+narinfo? - [ericson]: can layer with sqlite, would need initial load to build indexes [RaitoBezarius]: would like DB to be impl detail - NixStoreFS - [ericson]: allow building with LocalFSStore (abstract superclass), then rename to LocalSqliteStore - [raito]: blobStore, directoryStore, metadata, etc? - too much separation of processes? - current store has issues when used as a build store, leaking FDs - dmesg for the win?! - WIP to make everything in builds a bind mount, not hardlinks

[arian]: concern about what is used as the lowest layer in a standard setup. possibility to use the narinfo layer [RaitoBezarius]: deduplication in store layer? [ericson]: this is a very stateful feature, requires migration? [RaitoBezarius]: proper migration provides much more peace of mind, expectation to have store->store migration.

[ericson]: would be good to have Postgres store - s3fs, goofys? - bcachefs - potential? - TODO: get contacts from RaitoBezarius about kernel developments

[tomberek]: most of this is to help decide if current proposal is worth working on. - [arian]: local-overlay-store is somewhat of a stopgap, should not be blocked by potential of future work - code and implementation exists

[RaitoBezarius]: propose merge as experimental, get feedback - more ambitious: migration stories, encourage store layer experimentation, another RFC - proliferation of store layers that are not "friendly" wrt migrations, version, etc. - can accept this, but need to be careful - this will be used by larger organizations, thus cementing the xp features - [tomberek]: interaction with scheduling of remote builds? removes deadlock?

Experimental?

Team is concerned that such an experimental feature will be used and quickly become a non-experiment and relied upon, making breaking changes more difficult, requiring migration path. Afterward people will likely complain. We must set the expectation that this is not a stable feature.

  • TODO: discuss with Replit, ask the community who is interested in this as a production feature. Invited ryantm to join and answer a few questions.
  • [ryantm]: we plan to use the PR as-is
  • [ryantm]: have been on 2.3, planning to jump forward to use this PR and pinning if necessary
  • [ryantm]: will need to support better infra in the future and support by Nix team
  • [ryantm]: local overlay store is a nice way to integrate with other underlying store implementation (tvix-store for example)

Summary

  • Merging of code is agreed upon
  • Decision: merge PR as xp, get feedback, RFC discussion continues, stabilize + accept
  • TODO: get "hands-dirty" [arian,tomberek]
  • TODO: need feedback from experimental usage
  • expectation: possible to merge within 2 months
  • reconvene after experience with merged xp feature

Misc Links:

  • https://aws.amazon.com/about-aws/whats-new/2023/03/mountpoint-amazon-s3/
  • https://github.com/project-machine/puzzlefs
  • FUSE passthrough

tomberek avatar Sep 24 '23 17:09 tomberek

Team is concerned that such an experimental feature will be used and quickly become a non-experiment and relied upon, making breaking changes more difficult, requiring migration path.

Here's a general idea: Make experimental feature flags be versioned to the current Nix version. E.g. in order to use a local overlay store with Nix 2.15.x you need this in nix.conf:

experimental-features = local-overlay-2.15

Only features that match the current Nix version get enabled, do not provide a way to enable it for future Nix versions.
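A minimal Python sketch of how such matching could behave, purely for illustration. The function `enabled_features` and its parsing rule (split on the last `-`) are hypothetical assumptions, not how Nix parses `experimental-features` today.

```python
def enabled_features(configured, nix_version):
    """Return the feature names whose version suffix matches the running Nix.

    Hypothetical rule: a flag is written `<name>-<major>.<minor>` and is honoured
    only when `<major>.<minor>` equals that of the running Nix version.
    """
    major_minor = ".".join(nix_version.split(".")[:2])
    enabled = set()
    for flag in configured:
        name, sep, version = flag.rpartition("-")
        if sep and version == major_minor:
            enabled.add(name)
    return enabled


flags = ["local-overlay-2.15"]
on_2_15 = enabled_features(flags, "2.15.3")  # the flag applies to any 2.15.x
on_2_16 = enabled_features(flags, "2.16.0")  # the flag is silently ignored
```

Under this rule an upgrade to the next minor release turns the feature off until the user edits the flag, which is the forced acknowledgement the proposal is after.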

infinisil avatar Oct 03 '23 23:10 infinisil

Team is concerned that such an experimental feature will be used and quickly become a non-experiment and relied upon, making breaking changes more difficult, requiring migration path.

Here's a general idea: Make experimental feature flags be versioned to the current Nix version. E.g. in order to use a local overlay store with Nix 2.15.x you need this in nix.conf:

experimental-features = local-overlay-2.15

Only features that match the current Nix version get enabled, do not provide a way to enable it for future Nix versions.

I think it still doesn't address migration paths for multiple store backends, alas.

RaitoBezarius avatar Oct 03 '23 23:10 RaitoBezarius

I think it still doesn't address migration paths for multiple store backends, alas.

Not quite sure I understand the problem statement, but I'm not sure migration of data from one layer to another needs to be part of this initial RFC. Nix is not doing the overlayfs mounts on its own either, but relies on the user of the feature to take care of this, and know the surrounding context - so it seems prudent to rely on that same user with the context to also migrate/move data around as it makes sense for their context.

flokli avatar Oct 04 '23 08:10 flokli

Team is concerned that such an experimental feature will be used and quickly become a non-experiment and relied upon, making breaking changes more difficult, requiring migration path.

Here's a general idea: Make experimental feature flags be versioned to the current Nix version. E.g. in order to use a local overlay store with Nix 2.15.x you need this in nix.conf:

experimental-features = local-overlay-2.15

Only features that match the current Nix version get enabled, do not provide a way to enable it for future Nix versions.

This seems to be orthogonal to this RFC, I'm not sure it should be discussed here.

flokli avatar Oct 04 '23 08:10 flokli

I think it still doesn't address migration paths for multiple store backends, alas.

I'm also not entirely sure if I get it, but it could be required that experimental features cannot share their Nix store with stable Nix versions, so e.g. it would have to be /nix/experimental/store or so. And only provide non-destructive ways to migrate stores while it's still experimental, so e.g. "copy + migrate the store" but no "migrate the store in place".

This seems to be orthogonal to this RFC, I'm not sure it should be discussed here.

Mostly yeah. I think it's appropriate to bring up since it was mentioned as a concern in the meeting notes, but it might not have to be a concern with this idea.

infinisil avatar Oct 04 '23 15:10 infinisil

Here's a general idea: Make experimental feature flags be versioned to the current Nix version. E.g. in order to use a local overlay store with Nix 2.15.x you need this in nix.conf:

experimental-features = local-overlay-2.15

Only features that match the current Nix version get enabled, do not provide a way to enable it for future Nix versions.

This is how you end up with people advising others to do

nix.settings.experimental-features = [ "local-overlay-${lib.versions.majorMinor config.nix.package.version}" ];

It's not that people don't know experimental features might change, it's that they don't care until it happens, or don't believe it really will.

alyssais avatar Oct 04 '23 17:10 alyssais

I think it still doesn't address migration paths for multiple store backends, alas.

I'm also not entirely sure if I get it, but it could be required that experimental features cannot share their Nix store with stable Nix versions, so e.g. it would have to be /nix/experimental/store or so. And only provide non-destructive ways to migrate stores while it's still experimental, so e.g. "copy + migrate the store" but no "migrate the store in place".

This seems to be orthogonal to this RFC, I'm not sure it should be discussed here.

Mostly yeah. I think it's appropriate to bring up since it was mentioned as a concern in the meeting notes, but it might not have to be a concern with this idea.

I think it still doesn't address migration paths for multiple store backends, alas.

Not quite sure I understand the problem statement, but I'm not sure migration of data from one layer to another needs to be part of this initial RFC. Nix is not doing the overlayfs mounts on its own either, but relies on the user of the feature to take care of this, and know the surrounding context - so it seems prudent to rely on that same user with the context to also migrate/move data around as it makes sense for their context.

My point was that we don't have the right primitives to ingest store paths from a backend in order to move away from that backend, other than serializing the store and deserializing it, which cannot always work for large stores.

I am okay with experimental features not having migration knobs. I am meh about (or even against) multiplying the number of store backends without any migration story for them, which will hamper our ability to make them evolve.

There are a lot of things in Nix that just perform automatic migration, I think it's important we take the time to formalize this, I am not saying we need to have a good answer as part of this RFC, but someone has to care about this.

RaitoBezarius avatar Oct 06 '23 22:10 RaitoBezarius

This pull request has been mentioned on NixOS Discourse. There might be relevant details there:

https://discourse.nixos.org/t/building-qemu-kvm-vms/33149/20

nixos-discourse avatar Oct 09 '23 07:10 nixos-discourse

There's a new method that landed in the kernel that allows you to "tuck" the lower store. Does that address these things? https://lwn.net/Articles/927491/

On Tue, Oct 31, 2023, 06:31 ballit6782 @.***> wrote:

@.**** requested changes on this pull request.

I have a feeling this RFC does not take into account the fact that changes to the underlying storage of overlay filesystems are not allowed in Linux:

Changes to the underlying filesystems while part of a mounted overlay filesystem are not allowed. If the underlying filesystem is changed, the behavior of the overlay is undefined, though it will not result in a crash or deadlock.

See https://www.kernel.org/doc/html/latest/filesystems/overlayfs.html#changes-to-underlying-filesystems

I think that to avoid confusing users of this RFC, this should be acknowledged, as it is a serious limitation of the feature (it means that any host sharing its store with guests would be immutable as long as a single guest would be running). Also, if this was not taken into account when comparing alternatives, it's possible that they need to be revisited. Namely, I think it makes the fuse-binding-on-demand alternative much more interesting.

It seems to me that overlayed stores have a very limited usefulness given this limitation, except for very specific cases like the NixOS live CD, where the base store is guaranteed not to change, and this use case seems properly handled with the existing tooling.

In rfcs/0000-local-overlay-store.md https://github.com/NixOS/rfcs/pull/152#discussion_r1377100881:

+#### Nodes
+
+- Right angle corners: Absent store object
+- Rounded corners: Present store object
+
+#### Edges
+
+- Solid edge: "adding" or "deduplicating" direction
+- Dotted edge: "deleting" direction
+
+The graph when reduced to just one type of edge is always acyclic, and thus represents a partial order.
+
+### Lower Store
+
+While in use as a lower store by one or more overlay stores, a store most only grow "monotonically".

In Linux, changes to the underlying storage of overlay devices have undefined behaviour (https://www.kernel.org/doc/html/latest/filesystems/overlayfs.html#changes-to-underlying-filesystems), so a store used as a lower store cannot grow at all.
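For reference, the "monotonic growth" rule in the quoted RFC text can be stated as a simple set check. The Python sketch below is only an illustration: `is_monotonic_update` and the sample object names are hypothetical, and the check says nothing about whether overlayfs tolerates such an update.

```python
def is_monotonic_update(before, after):
    """True if every store object present before is still present after (additions only)."""
    return set(before) <= set(after)


snapshot_old = {"aaa-glibc", "bbb-bash"}
grown = snapshot_old | {"ccc-coreutils"}  # objects were only added
shrunk = {"aaa-glibc"}                    # an object was deleted

ok_update = is_monotonic_update(snapshot_old, grown)
bad_update = is_monotonic_update(snapshot_old, shrunk)
```

The objection above is stricter than this rule: if any change to the lower filesystem is undefined behaviour, then even an update that passes this check would not be safe while an overlay is mounted.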

arianvp avatar Oct 31 '23 09:10 arianvp

Although there are techniques to trigger updates from the host, such as the described feature (which can already be simulated by just remounting a new overlay on top), it doesn't change the fact that the underlying device for the lower layer of an overlayed store has to be immutable. Maybe that will change, but I haven't seen talks of that, and from what I understand of Linux's VFS it would not be an easy change.

For example, in the LWN article, the "base image" is recreated anew, and its new version is propagated on top of the overlay that uses the old base image. I'm not sure how that would be practical with a Nix store, especially if there are multiple overlayed stores sharing some lower store.

ballit6782 avatar Oct 31 '23 09:10 ballit6782

There is some more context provided by a developer in this thread on the linux-fsdevel mailing list:

Specifically, renaming directories and files in lower that were already copied up is going to have a weird outcome.

It doesn't say anything about adding directories/files in lower that already exist in upper layers, though.

Also, someone else then adds this comment about allowing changes to the lower fs:

Best way to keep things simple is to only add functionality when someone actually needs it (and can test it). This has been the design policy in overlayfs and it worked wonderfully.

Maybe we can reach out to linux-fsdevel and describe our use case. If we only require "extending" the lower filesystem online, it's possible that it wouldn't require a lot of changes.

8aed avatar Nov 03 '23 14:11 8aed