[RFC 0152] `local-overlay` store
Add a new `local-overlay` store implementation to Nix. This will be a local store that is layered upon another local filesystem store (local store or daemon). This allows locally extending a shared store that is periodically updated with additional store objects.
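To make the layering concrete, here is a minimal toy model, assuming a store is just a map from store path to its references. It is not Nix's implementation: the class names `ToyStore` and `ToyOverlayStore` and the store paths are made up for illustration. Lookups fall through from the upper layer to the lower one, new objects land only in the upper layer, and an object the lower store already has is not copied again.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class ToyStore:
    """Hypothetical in-memory stand-in for a store: path -> set of references."""
    objects: dict[str, frozenset[str]] = field(default_factory=dict)

    def has(self, path: str) -> bool:
        return path in self.objects


@dataclass
class ToyOverlayStore:
    """Hypothetical overlay: reads check both layers, writes go to the upper layer."""
    lower: ToyStore
    upper: ToyStore = field(default_factory=ToyStore)

    def has(self, path: str) -> bool:
        # Visible if either layer has it; the upper layer is checked first.
        return self.upper.has(path) or self.lower.has(path)

    def add(self, path: str, refs: frozenset[str] = frozenset()) -> None:
        # Already provided by some layer (e.g. the shared lower store): nothing to do.
        if self.has(path):
            return
        # Every reference must already be valid in one of the two layers.
        missing = [r for r in refs if not self.has(r)]
        if missing:
            raise ValueError(f"missing references: {missing}")
        self.upper.objects[path] = refs


lower = ToyStore({"/nix/store/aaa-glibc": frozenset()})
overlay = ToyOverlayStore(lower)
overlay.add("/nix/store/bbb-hello", frozenset({"/nix/store/aaa-glibc"}))
overlay.add("/nix/store/aaa-glibc")  # already in the lower store, so a no-op
print(sorted(overlay.upper.objects))  # ['/nix/store/bbb-hello']
```

The point of the sketch is that the upper layer only ever holds the delta on top of the shared lower store.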
This work is sponsored by Replit :sparkles:
This pull request has been mentioned on NixOS Discourse. There might be relevant details there:
https://discourse.nixos.org/t/super-colliding-nix-stores/28462/17
What if, instead of requiring that the lower store grow monotonically, the overlay store maintained GC roots in the lower for any lower paths it references? I haven't thought this through to the level of detail in the spec, but it strikes me as an alternative worth considering; not being able to run garbage collection on the lower store would certainly be a barrier for personal use (perhaps not a central goal for this RFC) and possibly for org use as well—storage is cheap but it doesn't round down to free in every context.
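A rough sketch of what the suggested alternative would need to compute, assuming both stores are modelled as maps from store path to references (the helper name and paths are hypothetical): the lower-store paths that upper-store objects reference directly. Since Nix's garbage collector keeps the whole closure of a GC root alive, rooting these direct references would be enough to protect everything the overlay depends on.

```python
from __future__ import annotations


def lower_paths_needing_roots(
    upper: dict[str, frozenset[str]],
    lower: dict[str, frozenset[str]],
) -> set[str]:
    """Hypothetical helper: lower-store paths directly referenced from the upper store.

    Rooting a path keeps its closure alive, so transitive lower-store
    dependencies do not need roots of their own.
    """
    roots: set[str] = set()
    for refs in upper.values():
        for ref in refs:
            if ref in lower:
                roots.add(ref)
    return roots


lower = {
    "/nix/store/aaa-glibc": frozenset(),
    "/nix/store/ddd-gcc": frozenset({"/nix/store/aaa-glibc"}),
    "/nix/store/eee-unused": frozenset(),
}
upper = {"/nix/store/bbb-hello": frozenset({"/nix/store/ddd-gcc"})}
print(lower_paths_needing_roots(upper, lower))  # {'/nix/store/ddd-gcc'}
```

Here `aaa-glibc` survives through `ddd-gcc`'s closure, while `eee-unused` stays collectable. The catch raised in the reply below is that these roots would have to be written into the lower store, which is meant to be read-only.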
I work for Replit (who sponsored work on this RFC) and helped out with the creation of this RFC. I am happy to serve as a Shepherd on this RFC, but also happy to cheer from the sidelines if people (or the Steering Committee) see this as a conflict of interest.
I'm generally excited about this. I have a slightly different target than Replit has. I'm shipping multi-GB firmware images with a Nix store that is immutable and signed (via dm-verity and Secure Boot). The lower layer ships with a document that lets the DB be rebuilt with everything that was put in the store, and at runtime the DB is rebuilt from it. Configuration is then evaluated, and a new system is built in the upper store.
This would solve the problem of the initial DB import, which is still not that fast (although probably way faster than importing 16TB worth of derivations!). For our use case, I still believe we'll trash the upper layer at each boot because it's easier to reason about.
@rhendric That is useful for some things, but probably not for the use case of large numbers of consumers all sharing the same underlying store: in that case it is pretty important that the underlying store be truly read-only, including any GC roots.
@baloo We have separately thought about those sorts of issues, including a persistent upper store that then "pivots" onto a new lower store when one upgrades NixOS (and, I suppose, garbage-collects the old generations). I think the pivoting feature is a nice future work item.
This RFC is now open for shepherd nominations!
I nominate @roberth
That is useful for some things, but probably not the use-case of large numbers of consumers all sharing the same underlying store
Now this makes me wonder if `local-overlay` sounds a bit more local than this use case (from a naming point of view)…
I suppose that logic would make all the existing "local" and "remote" names misleading too. I just picked `local-overlay` on a moment's notice to match the existing pattern.
This pull request has been mentioned on NixOS Discourse. There might be relevant details there:
https://discourse.nixos.org/t/tweag-nix-dev-update-51/30870/1
This RFC has not acquired enough shepherds. This typically shows lack of interest from the community. In order to progress a full shepherd team is required. Consider trying to raise interest by posting in Discourse, talking in Matrix or reaching out to people that you know.
If not enough shepherds can be found in the next month we will close this RFC until we can find enough interested participants. The PR can be reopened at any time if more shepherd nominations are made.
@baloo, @rickynils, @zhaofengli, @arianvp, @edolstra would any of you be open to shepherding this along?
I don't see much controversy or drama here and, in my opinion, the document is in good shape too, so overall I expect the shepherd work to be low-commitment.
If you've never done it before, you can look at https://github.com/NixOS/rfcs/blob/master/rfcs/0036-rfc-process-team-amendment.md#shepherd-team for more information about being a Shepherd.
I can shepherd.
I also poked the #nixos-systemd channel, as some people there are looking at appliance images / immutable Nix store partitions, which seem to have a lot of overlap with this RFC.
I am self-nominating as a shepherd. To be completely clear, my availability in August is limited as I will be on vacation, but I will have time again in September, though I may have other higher-priority responsibilities, like Release Manager. If you will have me, that's fine. :)
Self-nominate
This pull request has been mentioned on NixOS Discourse. There might be relevant details there:
https://discourse.nixos.org/t/nixos-with-shared-nix-store-among-compute-nodes/33235/2
Initial meeting took place on 2023-09-24
Notes
[ericson]: RyanTM discussed review+merge of PR in current form as experimental feature
[RaitoBezarius]: go with FUSE proposal
- [ericson]: interaction with ACL proposal?
- https://cl.tvl.fyi/c/depot/+/9449/4/tvix/tests/README.md
[ericson]: delete has similar problem as whiteout/tombstones, I do agree FUSE is probably necessary to "truly" solve that.
[RaitoBezarius]: SQLite is no longer applicable
[arian]: also store nar+narinfo?
- [ericson]: can layer with sqlite, would need initial load to build indexes
[RaitoBezarius]: would like DB to be impl detail
- NixStoreFS
- [ericson]: allow building with `LocalFSStore` (abstract superclass), then rename to `LocalSqliteStore`
- [raito]: blobStore, directoryStore, metadata, etc?
- too much separation of processes?
- current store has issues when used as a build store, leaking FDs
- `dmesg` for the win?!
- WIP to make everything in builds a bind mount, not hardlinks
[arian]: concern about what is used as the lowest layer in a standard setup. Possibility to use the narinfo layer
[RaitoBezarius]: deduplication in store layer?
[ericson]: this is a very stateful feature, requires migration?
[RaitoBezarius]: proper migration provides much more peace of mind, expectation to have store->store migration.
[ericson]: would be good to have Postgres store
- s3fs, goofys?
- bcachefs
- potential?
- TODO: get contacts from RaitoBezarius about kernel developments
[tomberek]: most of this is to help decide if current proposal is worth working on.
- [arian]: local-overlay-store is somewhat of a stopgap, should not be blocked by potential of future work
- code and implementation exists
[RaitoBezarius]: propose merge as experimental, get feedback
- more ambitious: migration stories, encourage store layer experimentation, another RFC
- proliferation of store layers that are not "friendly" wrt migrations, version, etc.
- can accept this, but need to be careful
- this will be used by larger organizations, thus cementing the xp features
- [tomberek]: interaction with scheduling of remote builds? removes deadlock?
Experimental?
Team is concerned that such an experimental feature will be used and quickly become a non-experiment and relied upon, making breaking changes more difficult, requiring migration path. Afterward people will likely complain. We must set the expectation that this is not a stable feature.
- TODO: discuss with Replit, ask the community who is interested in this as a production feature. Invited ryantm to join and answer a few questions.
- [ryantm]: we plan to use the PR as-is
- [ryantm]: have been on 2.3, planning to jump forward to use this PR and pinning if necessary
- [ryantm]: will need to support better infra in the future and support by Nix team
- [ryantm]: local overlay store is a nice way to integrate with other underlying store implementation (tvix-store for example)
Summary
- Merging of code is agreed upon
- Decision: merge PR as xp, get feedback, RFC discussion continues, stabilize + accept
- TODO: get "hands-dirty" [arian,tomberek]
- TODO: need feedback from experimental usage
- expectation: possible to merge within 2 months
- reconvene after experience with merged xp feature
Misc Links:
- https://aws.amazon.com/about-aws/whats-new/2023/03/mountpoint-amazon-s3/
- https://github.com/project-machine/puzzlefs
- FUSE passthrough
Team is concerned that such an experimental feature will be used and quickly become a non-experiment and relied upon, making breaking changes more difficult, requiring migration path.
Here's a general idea: Make experimental feature flags be versioned to the current Nix version. E.g. in order to use a local overlay store with Nix 2.15.x you need this in `nix.conf`:
experimental-features = local-overlay-2.15
Only features that match the current Nix version get enabled, do not provide a way to enable it for future Nix versions.
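The proposed rule can be sketched as a small check. This is only an illustration of the idea, not how Nix parses `experimental-features`; the function name and the "split at the last `-`" convention are assumptions, and unversioned flags are simply ignored here.

```python
from __future__ import annotations


def enabled_features(configured: list[str], nix_version: str) -> set[str]:
    """Hypothetical check: keep a flag only if its suffix names the running Nix version.

    "local-overlay-2.15" is split at its last "-" into a feature name and a
    major.minor version; flags naming any other version (or none) are ignored.
    """
    major_minor = ".".join(nix_version.split(".")[:2])
    enabled: set[str] = set()
    for flag in configured:
        name, sep, version = flag.rpartition("-")
        if sep and version == major_minor:
            enabled.add(name)
    return enabled


print(enabled_features(["local-overlay-2.15"], "2.15.1"))  # {'local-overlay'}
print(enabled_features(["local-overlay-2.15"], "2.16.0"))  # set()
```

Upgrading to 2.16 silently drops the feature until the user opts in again, which is the intended friction.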
Team is concerned that such an experimental feature will be used and quickly become a non-experiment and relied upon, making breaking changes more difficult, requiring migration path.
Here's a general idea: Make experimental feature flags be versioned to the current Nix version. E.g. in order to use a local overlay store with Nix 2.15.x you need this in `nix.conf`:
experimental-features = local-overlay-2.15
Only features that match the current Nix version get enabled, do not provide a way to enable it for future Nix versions.
I think it still doesn't address migration paths for multiple store backends, alas.
I think it still doesn't address migration paths for multiple store backends, alas.
Not quite sure I understand the problem statement, but I'm not sure migration of data from one layer to another needs to be part of this initial RFC. Nix is not doing the overlayfs mounts on its own either, but relies on the user of the feature to take care of this and to know the surrounding context, so it seems prudent to rely on that same user to also migrate/move data around as it makes sense for their context.
Team is concerned that such an experimental feature will be used and quickly become a non-experiment and relied upon, making breaking changes more difficult, requiring migration path.
Here's a general idea: Make experimental feature flags be versioned to the current Nix version. E.g. in order to use a local overlay store with Nix 2.15.x you need this in `nix.conf`:
experimental-features = local-overlay-2.15
Only features that match the current Nix version get enabled, do not provide a way to enable it for future Nix versions.
This seems to be orthogonal to this RFC, I'm not sure it should be discussed here.
I think it still doesn't address migration paths for multiple store backends, alas.
I'm also not entirely sure if I get it, but it could be required that experimental features cannot share their Nix store with stable Nix versions, so e.g. it would have to be /nix/experimental/store or so. And only provide non-destructive ways to migrate stores while it's still experimental, so e.g. "copy + migrate the store" but no "migrate the store in place".
This seems to be orthogonal to this RFC, I'm not sure it should be discussed here.
Mostly yeah. I think it's appropriate to bring up since it was mentioned as a concern in the meeting notes, but it might not have to be a concern with this idea.
Here's a general idea: Make experimental feature flags be versioned to the current Nix version. E.g. in order to use a local overlay store with Nix 2.15.x you need this in `nix.conf`:
experimental-features = local-overlay-2.15
Only features that match the current Nix version get enabled, do not provide a way to enable it for future Nix versions.
This is how you end up with people advising others to do
nix.settings.experimental-features = [ "local-overlay-${lib.versions.majorMinor config.nix.package.version}" ];
It's not that people don't know experimental features might change, it's that they don't care until it happens, or don't believe it really will.
I think it still doesn't address migration paths for multiple store backends, alas.
I'm also not entirely sure if I get it, but it could be required that experimental features cannot share their Nix store with stable Nix versions, so e.g. it would have to be /nix/experimental/store or so. And only provide non-destructive ways to migrate stores while it's still experimental, so e.g. "copy + migrate the store" but no "migrate the store in place".
This seems to be orthogonal to this RFC, I'm not sure it should be discussed here.
Mostly yeah. I think it's appropriate to bring up since it was mentioned as a concern in the meeting notes, but it might not have to be a concern with this idea.
I think it still doesn't address migration paths for multiple store backends, alas.
Not quite sure I understand the problem statement, but I'm not sure migration of data from one layer to another needs to be part of this initial RFC. Nix is not doing the overlayfs mounts on its own either, but relies on the user of the feature to take care of this and to know the surrounding context, so it seems prudent to rely on that same user to also migrate/move data around as it makes sense for their context.
My point was that we don't have the right primitives to ingest store paths from a backend in order to move away from that backend, except serializing the store and deserializing it, which cannot always work in certain scenarios with large stores.
I am okay with experimental features not having migration knobs, I am meh (or even against) about multiplying the amount of store backends and having no migration stories for them, which will hamper our ability to make them evolve.
There are a lot of things in Nix that just perform automatic migration, I think it's important we take the time to formalize this, I am not saying we need to have a good answer as part of this RFC, but someone has to care about this.
This pull request has been mentioned on NixOS Discourse. There might be relevant details there:
https://discourse.nixos.org/t/building-qemu-kvm-vms/33149/20
There's a new method that landed in the kernel that allows you to "tuck" the lower store. Does that address these things? https://lwn.net/Articles/927491/
On Tue, Oct 31, 2023, 06:31, ballit6782 wrote (requesting changes on this pull request):
I have a feeling this RFC does not take into account the fact that changes to the underlying storage of overlay filesystems are not allowed in Linux:
Changes to the underlying filesystems while part of a mounted overlay filesystem are not allowed. If the underlying filesystem is changed, the behavior of the overlay is undefined, though it will not result in a crash or deadlock.
See filesystems/overlayfs.html#changes-to-underlying-filesystems https://www.kernel.org/doc/html/latest/filesystems/overlayfs.html#changes-to-underlying-filesystems
I think that, to avoid confusing users of this RFC, this should be acknowledged, as it is a serious limitation of the feature (it means that any host sharing its store with guests would have to keep that store immutable as long as a single guest is running). Also, if this was not taken into account when comparing alternatives, they may need to be revisited. Namely, I think it makes the fuse-binding-on-demand alternative much more interesting.
It seems to me that overlayed stores have a very limited usefulness given this limitation, except for very specific cases like the NixOS live CD, where the base store is guaranteed not to change, and this use case seems properly handled with the existing tooling.
In rfcs/0000-local-overlay-store.md https://github.com/NixOS/rfcs/pull/152#discussion_r1377100881:
+#### Nodes
+
+- Right angle corners: Absent store object
+- Rounded corners: Present store object
+
+#### Edges
+
+- Solid edge: "adding" or "deduplicating" direction
+- Dotted edge: "deleting" direction
+
+The graph when reduced to just one type of edge is always acyclic, and thus represents a partial order.
+
+### Lower Store
+
+While in use as a lower store by one or more overlay stores, a store must only grow "monotonically".
In Linux, changes to the underlying storage of overlay devices have undefined behaviour https://www.kernel.org/doc/html/latest/filesystems/overlayfs.html#changes-to-underlying-filesystems, so a store used as a lower store cannot grow at all.
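At the Nix level, the RFC's "grow monotonically" rule for a lower store can be stated as a comparison between two snapshots, assuming a store is modelled as a map from path to references (the helper name and paths are hypothetical). Every object present before must still be present afterwards with the same references, and objects may only be added. Note that this only captures the Nix-level invariant. It says nothing about whether overlayfs tolerates the lower directory changing underneath a mounted overlay, which is exactly what the comment above questions.

```python
from __future__ import annotations


def grew_monotonically(
    before: dict[str, frozenset[str]],
    after: dict[str, frozenset[str]],
) -> bool:
    """Hypothetical check: `after` only adds objects to `before`, never removes or alters them."""
    # A missing path yields None, which never equals a frozenset of references.
    return all(after.get(path) == refs for path, refs in before.items())


before = {"/nix/store/aaa-glibc": frozenset()}
after = {**before, "/nix/store/ddd-gcc": frozenset({"/nix/store/aaa-glibc"})}
print(grew_monotonically(before, after))  # True
print(grew_monotonically(after, before))  # False: ddd-gcc was removed
```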
Although there are techniques to trigger updates from the host, such as the described feature (which can already be simulated by just remounting a new overlay on top), it doesn't change the fact that the underlying device for the lower layer of an overlayed store has to be immutable. Maybe that will change, but I haven't seen talks of that, and from what I understand of Linux's VFS it would not be an easy change.
For example, in the LWN article, the "base image" is recreated anew, and its new version is propagated on top of the overlay that uses the old base image. I'm not sure how that would be practical with a Nix store, especially if there are multiple overlayed stores sharing some lower store.
There is some more context provided by a developer in this thread on the linux-fsdevel mailing list:
Specifically, renaming directories and files in lower that were already copied up is going to have a weird outcome.
It doesn't say anything about adding directories/files in lower that already exist in upper layers, though.
Also, someone else then adds this comment about allowing changes to the lower fs:
Best way to keep things simple is to only add functionality when someone actually needs it (and can test it). This has been the design policy in overlayfs and it worked wonderfully.
Maybe we can reach out to linux-fsdevel and describe our use case. If we only require "extending" the lower filesystem online, it's possible that it wouldn't require a lot of changes.