Impure derivations
This might reveal a deep misunderstanding on my part, but as far as I can tell, nix fundamentally divides its derivations into "fixed-output" and "deterministic build", based on the presence/absence of outputHash. I'm wondering if there could be a third type of fundamental building block which could allow limited but trackable nondeterministic behavior. The main example I can think of right now is the new fetchTarball builtin, which has its own magic caching strategy, but you could imagine wanting to pull the latest git revision of something using fetchgit and the like. If you use fetchgit as a fixed-output derivation, you can't always get the latest version. If you have it "lie" and pretend not to be a fixed-output derivation, nix will only ever do the work once and not bother refreshing itself.
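To make the two existing behaviours concrete, here is a minimal sketch using builtins.fetchTarball. The revision and hash are placeholders (assumptions for illustration, not real values). As far as I know, the unpinned form is refreshed according to Nix's tarball-ttl setting rather than anything visible in the expression.

```nix
let
  # Placeholder values for illustration only; substitute a real commit
  # and the hash of its unpacked tarball.
  rev = "0000000000000000000000000000000000000000";
  sha256 = "0000000000000000000000000000000000000000000000000000";
in
{
  # Fixed-output style: the hash pins the content, so this never changes
  # (and never picks up a newer version).
  pinned = builtins.fetchTarball {
    url = "https://github.com/NixOS/nixpkgs/archive/${rev}.tar.gz";
    inherit sha256;
  };

  # Unpinned: follows whatever the branch currently points at, refreshed
  # by Nix's own download cache.
  latest = builtins.fetchTarball
    "https://github.com/NixOS/nixpkgs/archive/master.tar.gz";
}
```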
If nix supported this third type of derivation, I could imagine something like:
{
  fetchTarball = url: builtins.nondetDerivation {
    builder = ./fetchtarball.sh; # contains the actual download logic
    inherit url;
    # Perhaps it could take frequency specifiers like this, which would tell
    # nix to incorporate evaluation time into the store hash, or possibly a
    # more flexible mechanism that I haven't yet thought of
    cachingStrategy = "hourly";
  };
}
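One way to read the "hourly" idea with primitives that exist today is to fold a coarse time bucket into the derivation's inputs. The sketch below assumes builtins.currentTime (an impure builtin, unavailable in pure evaluation) and uses a trivial /bin/sh builder as a stand-in for the real download script; nondetDerivation itself remains hypothetical.

```nix
let
  # Integer division: this value changes once per hour, so the derivation
  # hash (and therefore the store path) changes at most hourly.
  hourBucket = builtins.currentTime / 3600;
in
derivation {
  name = "hourly-example";
  system = builtins.currentSystem;
  builder = "/bin/sh";
  # Stand-in for the real fetch logic; it only records the bucket it ran in.
  args = [ "-c" "echo ${toString hourBucket} > $out" ];
}
```

Within the same hour Nix would reuse the already-built output; once the bucket changes, the derivation is a new one and gets built again.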
Of course, it should be possible for you to take an expression and figure out all sources of nondeterminism in it (much like how this source downloader works) so as to better trust the evaluation.
Another possible feature of interest could be the notion of a nondetDerivation optionally (it's not possible with all sources of nondeterminism, but is obviously desirable) emitting some sort of an "anchor" allowing one to tie the nondeterministic evaluation down to something deterministic. Think how ruby's Gemfile ties itself down to Gemfile.lock (but we'd obviously provide hashes), and how when you fetch a git ref you can "lock it down" by resolving that ref to a hash. Another example is how the NixOS channel mechanism resolves the top-level redirect to a precise channel revision. Such an anchor file could then be maintained as a way to lock down nondeterminism to get reproducible system states, but you could also selectively (or in bulk) update the locked things (much like nix-channel --update) to get newer versions.
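A rough approximation of such an anchor with today's builtins might look like the sketch below. The lock file name and its layout are assumptions made up for illustration; builtins.fetchGit, builtins.fromJSON, and builtins.pathExists are existing builtins.

```nix
let
  # Hypothetical lock file, e.g. { "nixpkgs": { "rev": "<commit hash>" } }
  lockPath = ./sources.lock.json;
  locked =
    if builtins.pathExists lockPath
    then builtins.fromJSON (builtins.readFile lockPath)
    else { };

  # Moving target (url + ref), plus an optional anchor (rev) taken from
  # the lock file when an entry for this name exists.
  fetchAnchored = name: { url, ref }:
    builtins.fetchGit (
      { inherit url ref; }
      // (if builtins.hasAttr name locked
          then { inherit (locked.${name}) rev; }
          else { })
    );
in
{
  nixpkgs = fetchAnchored "nixpkgs" {
    url = "https://github.com/NixOS/nixpkgs.git";
    ref = "nixpkgs-unstable";
  };
}
```

Without a lock entry the fetch follows the branch; with one, it is tied to that exact revision. The rev attribute of the fetchGit result is the value one would write back into the lock file when updating.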
A last example is just how magic path references in nix copy things into the store for you. We could retain the built-in syntax, but translate the syntax into implicit invocations of the same nondetDerivation primitive.
Is this too weird? I'm just trying to think of a principled way to track my nondeterminism, and possibly to unify the channel world into pure nix.
TBC: I'm not proposing adding more nondeterminism to the system. Just want to be able to track/unify the existing stuff better.
cc @edolstra @shlevy
To make things even weirder, hydra could use this for its job specification with nondeterministic calls to fetchgit and fetchsvn.
Nobody have any comments? I can flesh out the idea more if it would help. I think it could be a pretty cool way to manage the (limited but often necessary) pieces of mutable state in a Nix-based system.
To sum up, these derivations would:
- need to have both impure access (mainly to the internet) and output that changes over time, i.e. like a fixed-output derivation without fixing the output hash;
- be useful primarily for fetching the latest version of foo from a repository;
- re-run their instantiation when they time out (perhaps selectable as an attribute) or when forced by some command-line flag.
Do I get this right?
Current status of code generators?
I'm certain there are already general tools that prefetch the latest source and update hashes in *.nix files; currently I don't see a distinct advantage in having this built in. For example, @MarcWeber has these REGION AUTO UPDATE things IIRC, and there may be others. Putting the nondeterministic part into a separate tool seems to make it easier to update exactly those things you want and leave the others locked down (shell-scripting your most common use cases).
I talked about some similar things in my somewhat-recent fetchgitLocal PR: https://github.com/NixOS/nixpkgs/pull/10176#issuecomment-146610542. I think the interplay between the two derivations (a trick that predates my PR to be fair) is like the "anchoring" you mention.
From these issues with my new fetchgitlocal (https://github.com/NixOS/nixpkgs/issues/10873), I am starting to think we need non-deterministic packages which run under the current user, to generalize things like putting private directories in the store.
I'll probably see if I can drum up some interest about this (and flesh out my proposal) at NixCon in Berlin. @Ericson2314, will you be there?
That would be great! Unfortunately, school will keep me away from NixCon, but let me know how it goes.
I've been tinkering with this recently, and might be able to put up a PR for a hypothetical implementation (subject to lots of implementation and design feedback) in the next week or so, if I get some time.
Edit: turned out to be more complicated than expected :(
Tagging https://github.com/NixOS/nix/issues/904 for posterity.
@edolstra I'm considering working on this. Is there any chance I can get some assurance of a timely review and/or permission to merge myself before I put a large amount of work in?
I posted this in another ticket:
part of the reason I'm so interested in #520 is that I think that could be a cool model for channels as well as packages. The main properties I want out of a nondeterministic derivation are the ability to (somehow, programmatically) define how often I want it to update, and (most of the time) give myself a way of pinning to a particular version. Think of Ruby's Gemfile and Gemfile.lock distinction: Gemfile (on some level) defines an update policy (via bounds on package versions), and Gemfile.lock is an instantiation of that policy to exact versions that will be reproducible.
Think of what we want from channels:
1. I want to point to e.g., github.com/nixos/nixpkgs-channels/tree/nixpkgs-unstable (basically an update policy; I want to update at most as often as the branch updates)
2. The branch can be resolved to an exact hash for later reproducibility
3. I want to know explicitly that somewhere in my (otherwise highly deterministic) Nix evaluation, a possibly nondeterministic "moving target" is involved, and be given the opportunity to lock it down to something that point 2 produces
I don't know of a great UI for this, but here's one not-so-great one that might inspire other ideas:
- When you write a nondeterministic derivation, you generate a UUID and paste it into the expression source
- Any evaluation of that nondeterministic derivation will get added to a top-level list of sources of nondeterminism in your expression, indexed by the associated UUID, and it's very clear when you evaluate an expression that your nondeterminism is included (so like when the top-level list of things to build and things to download from cache is printed, it could include a third category for these)
- Any build of a nondeterministic derivation gets a sandbox that allows network access
- The interface could (at first at least) basically be one that gives you a little "shim" to decide what to feed into a fixed-output derivation. That is, nondeterministic derivation = deterministic FO derivation + "decide (and record) which version to download". That would accommodate many common cases of git hashes and the like.
- Nix maintains a central registry on your machine of current resolved UUIDs, and lets you request that a particular UUID be updated (this is the equivalent of nix-channel --update)
Then this mechanism can be used for channels, Hydra sources (don't have to make VCS into a first-class notion in Hydra anymore), packages that have sensible update semantics, and so on.
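To make the "shim" reading a bit more tangible, here is a sketch in plain Nix. Everything in it is an assumption for illustration: the registry is just an inline attribute set standing in for the machine-local one Nix would maintain, the UUID is an arbitrary example, and the rev and sha256 values are placeholders.

```nix
let
  # Stand-in for the registry of currently resolved UUIDs (hypothetical).
  registry = {
    "7c9e6679-7425-40de-944b-e07fc1f90ae7" = {
      rev = "0000000000000000000000000000000000000000";
      sha256 = "0000000000000000000000000000000000000000000000000000";
    };
  };

  # The shim: look up what the UUID currently resolves to and hand that
  # to an ordinary fixed-output fetch.
  nondet = { uuid, description, fetch }: {
    inherit uuid description;
    resolved = registry.${uuid};
    src = fetch registry.${uuid};
  };

  sources = {
    nixpkgs = nondet {
      uuid = "7c9e6679-7425-40de-944b-e07fc1f90ae7";
      description = "nixpkgs-unstable branch";
      fetch = { rev, sha256 }: builtins.fetchTarball {
        url = "https://github.com/NixOS/nixpkgs/archive/${rev}.tar.gz";
        inherit sha256;
      };
    };
  };
in
{
  inherit sources;
  # Top-level list of every moving target involved in this expression.
  nondeterminism = map
    (s: { inherit (s) uuid description resolved; })
    (builtins.attrValues sources);
}
```

Because Nix is lazy, listing nondeterminism does not trigger any download; updating a UUID would amount to rewriting its registry entry.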
I realize this is still pretty sketchy and probably doesn't belong in this ticket, but I do think something in this direction would be a killer feature, allowing us to unify the deterministic Nix world with changing surroundings in a relatively painless manner.
So Shea told me about fetchgit today and it seems rather upsetting. It seems convenient sometimes, but is there going to be a config option or CLI flag or something to turn determinism back on? When I run a build, how will I be able to tell whether it's a deterministic one or one with unpinned fetches?
Yeah, there's --pure as of a couple of days ago, I think. It should turn off all sources of impurity.
Internally at Target we expose fetchGit through an interface that enforces specifying either a revision or a tag (we map tags to tags/${tag} in the ref, and they're only trusted for internal repos our team controls).
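A wrapper along those lines could be sketched as follows. The name fetchGitPinned and the "exactly one of rev or tag" rule are assumptions for illustration, not the actual internal interface; the tag mapping follows the description above, and newer Nix versions may expect the full "refs/tags/..." form instead.

```nix
let
  # Hypothetical wrapper: callers must pin by exactly one of rev or tag.
  fetchGitPinned = { url, rev ? null, tag ? null }:
    if rev != null && tag == null then
      builtins.fetchGit { inherit url rev; }
    else if tag != null && rev == null then
      # Tags are mapped into the ref, as described above.
      builtins.fetchGit { inherit url; ref = "tags/${tag}"; }
    else
      throw "fetchGitPinned: specify exactly one of rev or tag";
in
fetchGitPinned
```

A call would then look like fetchGitPinned { url = "https://example.com/repo.git"; tag = "v1.0"; } (a made-up URL), and an unpinned call fails at evaluation time instead of silently tracking a branch.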
The motivation why fetchGit doesn't require a hash is that file system access doesn't require a hash either.
So evaluation was already impure at that level (you could edit a Nix expression and get a different result).
@edolstra does --pure affect filesystem access? (E.g. only paths already in the store, etc.)
It seems like --pure maybe should also prevent accessing file paths that are outside of some designated root directory.
Why yes, my build does rely on /run/keys, why do you ask?
--pure disallows filesystem access (except possibly in store). #1816 would reallow it if you know the hash in advance.
Don't __impure derivations (https://github.com/NixOS/nix/commit/647291cd6c7559f68d49a5cdd907c2fd580790b1) resolve most of the issues here?
For @copumpkin's grand idea (which I find super cool), we could allow channels to point to an impure nix derivation instead of a URL. Then we can reuse the channels mechanism, and in particular rollbacks, for impure derivations. And that would only need a relatively small change to nix.
Is there any hope of seeing __impure merged into the main branch any time soon?
@deliciouslytyped ca derivations make __impure a lot better, so we should wait for that.
ca derivations make __impure a lot better, so we should wait for that.
And now we have them! (https://github.com/NixOS/nix/issues/4087) So let's resurrect this. Should be quite easy, actually.
Looking at https://github.com/edolstra/nix/commit/690e06b58e19020d69c9fe8bd2d06b45c14f65b5, here are some notes:
- We now have DerivationType, which is specifically meant to make dealing with new sorts of derivations, like this, easier. The only hiccup is how to store the extra purity bool. I suppose I would be in favor of combining Derivation and ParsedDerivation if it helps. (That would mean enriching the in-memory Derivation while continuing the same tricks to not mess with the drv file and nix expr representations.)
- Pure derivations actually can depend on impure derivations. We just need to be careful not to pollute any maps with anything that depends on the current impure drv -> output mapping. Incidentally https://github.com/NixOS/nix/pull/4056 faces similar issues (don't let prior resolutions leak to eval time) and surmounts them.
- We can also do "pure fixed output derivations" for free. I think this is good. For example, fetchpatch can become two derivations:
  - Fetch impurely without output hash.
  - Normalize purely with output hash.
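The fetchpatch split could look roughly like the sketch below. It assumes a Nix with the experimental impure derivations support enabled and a nixpkgs that passes __impure through to the derivation; the URL, the output hash, and the grep-based normalization are placeholders for illustration, not what fetchpatch actually does.

```nix
{ pkgs ? import <nixpkgs> { } }:

let
  # Hypothetical patch URL, for illustration only.
  url = "https://example.com/some-change.patch";

  # Stage 1: impure fetch, with network access and no output hash.
  raw = pkgs.runCommand "raw.patch"
    {
      __impure = true;
      nativeBuildInputs = [ pkgs.curl ];
      SSL_CERT_FILE = "${pkgs.cacert}/etc/ssl/certs/ca-bundle.crt";
    }
    ''
      curl --fail --location "${url}" --output "$out"
    '';
in
# Stage 2: normalization without network access, pinned by an output hash.
pkgs.runCommand "normalized.patch"
  {
    outputHashMode = "flat";
    outputHashAlgo = "sha256";
    # Placeholder; replace with the real hash of the normalized patch.
    outputHash = "0000000000000000000000000000000000000000000000000000";
  }
  ''
    # Stand-in normalization: drop the "index ..." lines, which can vary
    # between downloads of the same change.
    grep -v '^index ' ${raw} > "$out"
  ''
```

The point of the split is that the hash only has to match the normalized result, so upstream changes that do not survive normalization no longer break the build.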
So let's just wait for https://github.com/NixOS/nix/pull/4056 to land, and then we basically "do it again" for this!
CC @regnat
I marked this as stale due to inactivity.
still interested
I marked this as stale due to inactivity.
Still interested
Still interested
Does https://github.com/NixOS/nix/pull/6227 resolve your use-case?