
Impure derivations

Open copumpkin opened this issue 10 years ago • 36 comments

This might reveal a deep misunderstanding on my part, but as far as I can tell, nix fundamentally divides its derivations into "fixed-output" and "deterministic build", based on the presence/absence of outputHash. I'm wondering if there could be a third type of fundamental building block which could allow limited but trackable nondeterministic behavior. The main example I can think of right now is the new fetchTarball builtin, which has its own magic caching strategy, but you could imagine wanting to pull the latest git revision of something using fetchgit and the like. If you use fetchgit as a fixed-output derivation, you can't always get the latest version. If you have it "lie" and pretend not to be a fixed-output derivation, nix will only ever do the work once and not bother refreshing itself.

If nix supported this third type of derivation, I could imagine something like:

{
  fetchTarball = url: builtins.nondetDerivation {
    builder = ./fetchtarball.sh; # contains the actual download logic
    inherit url;
    # Perhaps it could take frequency specifiers like this, which would tell
    # nix to incorporate evaluation time into the store hash, or possibly a
    # more flexible mechanism that I haven't yet thought of.
    cachingStrategy = "hourly";
  };
}

Of course, it should be possible for you to take an expression and figure out all sources of nondeterminism in it (much like how this source downloader works) so as to better trust the evaluation.

Another possible feature of interest could be the notion of a nondetDerivation optionally (it's not possible with all sources of nondeterminism, but is obviously desirable) emitting some sort of an "anchor" allowing one to tie the nondeterministic evaluation down to something deterministic. Think how ruby's Gemfile ties itself down to Gemfile.lock (but we'd obviously provide hashes), and how when you fetch a git ref you can "lock it down" by resolving that ref to a hash. Another example is how the NixOS channel mechanism resolves the top-level redirect to a precise channel revision. Such an anchor file could then be maintained as a way to lock down nondeterminism to get reproducible system states, but you could also selectively (or in bulk) update the locked things (much like nix-channel --update) to get newer versions.
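
The "anchor" idea can be approximated by hand in current Nix. The following is only a minimal sketch, assuming a hypothetical ./sources.lock.json that records a URL and a resolved commit hash; the file name and its layout are illustrative assumptions, not an existing Nix convention. It uses builtins.fetchGit, which was added to Nix after this post was written.

```nix
# Assumed lock file layout (hypothetical, maintained by hand or by a script):
#   { "nixpkgs": { "url": "https://github.com/NixOS/nixpkgs.git",
#                  "rev": "<full commit hash>" } }
let
  # Read the anchor: the exact versions the moving targets were resolved to.
  lock = builtins.fromJSON (builtins.readFile ./sources.lock.json);
in
  # Passing `rev` pins the fetch to one commit. Leaving it out would follow
  # whatever the remote currently points to, i.e. the nondeterministic case.
  builtins.fetchGit {
    inherit (lock.nixpkgs) url rev;
  }
```

Updating the lock file then plays the role of nix-channel --update: the expression stays the same and only the recorded hash changes.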

A last example is just how magic path references in nix copy things into the store for you. We could retain the built-in syntax, but translate the syntax into implicit invocations of the same nondetDerivation primitive.

Is this too weird? I'm just trying to think of a principled way to track my nondeterminism, and possibly to unify the channel world into pure nix.

TBC: I'm not proposing adding more nondeterminism to the system. Just want to be able to track/unify the existing stuff better.

copumpkin avatar Apr 21 '15 20:04 copumpkin

cc @edolstra @shlevy

copumpkin avatar Apr 21 '15 20:04 copumpkin

To make things even weirder, hydra could use this for its job specification with nondeterministic calls to fetchgit and fetchsvn.

copumpkin avatar Apr 21 '15 22:04 copumpkin

Does nobody have any comments? I can flesh out the idea more if it would help. I think it could be a pretty cool way to manage the (limited but often necessary) pieces of mutable state in a Nix-based system.

copumpkin avatar Apr 30 '15 00:04 copumpkin

To sum up, these derivations would:

  • need to have both impure access (mainly internet) and time-changing output, i.e. like a fixed-output derivation without fixing the output hash;
  • be useful primarily to fetch the latest version of foo from a repository;
  • re-run their instantiation when they time out (perhaps selectable as an attribute) or when forced by some command-line flag.

Do I get this right?

Current status of code generators?

I'm certain there are already general tools that prefetch the latest source and update hashes in *.nix files – currently I don't see a distinct advantage in having this built in. For example, @MarcWeber has these REGION AUTO UPDATE things IIRC, and there may be others. Putting the nondeterministic part into a separate tool seems to make it easier to update exactly those things you want and leave the others locked down (shell-scripting your most common use cases).

vcunat avatar May 01 '15 07:05 vcunat

I talked about some similar things in my somewhat-recent fetchgitLocal PR: https://github.com/NixOS/nixpkgs/pull/10176#issuecomment-146610542. I think the interplay between the two derivations (a trick that predates my PR to be fair) is like the "anchoring" you mention.

Ericson2314 avatar Oct 27 '15 04:10 Ericson2314

From these issues with my new fetchgitlocal, https://github.com/NixOS/nixpkgs/issues/10873, I am starting to think we need non-deterministic packages which run under the current user, to generalize things like putting private directories in the store.

Ericson2314 avatar Nov 08 '15 23:11 Ericson2314

I'll probably see if I can drum up some interest about this (and flesh out my proposal) at NixCon in Berlin. @Ericson2314, will you be there?

copumpkin avatar Nov 10 '15 15:11 copumpkin

That would be great! Unfortunately, school will keep me away from NixCon, but let me know how it goes.

Ericson2314 avatar Nov 10 '15 22:11 Ericson2314

I've been tinkering with this recently, and might be able to put up a PR for a hypothetical implementation (subject to lots of implementation and design feedback) in the next week or so, if I get some time.

Edit: turned out to be more complicated than expected :(

copumpkin avatar Apr 04 '16 13:04 copumpkin

Tagging https://github.com/NixOS/nix/issues/904 for posterity.

copumpkin avatar May 13 '16 18:05 copumpkin

@edolstra I'm considering working on this. Is there any chance I can get some assurance of a timely review and/or permission to merge myself before I put a large amount of work in?

shlevy avatar Jan 12 '17 20:01 shlevy

I posted this in another ticket:

part of the reason I'm so interested in #520 is that I think that could be a cool model for channels as well as packages. The main properties I want out of a nondeterministic derivation are the ability to (somehow, programmatically) define how often I want it to update, and (most of the time) give myself a way of pinning to a particular version. Think of Ruby's Gemfile and Gemfile.lock distinction: Gemfile (on some level) defines an update policy (via bounds on package versions), and Gemfile.lock is an instantiation of that policy to exact versions that will be reproducible.

Think of what we want from channels:

  1. I want to point to e.g., github.com/nixos/nixpkgs-channels/tree/nixpkgs-unstable (basically an update policy; I want to update at most as often as the branch updates)
  2. The branch can be resolved to an exact hash for later reproducibility
  3. I want to know explicitly that somewhere in my (otherwise highly deterministic) Nix evaluation, a possibly nondeterministic "moving target" is involved, and be given the opportunity to lock it down to something that point 2 produces

I don't know of a great UI for this, but here's one not-so-great one that might inspire other ideas:

  • When you write a nondeterministic derivation, you generate a UUID and paste it into the expression source
  • Any evaluation of that nondeterministic derivation will get added to a top-level list of sources of nondeterminism in your expression, indexed by the associated UUID, and it's very clear when you evaluate an expression that your nondeterminism is included (so like when the top-level list of things to build and things to download from cache is printed, it could include a third category for these)
  • Any build of a nondeterministic derivation gets a sandbox that allows network access
  • The interface could (at first at least) basically be one that gives you a little "shim" to decide what to feed into a fixed-output derivation. That is, nondeterministic derivation = deterministic FO derivation + "decide (and record) which version to download". That would accommodate many common cases of git hashes and the like.
  • Nix maintains a central registry on your machine of current resolved UUIDs, and lets you request that a particular UUID be updated (this is the equivalent of nix-channel --update)

Then this mechanism can be used for channels, Hydra sources (don't have to make VCS into a first-class notion in Hydra anymore), packages that have sensible update semantics, and so on.

I realize this is still pretty sketchy and probably doesn't belong in this ticket, but I do think something in this direction would be a killer feature, allowing us to unify the deterministic Nix world with changing surroundings in a relatively painless manner.

copumpkin avatar Jan 27 '17 13:01 copumpkin

So Shea told me about fetchgit today and it seems rather upsetting. It seems convenient sometimes, but is there going to be a config option or CLI flag or something to turn determinism back on? When I run a build, how will I be able to tell whether it's a deterministic one or one with unpinned fetches?

chris-martin avatar Jan 25 '18 02:01 chris-martin

Yeah, there's --pure as of a couple of days ago, I think. It should turn off all sources of impurity.

copumpkin avatar Jan 25 '18 02:01 copumpkin

Internally at Target we expose fetchGit through an interface that enforces specifying either a revision or a tag (we map tags to tags/${tag} in the ref and they're only trusted for internal repos our team controls)
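
A wrapper of the kind described here might look roughly like the sketch below. It is not the actual internal interface: the function shape and the argument names (url, rev, tag) are assumptions made for illustration, and only the tags/${tag} mapping comes from the comment above.

```nix
# Hypothetical wrapper around builtins.fetchGit that refuses unpinned
# fetches: the caller must supply exactly one of `rev` or `tag`.
{ url, rev ? null, tag ? null }:

# Evaluation aborts unless exactly one of the two is given.
assert (rev != null) != (tag != null);

if rev != null then
  # A commit hash pins the fetch to one exact revision.
  builtins.fetchGit { inherit url rev; }
else
  # Tags are mutable, so this branch is only as trustworthy as the
  # repository's owners; the mapping follows the comment above.
  builtins.fetchGit { inherit url; ref = "tags/${tag}"; }
```

Calling it with neither argument, or with both, fails the assertion at evaluation time rather than silently following a moving branch.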

shlevy avatar Jan 25 '18 05:01 shlevy

The motivation why fetchGit doesn't require a hash is that file system access doesn't require a hash either. So evaluation was already impure at that level (you could edit a Nix expression and get a different result).

edolstra avatar Jan 26 '18 16:01 edolstra

@edolstra does --pure affect filesystem access? (E.g. only paths already in the store, etc.)

Ericson2314 avatar Jan 26 '18 16:01 Ericson2314

It seems like --pure maybe should also prevent accessing file paths that are outside of some designated root directory.

chris-martin avatar Jan 26 '18 18:01 chris-martin

Why yes, my build does rely on /run/keys, why do you ask?

copumpkin avatar Jan 26 '18 18:01 copumpkin

--pure disallows filesystem access (except possibly in store). #1816 would reallow it if you know the hash in advance.

shlevy avatar Jan 26 '18 19:01 shlevy

Don't __impure derivations (https://github.com/NixOS/nix/commit/647291cd6c7559f68d49a5cdd907c2fd580790b1) resolve most of the issues here? For @copumpkin's grand idea (which I find super cool), we could allow channels to point to an impure nix derivation instead of a URL. Then we can reuse the channels mechanism, and in particular rollbacks, for impure derivations. And that would only need a relatively small change to nix.

Nadrieril avatar Aug 30 '18 15:08 Nadrieril

Is there any hope of seeing __impure merged into the main branch any time soon?

deliciouslytyped avatar May 23 '19 17:05 deliciouslytyped

@deliciouslytyped ca derivations make __impure a lot better, so we should wait for that.

Ericson2314 avatar Mar 14 '20 16:03 Ericson2314

ca derivations make __impure a lot better, so we should wait for that.

And now we have them! (https://github.com/NixOS/nix/issues/4087) So let's resurrect this. Should be quite easy, actually.

Ericson2314 avatar Sep 29 '20 21:09 Ericson2314

Looking at https://github.com/edolstra/nix/commit/690e06b58e19020d69c9fe8bd2d06b45c14f65b5, here are some notes:

  • We now have DerivationType which is specifically meant to make dealing with new sorts of derivations, like this, easier. The only hiccup is how to store the extra purity bool. I suppose I would be in favor of combining Derivation and ParsedDerivation if it helps. (That would mean enriching the in-memory Derivation while continuing the same tricks to not mess with the drv file and nix expr representations.)

  • Pure derivations actually can depend on impure derivations. We just need to be careful not to pollute any maps with anything that depends on the current impure drv -> output mapping. Incidentally https://github.com/NixOS/nix/pull/4056 faces similar issues (don't let prior resolutions leak to eval time) and surmounts them.

  • We can also do "pure fixed output derivations" for free. I think this is good. For example, fetchpatch can become two derivations:

    1. fetch impurely without output hash.
    2. Normalize purely with output hash.
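
The two-step split could be sketched as follows. This is an illustration under several assumptions, not the design from the commit above: it assumes a Nix version where the __impure attribute is available behind the experimental impure-derivations and ca-derivations features, the URL is a placeholder, and the normalization is reduced to a single filterdiff --clean call rather than what fetchpatch really does.

```nix
# Assumes the experimental `impure-derivations` and `ca-derivations`
# features are enabled in the Nix configuration.
{ pkgs ? import <nixpkgs> { } }:
let
  # Step 1: fetch impurely, without an output hash.
  # The URL is a placeholder, not a real patch.
  rawPatch = pkgs.runCommand "raw.patch" {
    __impure = true;
    nativeBuildInputs = [ pkgs.curl ];
    SSL_CERT_FILE = "${pkgs.cacert}/etc/ssl/certs/ca-bundle.crt";
  } ''
    curl --fail --location https://example.org/fix.patch --output "$out"
  '';
in
# Step 2: normalize, with a fixed output hash.
pkgs.runCommand "normalized.patch" {
  nativeBuildInputs = [ pkgs.patchutils ];
  outputHashMode = "flat";
  outputHashAlgo = "sha256";
  # Placeholder hash; replace it with the hash of the normalized patch.
  outputHash = pkgs.lib.fakeSha256;
} ''
  # Strip non-diff lines so cosmetic changes upstream do not alter the hash.
  filterdiff --clean < ${rawPatch} > "$out"
''
```

The point of the split is that the recorded hash covers only the normalized result, so the raw download may vary without invalidating anything that depends on the second derivation.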

So let's just wait for https://github.com/NixOS/nix/pull/4056 to land, and then we basically "do it again" for this!

CC @regnat

Ericson2314 avatar Sep 29 '20 22:09 Ericson2314

I marked this as stale due to inactivity. → More info

stale[bot] avatar Mar 31 '21 05:03 stale[bot]

still interested

tomberek avatar Aug 19 '21 18:08 tomberek

I marked this as stale due to inactivity. → More info

stale[bot] avatar Apr 17 '22 11:04 stale[bot]

Still interested

MagicRB avatar May 12 '22 14:05 MagicRB

Still interested

Does https://github.com/NixOS/nix/pull/6227 resolve your use-case?

tomberek avatar May 12 '22 14:05 tomberek