nix Make output paths/hashes indifferent to possible CA/FOD origin of sources

Is your feature request related to a problem? Please describe.

We can't currently swap out an inputSrcs derivation input for an inputDrvs input derivation input and get the same output path, despite no observable differences (with the possible exception of introspective features such as exportReferencesGraph, which we could force to be mutually exclusive with this improved hashing feature).

nix repl session illustrating how it currently works, and what this improvement entails

Currently we have some equivalence between FOD output paths and sources:

nix-repl> pkgs.emptyFile.outPath
"/nix/store/ij3gw72f4n5z4dz6nnzl1731p9kmjbwr-empty-file"

nix-repl> "${./empty-file}"
"/nix/store/ij3gw72f4n5z4dz6nnzl1731p9kmjbwr-empty-file"

However, as expected, the string contexts are different:

nix-repl> :p builtins.getContext pkgs.emptyFile.outPath
{ "/nix/store/jnghgcdkdc79p7qp4vfc27ryvnnm5igi-empty-file.drv" = { outputs = [ "out" ]; }; }

nix-repl> :p builtins.getContext "${./empty-file}"                       
{ "/nix/store/ij3gw72f4n5z4dz6nnzl1731p9kmjbwr-empty-file" = { path = true; }; }

In itself, this is not a problem. Fact is that they need to be built differently. One involves an FOD. The other was added to the store during instantiation.

That being the case, we can expect dependents to have different derivation hashes, representing the different build graph.

nix-repl> pkgs.concatText "derived" [ pkgs.emptyFile.outPath ]
«derivation /nix/store/nmzk7crah5himm4h3nbh6ry8331c99nq-derived.drv»

nix-repl> pkgs.concatText "derived" [ "${./empty-file}" ]      
«derivation /nix/store/6bs6pkf3z4gfmjk0ssr7qfldrvyr8qfy-derived.drv»

So far so good, but now we observe a difference in output paths, which is not necessary.

nix-repl> "${pkgs.concatText "derived" [ pkgs.emptyFile.outPath ]}"   
"/nix/store/4q9z73ak24fl3qr4j5wwkjbw0l7ch0rh-derived"

nix-repl> "${pkgs.concatText "derived" [ "${./empty-file}" ]}"      
"/nix/store/56phdxcpdv7d8zz4cj0a28iwz5akcpcz-derived"

By implementing this proposal (and enabling it in Nixpkgs), both output paths would have the value /nix/store/56phdxcpdv7d8zz4cj0a28iwz5akcpcz-derived.

Describe the solution you'd like

Instantiation works the same, except for the hashing of certain input-addressed outputs.

Specifically, when a derivation attribute flag such as __derivationAgnosticOutputHashing is enabled, inputs from FODs are ignored for the purpose of hashing; instead interpreting their output paths as inputSrcs.

Ignore __derivationAgnosticOutputHashing when the build could perceive the difference, e.g. when it has flags that would let it introspect its dependencies. I don't think it could use exportReferencesGraph on itself, so this may not even be a problem. I just haven't checked whether that was everything that needs to be controlled. TBD.

Describe alternatives you've considered

Fail instead of ignoring __derivationAgnosticOutputHashing. In practice this would mean that mkDerivation has to be clever about when to set the flag, which seems rather convoluted, pointless, and ever so slightly slower.

Additional context

This would be good to do before #9077
Another output hashing change is #10780

Priorities

Add :+1: to issues you find important.

Oct 31 '23 10:10 roberth

Yeah I've wanted this for a bit. Note that it can/should apply to all content-addressed outputs too; that is also non-fixed ones.

Nov 01 '23 19:11 Ericson2314

The logic should also apply to CA realisation (.doi) lookups, which are currently just drvHash based; not only input addressed derivation outputs seem to suffer from the discrepancy.

Dec 07 '23 19:12 roberth

@roberth I've also been thinking about whether we can put the caching of this stuff in the derivation itself. Right now needing to recompute everything from the roots is annoying/slow. It would be cool if you could resume it from any derivation. (Of course, for verifying the store it would be good to still retain the ability to check it from scratch.)

Dec 08 '23 04:12 Ericson2314

https://github.com/NixOS/nix/blob/c8458bd731eb1c74159bebe459ea00165e056b65/src/libstore/daemon.cc#L583-L614 oh here is another angle I forgot.

If we make that inputSrcs vs inputDrvs never influences the output path then we get:

Even more preservation of input addresses across rewrites which don't matter
Sending input-addressed derivations to another machine no longer requires trust / no longer has attack vector of writing to arbitrary output paths. (However sending over input-addressed data is still risky, so this only helps with "shallow" input-addressing where the remote already has/trust any input that isn't content addressed.)
Also addresses the caching issue above, because we only need to look up the output paths of our immediate deps.

The algorithm is very simple (works for the dynamic derivations definition too):

Inputs are conceptually Set DerivingPath
Whenever we have a DerivingPath::Output { drvPath, outputName } we can statically/trustlessly evaluate to a DerivingPath::Constant, do so. (Anywhere nested, for dynamic derivations case)
Once no more such static/trustless rewrites are possible, calculate the output path on the resulting derivation.

Dec 10 '23 16:12 Ericson2314

Probably best to start with

https://github.com/NixOS/nix/issues/9189

That will give us the language to describe the new hashing scheme more effectively.

Dec 13 '23 18:12 roberth

#6877 wanted to get to that :)

Dec 13 '23 18:12 Ericson2314