Make output paths/hashes indifferent to possible CA/FOD origin of sources
Is your feature request related to a problem? Please describe.
We can't currently swap out an inputSrcs derivation input for an inputDrvs input derivation input and get the same output path, despite no observable differences (with the possible exception of introspective features such as exportReferencesGraph, which we could force to be mutually exclusive with this improved hashing feature).
nix repl session illustrating how it currently works, and what this improvement entails
Currently we have some equivalence between FOD output paths and sources:
nix-repl> pkgs.emptyFile.outPath
"/nix/store/ij3gw72f4n5z4dz6nnzl1731p9kmjbwr-empty-file"
nix-repl> "${./empty-file}"
"/nix/store/ij3gw72f4n5z4dz6nnzl1731p9kmjbwr-empty-file"
However, as expected, the string contexts are different:
nix-repl> :p builtins.getContext pkgs.emptyFile.outPath
{ "/nix/store/jnghgcdkdc79p7qp4vfc27ryvnnm5igi-empty-file.drv" = { outputs = [ "out" ]; }; }
nix-repl> :p builtins.getContext "${./empty-file}"
{ "/nix/store/ij3gw72f4n5z4dz6nnzl1731p9kmjbwr-empty-file" = { path = true; }; }
In itself, this is not a problem. Fact is that they need to be built differently. One involves an FOD. The other was added to the store during instantiation.
That being the case, we can expect dependents to have different derivation hashes, representing the different build graph.
nix-repl> pkgs.concatText "derived" [ pkgs.emptyFile.outPath ]
«derivation /nix/store/nmzk7crah5himm4h3nbh6ry8331c99nq-derived.drv»
nix-repl> pkgs.concatText "derived" [ "${./empty-file}" ]
«derivation /nix/store/6bs6pkf3z4gfmjk0ssr7qfldrvyr8qfy-derived.drv»
So far so good, but now we observe a difference in output paths, which is not necessary.
nix-repl> "${pkgs.concatText "derived" [ pkgs.emptyFile.outPath ]}"
"/nix/store/4q9z73ak24fl3qr4j5wwkjbw0l7ch0rh-derived"
nix-repl> "${pkgs.concatText "derived" [ "${./empty-file}" ]}"
"/nix/store/56phdxcpdv7d8zz4cj0a28iwz5akcpcz-derived"
By implementing this proposal (and enabling it in Nixpkgs), both output paths would have the value /nix/store/56phdxcpdv7d8zz4cj0a28iwz5akcpcz-derived.
Describe the solution you'd like
Instantiation works the same, except for the hashing of certain input-addressed outputs.
Specifically, when a derivation attribute flag such as __derivationAgnosticOutputHashing is enabled, inputs from FODs are ignored for the purpose of hashing; instead interpreting their output paths as inputSrcs.
Ignore __derivationAgnosticOutputHashing when the build could perceive the difference, e.g. when it has flags that would let it introspect its dependencies. I don't think it could use exportReferencesGraph on itself, so this may not even be a problem. I just haven't checked whether that was everything that needs to be controlled. TBD.
Describe alternatives you've considered
Fail instead of ignoring __derivationAgnosticOutputHashing. In practice this would mean that mkDerivation has to be clever about when to set the flag, which seems rather convoluted, pointless, and ever so slightly slower.
Additional context
- This would be good to do before #9077
- Another output hashing change is #10780
Priorities
Add :+1: to issues you find important.
Yeah I've wanted this for a bit. Note that it can/should apply to all content-addressed outputs too; that is also non-fixed ones.
The logic should also apply to CA realisation (.doi) lookups, which are currently just drvHash based; not only input addressed derivation outputs seem to suffer from the discrepancy.
@roberth I've also been thinking about whether we can put the caching of this stuff in the derivation itself. Right now needing to recompute everything from the roots is annoying/slow. It would be cool if you could resume it from any derivation. (Of course, for verifying the store it would be good to still retain the ability to check it from scratch.)
https://github.com/NixOS/nix/blob/c8458bd731eb1c74159bebe459ea00165e056b65/src/libstore/daemon.cc#L583-L614 oh here is another angle I forgot.
If we make that inputSrcs vs inputDrvs never influences the output path then we get:
- Even more preservation of input addresses across rewrites which don't matter
- Sending input-addressed derivations to another machine no longer requires trust / no longer has attack vector of writing to arbitrary output paths. (However sending over input-addressed data is still risky, so this only helps with "shallow" input-addressing where the remote already has/trust any input that isn't content addressed.)
- Also addresses the caching issue above, because we only need to look up the output paths of our immediate deps.
The algorithm is very simple (works for the dynamic derivations definition too):
- Inputs are conceptually
Set DerivingPath - Whenever we have a
DerivingPath::Output { drvPath, outputName }we can statically/trustlessly evaluate to aDerivingPath::Constant, do so. (Anywhere nested, for dynamic derivations case) - Once no more such static/trustless rewrites are possible, calculate the output path on the resulting derivation.
Probably best to start with
- https://github.com/NixOS/nix/issues/9189
That will give us the language to describe the new hashing scheme more effectively.
#6877 wanted to get to that :)