nix icon indicating copy to clipboard operation
nix copied to clipboard

Input::fetchToStore(): Don't try to substitute

Open edolstra opened this issue 1 year ago • 9 comments

Motivation

Having a narHash doesn't mean that we have the other attributes returned by the fetcher (such as lastModified or rev). For instance,

$ nix flake metadata github:NixOS/patchelf/7c2f768bf9601268a4e71c2ebe91e2011918a70f
Last modified: 2024-01-15 10:51:22

but

$ nix flake metadata github:NixOS/patchelf/7c2f768bf9601268a4e71c2ebe91e2011918a70f?narHash=sha256-PPXqKY2hJng4DBVE0I4xshv/vGLUskL7jl53roB8UdU%3D
(does not print a "Last modified")

The latter only happens if the store path already exists or is substitutable, which made this impure behaviour unpredictable.

Fixes #10601. Note: lazy-trees already had this fix, because it no longer requires all locked nodes to have a narHash.

Context

Priorities and Process

Add :+1: to pull requests you find important.

The Nix maintainer team uses a GitHub project board to schedule and track reviews.

edolstra avatar Apr 26 '24 14:04 edolstra

(thoughts)

I'd still be concerned about similar bugs, because the code isn't structured in a way that would make such a bug somewhat harder to write. We have the requirement that

  • Given any input to fetchTree, it must only succeed with one possible return value - ie referential transparency in the happy path.

This crucial property is hard to test, so we should try to structure the code in a way that makes this property easy to determine (I won't say prove; too absolute to be realistic). A mutable map makes that hard.

In my previous attempt to improve this, I focused on giving types to the input side, which wasn't all that fruitful, given the amount of change and effort required. Maybe it'd be more effective to put the output into a struct before returning it as attributes?

roberth avatar Apr 26 '24 18:04 roberth

Is it necessary not to substitute? What if the the missing fields can be obtained from the lock file instead? I suppose we'd have to tell fetchTree not to return any new attributes except outPath. IIRC we don't have such a flag yet?

roberth avatar Apr 26 '24 18:04 roberth

Substitution could still be done for locked flake inputs, because there we know that we have all the attributes.

We do have the isLocked() method, but we can't use it for the tarball fetcher, since the HTTP headers can return additional lock attributes like lastModified and rev.

edolstra avatar Apr 26 '24 20:04 edolstra

but we can't use it for the tarball fetcher, since the HTTP headers can return additional lock attributes like lastModified and rev.

Aren't those saved to the lock then?

roberth avatar Apr 26 '24 21:04 roberth

They are, this problem only occurs on the CLI if you pass narHash but not the other attributes.

edolstra avatar Apr 26 '24 21:04 edolstra

I guess part of the issue is the removal of hasAllInfo(), though even before that the tarball fetcher had this issue (since its hasAllInfo() just returned true).

Regardless, correctness is more important than performance, and the substitution shortcut makes it harder to reason about the correctness of fetchers, so it's probably best to get rid of it for now.

edolstra avatar Apr 26 '24 21:04 edolstra

Fair enough. Some optimizations can be done later, unless we move towards a non-optimizable design. I don't think that's the case here.

EDIT: Some optimizations

roberth avatar Apr 26 '24 21:04 roberth

I would like to keep substituting, because I would like to be able to substitute from things like Software Heritage by being able to look up content addresses too and not just store paths.

Ericson2314 avatar May 17 '24 23:05 Ericson2314

Right, this isn't actually just an optimization. Another use case is where a (private) cache is used for requireFile-like functionality.

If we move the metadata into a separate attrset, we can both fix the problem and keep supporting these use cases. This can be done with a derivation/strictDerivation-like arrangement, where fetchTree is a builtin expression that calls the primops lazily.

roberth avatar May 18 '24 10:05 roberth

Since this is an actual correctness bug (the evaluation result can change depending on whether the flake store path exists in a substituter/local store), which is more important than an optimization, let's merge this. We can figure out how to bring back substitution in the future, if anybody cares. (However, substitution requires having a narHash for locks, which is somewhat incompatible with lazy trees.)

I did have to disable a test that uses relative path flakes because they're completely broken, but this was masked by the fetchToStore() behaviour (see 12fd65d1799f6b1fe0c7e07d5a8022afc6b6dc40). This will be fixed in #10089.

edolstra avatar Sep 11 '24 20:09 edolstra

This pull request has been mentioned on NixOS Discourse. There might be relevant details there:

https://discourse.nixos.org/t/how-are-flake-dependencies-fetched/52638/9

nixos-discourse avatar Sep 25 '24 15:09 nixos-discourse

We can figure out how to bring back substitution in the future, if anybody cares.

Have I understood correctly that this means flake inputs will no longer substitute from the cache? If so...

I care! I am tracking things in a substituter which cease to exist at their original urls and require the cache, and so users of my flakes will not be able to build them anymore with this change.

It also sounded to me from the above thread that both @Ericson2314 and @roberth care too:

I would like to keep substituting, because I would like to be able to substitute from things like Software Heritage by being able to look up content addresses too and not just store paths.

Right, this isn't actually just an optimization. Another use case is where a (private) cache is used for requireFile-like functionality.

pwaller avatar Sep 26 '24 17:09 pwaller

I also have a similar usecase, where I use a cache server as an insurance that I can rebuild potentially really old projects. I would like very much that this solution can work transparently for both derivations and flake inputs.

minijackson avatar Sep 27 '24 08:09 minijackson

I think I also need flake inputs to be able to continue to collect source code from a binary cache.

My understanding is that during flake evaluation, Flake inputs are collected and evaluated, in order to work out the hashes to download binaries from the cache.

Those flake inputs are often libraries, and are coming from github etc., e.g. github:hercules-ci/gitignore.nix or github:nix-community/gomod2nix so they're needed to evaluate the flake outputs.

In an offline environment (e.g. airgapped, or on a train running through the countryside), flake inputs can't simply be re-collected from the public Internet.

Today, in my offline environments, I have a binary cache that I populate with an export of everything required to build some software products. I'm then able to change the source code and rebuild locally using Nix, without connecting to the Internet.

If I understand this change, and the behaviour of Nix flake evaluation, it sounds like this might break that behaviour?

a-h avatar Sep 27 '24 08:09 a-h

Not super happy about this merge, and apparently I'm not the only one. Substitution is useful, and this bugfix is a regression for us.

I think we could fix the fix by adding back substitution as a fallback only (considering the performance argument mentioned elsewhere). The extra attributes would either have to be read from the lock file (and trusted, which is ok), or they should result in a lazy error that only triggers on those attributes. We already support multiple return types for attributes; may as well add a variant for an evaluation error (translated to mkApp(throw, msg)).

Note: lazy-trees already had this fix, because it no longer requires all locked nodes to have a narHash.

We don't need to remove narHash in order to copy sources to the store lazily. We can remove it as a side effect of trusting libfetchers to produce the right thing. Removing narHash saves us from having to compute it during locking, but I don't think this is significant, except in the case of building a dirty Git workdir, in which case we could just choose not to compute it. The narHash computation could be a step before writing out the lock file, and if we don't expose this hash to the language, its omission from dirty Git inputs is not observable to the evaluation of this temporary source.

@a-h

My understanding is that during flake evaluation, Flake inputs are collected and evaluated, in order to work out the hashes to download binaries from the cache.

This is how it happens during locking. Once locked, inputs are loaded on demand only using builtins.fetchTree (called internally, lazily).

roberth avatar Sep 27 '24 09:09 roberth

This is how it happens during locking. Once locked, inputs are loaded on demand only using builtins.fetchTree (called internally, lazily).

(Digression) Except if you refer to the outPath of the flake input, in which case https://github.com/NixOS/nix/issues/9570 happens; which means a flake input doesn't quite behave the same as a src for a derivation with respect to cache hits. AFAIU, The current behaviour source gets fetched during the computation of a derivation using the outPath, either way, when ideally it would only be fetched if you actually needed to use files within. [unless this has been fixed by lazy trees or something? I'm unsure if I am uptodate on this]

pwaller avatar Sep 27 '24 09:09 pwaller

Except if you refer to the outPath of the flake input,

Any attribute, or probably even isAttrs inputs.something, especially when not flake = false; is sufficient to force the fetchTree call. My point was that you don't need to fetch sources that aren't referenced in your evaluation (e.g. something for packages.b when you only ask for .#a and there's no dependency on b).

Switching to fixed output derivations (as also suggested in #9570 as referenced) may solve the problem for you, but at the cost of needing more tooling, and it only works for inputs where you can put flake = false;. We wouldn't want you to do "import from derivation", which would combine the sequentialization (like builtin fetchers) with the performance overhead of fixed-output fetchers, all during instantiation. While probably slower, it is a viable workaround that allows for substitution though.

roberth avatar Sep 27 '24 10:09 roberth

But yeah, back to the topic, I think we need to

  • Add a fallback on substitution
  • Make sure the attrs in question are in the lock node (ie fetchTree argument)
    • Return a lazy error when they're not
  • Later: handle the optional need for narHash gracefully in the context of virtual, "lazy" sources

roberth avatar Sep 27 '24 11:09 roberth

See #11701 which brings back the ability to substitute inputs in a safer way.

edolstra avatar Oct 15 '24 19:10 edolstra