nix icon indicating copy to clipboard operation
nix copied to clipboard

Reuse input lock files

Open roberth opened this issue 2 years ago • 22 comments

(was: Solve dev(/test) dependencies and bloated, diverging lock files, by reconsidering the flattened lock file)

Is your feature request related to a problem? Please describe.

We've previously identified issues with the way the lock file currently works.

  • Irrelevant development dependencies are added to the lock files of all dependent flakes
    • https://github.com/NixOS/nix/issues/6124
  • Mututally dependent flakes cause diverging lock files as each lock file update incorporates a new version with a chain of irrelevant dependencies.
    • issue?
  • Some users just complain about the size of their lock file. Partly pedantic, but they're actually kind of right:
  • Ideally, lock file changes are reviewable, so that nothing weird creeps in through contributions that have such updates.

Describe the solution you'd like

We may consider two scenarios; one with follows and one without. Let's start without follows.

Observing that we do not exploit the apparent benefit of knowing all inputs upfront, we may consider loading the required parts of dependency lockfiles on demand.

Example scenario

  • Dependencies . -> foo -> nixpkgs
  • No follows.

Evaluation order, current situation:

  • need packages.default
  • need inputs.foo.packages.foo
  • read foo details from ./flake.lock
  • fetch foo
  • need foo.inputs.nixpkgs
  • read nixpkgs_1 details from local lock file
  • fetch nixpkgs_1

Same evaluation, partial lock file

  • need packages.default
  • need inputs.foo.packages.foo
  • read foo details from local lock file
  • fetch foo
  • need foo.inputs.nixpkgs
  • read nixpkgs details from foo/flake.lock
  • fetch nixpkgs

As you can see, the fetching characteristics remain the same.

Let's reintroduce follows, and first consider the example:

inputs.foo.nixpkgs.follows = "nixpkgs";

In other words, this injects the local nixpkgs dependency into foo. The lazy fetching behavior remains the same. This injection should be handled by call-flake.nix and does not require changes to the lock file.

Now consider the opposite follows:

inputs.nixpkgs.follows = "foo.nixpkgs";

In this case, if we were not to include a copy of foo.nixpkgs details in the local lock file, we'd have a slightly worse lazy fetching behavior, as nixpkgs would only be fetchable after foo's lock has been fetched. For this case, we do want "flat lock file" behavior, but limited to this dependency. The rule for which transitive dependencies to include in the lock file might be as simple as direct inputs + inputs referenced in follows.

This suggests the following implementation strategy:

  • When generating a lock, apply the above rule to determine which flakes to traverse and include in the lock file
  • When calling a flake, call it such that its inputs are those from its own lock // those set in the current lock
    • https://github.com/NixOS/nix/pull/6530 already adds an overrides parameter in call-flake.nix that appears to enable this. The missing bit is that some/many nodes would be created by reinvoking call-flake.nix rather than sharing the node identifiers. Node identifiers are local to each call-flake.nix call. i.e. overrides is a prerequisite for calling dependency locks.

Describe alternatives you've considered

Mark development dependencies explicitly.

  • This is more configuration which is generally not good.
  • This adds more corner cases and error messages.
  • This does not solve the problem of having mutually recursive development dependencies. Note that we can barely manage to make builds not mutually recursive, but flakes encompass more than builds, so expecting flakes not to be mutually recursive is not realistic.

Additional context

Priorities

Add :+1: to issues you find important.

roberth avatar Feb 01 '23 14:02 roberth

I quite like the syntax @mkenigs used in #4004 to talk about overrides

A (override B.C.D.E) -> B (override C.D)-> C -> D -> E

It is also a useful vehicle for talking about the locking algorithm, which seems that it would recurse into dependencies until the overrides are (), and would only record the inputs that are not already locked or overridden somewhere.

Similarly, it would probably reflect the overrides parameter in call-flake.nix. This parameter does not exist in master yet, and lazy-trees defines it as overrides on sources. It would have to become a structure representing overrides containing invoked flakes rather than source inputs, and it perhaps need a somewhat richer structure to represent deep overrides.

roberth avatar Jun 12 '23 13:06 roberth

This issue has been mentioned on NixOS Discourse. There might be relevant details there:

https://discourse.nixos.org/t/2023-07-31-nix-team-meeting-minutes-76/31486/1

nixos-discourse avatar Aug 09 '23 05:08 nixos-discourse

@tomberek My thought on the new lock file format: It should store mappings from unlocked input paths to locked refs, and nothing more, and it shouldn't store input paths that are already locked by the lock files of input flakes.

I.e. something like

{
  "version": 8,
  "locks": {
    "nixpkgs": {
      "original": { ... original ref from flake.nix ... }
      "locked": { ... the immutable lock ... }
    },
    "nix": {
      "original": { ... original ref from flake.nix ... }
      "locked": { ... the immutable lock ... }
   },
   # note: since the "nix" input has its own lock file, we don't store the locks of the "nix" input here, unless there are overrides
   # e.g. if there is a (non-follows) override "nix.nixpkgs", or if "nix" does not have a lock file:
   "nix.nixpkgs": {
     "original": ...,
     "locked": ...
   };
  }
} 

Thus, if we encounter an unlocked input a.b.c in flake.nix, we look up locks."a.b.c" in flake.lock. If it doesn't exist (or if its "original" ref doesn't match), we resolve the unlocked ref and add/replace the lock file entry a.b.c.

We don't create lock file entries for "follows" inputs (e.g. if "a.b" follows "c.d", we don't create a lock file entry for "a.b"). In general, we don't store any information in the lock file that is already in flake.nix (except for the "original" refs, which are needed to decide whether we need to update a lock file entry if the corresponding unlocked entry has changed).

Note that the above replaces the graph structure of the current lock file format with a simple tree.

edolstra avatar Aug 24 '23 15:08 edolstra

Note that the above replaces the graph structure of the current lock file format with a simple tree.

  • current graph: everything, ready to for a mapAttrs and fix without doing a recursive call-flake.nix - only one set of nodes
  • proposed tree: just the impure flakerefs in the current flake's inputs. call-flake.nix does something similar to the current call-flake.nix, but does so for individual flakes with their inputs, without the unnecessary concept of nodes.

To make follows possible, call-flake.nix needs to accept input overrides as a parameter. Without such a parameter, an input A would only see its own inputs, because call-flake.nix reinvoked itself with A's lock file instead of the user's lock file.

roberth avatar Aug 24 '23 15:08 roberth

From NixCon, in discussion with @lheckemann , @infinisil , @roberth

  • established an example where behavior differs. When an immediate dependency moves or is no longer available (eg: authorization), flakes it depends on are not accessible. This means something like inputs.A.inputs.B.inputs.C.packages.SYSTEM.thing would fail, where with a full-graph would allow for accessing C's outputs even if B was unavailable. So in this particular sense we lose this style of laziness. This is an edge-case and likely not worth basing decision-making on.
  • Simpler implementation is highly preferable.
  • Need a strategy to still support older lockfiles? (for how long? until stabilization?)
  • Need for a cache of resolved lockfile information
    • can we use existing git cache?
    • reuse eval cache sqlite?
    • another new cache?
  • include a strict-lock option?
  • follows and overrides information now only resides in the flake.nix (split source of truth to define the graph, but it ensures that there is only one place the information resides)

tomberek avatar Sep 19 '23 21:09 tomberek

One potential consideration for the new format:

After @lheckemann's talk I was thinking about a locking algorithm that would handle something like this: flake A's flake.nix:

inputs.B.url = "github...";
inputs.B.useLock = false;

inputs.C.url = "github...";
inputs.C.useLock = false;

Suppose B and C both depend on nixpkgs. Then locking flake A would store a single revision for nixpkgs and store it in flake A's flake.lock. This is very similar to how language dependency managers like cargo make a distinction between lib and binary dependencies.

It might be possible to express the same thing in the current format, by doing e.g.:

inputs.nixpkgs = "github...";
inputs.B.inputs.nixpkgs.follows = "nixpkgs";
inputs.C.inputs.nixpkgs.follows = "nixpkgs";

But I think that's more difficult, and in practice it seems people often don't add all of the follows they could. It seems like useLock might be a more likely way to cut down on lockfile bloat.

Implementing useLock would require storing the entirety of the dependency graph of the child in the parent's flake.lock. That's possible in the current format. It's not possible with the format currently proposed in this issue, although I think it could be a natural extension (i.e. useLock could imply the strict-lock option @tomberek mentioned)

mkenigs avatar Sep 19 '23 23:09 mkenigs

It should in fact be possible with the proposed format, and would be represented on the lockfile side in much the same way as the follows approach.

Note also that ignoring downstream locks would only result in the same nixpkgs being used if the downstream inputs use the same nixpkgs input spec. I hope we can also find an approach (though it need not be implemented inside Nix) that works for replacing references that use different branches, or even different repositories or input types.

lheckemann avatar Sep 20 '23 09:09 lheckemann

This issue has been mentioned on NixOS Discourse. There might be relevant details there:

https://discourse.nixos.org/t/2023-09-18-nix-team-meeting-minutes-87/33194/7

nixos-discourse avatar Sep 20 '23 11:09 nixos-discourse

This issue has been mentioned on NixOS Discourse. There might be relevant details there:

https://discourse.nixos.org/t/2023-09-18-nix-team-meeting-minutes-87/33194/8

nixos-discourse avatar Sep 20 '23 11:09 nixos-discourse

would be represented on the lockfile side in much the same way as the follows approach.

Yes, except follows doesn't have to be in the lockfile at all. We expect that the inputs declarations in flake.nix can be the single source of truth. Same for features that behave like follows.

roberth avatar Sep 20 '23 13:09 roberth

This would have to be in the lockfile I think, because it's not present in flake.nix as either a follows or an override (it's a bit more like an implicit follows). Adapting what @edolstra has above, it could be expressed as:

  "locks": {
   "B.nixpkgs": {
     "original": ...,
     "locked": ...
   };
   "C.nixpkgs": {
     "original": ...,
     "locked": ...
   };
  };

That would result in more duplicated information than the current format.

mkenigs avatar Sep 20 '23 18:09 mkenigs

(it's a bit more like an implicit follows).

An implicit follows is an override just as well.


Also responding to your previous comment:

Implementing useLock would require storing the entirety of the dependency graph of the child in the parent's flake.lock. That's possible in the current format. It's not possible with the format currently proposed in this issue

It is actually possible, by storing the whole graph as overrides. useLock would cause call-flake.nix to look for, and apply those overrides.

So such a feature does not conflict with the proposal to reuse input lock files without copying.

although I think it could be a natural extension (i.e. useLock could imply the strict-lock option @tomberek mentioned)

Yes, except applied to part of the dependency tree. It's even more similar to what we do for inputs that have no lock file.

For useLock to work well, it would be nice to have inputs that are "fixed" or "dev only" or "private" - ones that aren't affected by useLock because that wouldn't achieve anything.

This can all be added later though.

roberth avatar Sep 20 '23 21:09 roberth

It is actually possible, by storing the whole graph as overrides. useLock would cause call-flake.nix to look for, and apply those overrides.

I think I may be misunderstanding:

proposed tree: just the impure flakerefs in the current flake's inputs

Which I took to mean inputs/overrides/follows mentioned in flake.nix. But if storing the whole graph as overrides, including inputs that aren't in any way present in flake.nix, is possible, then that's great.

This can all be added later though.

👍 for sure! If I do take a crack at implementing useLock, should I do that using the existing format, or do the changes mentioned in this issue need to happen first?

mkenigs avatar Sep 20 '23 21:09 mkenigs

This issue has been mentioned on NixOS Discourse. There might be relevant details there:

https://discourse.nixos.org/t/sane-stable-stateless-nixos-setup/35568/3

nixos-discourse avatar Nov 16 '23 12:11 nixos-discourse

This issue has been mentioned on NixOS Discourse. There might be relevant details there:

https://discourse.nixos.org/t/flake-lock-inadequate/37753/12

nixos-discourse avatar Jan 14 '24 23:01 nixos-discourse

This issue has been mentioned on NixOS Discourse. There might be relevant details there:

https://discourse.nixos.org/t/flake-lock-inadequate/37753/13

nixos-discourse avatar Jan 15 '24 08:01 nixos-discourse

This issue has been mentioned on NixOS Discourse. There might be relevant details there:

https://discourse.nixos.org/t/nix-flake-inputs-not-lazy/25463/11

nixos-discourse avatar Jan 28 '24 18:01 nixos-discourse

Another issue with the current lock file is that it causes nix to needlessly re-evaluates equivalent inputs.

For example if we have two inputs A and B that both depend on the same revision of nixpkgs (equivalent narHash), then nixpkgs will be evaluated twice. This can be worked around by using B.inputs.nixpkgs.follows = "A/nixpkgs".

However if you want to take advantage of binary caching, then using follows is not an option. If A.nixpkgs and B.nixpkgs at some point diverge because A and B aren't always updated at the same time, then using follows on B.nixpkgs will cause it to miss the binary cache.

So if you want to take advantage of binary caching, there's no practical way to avoid re-evaluating inputs.

szlend avatar Feb 27 '24 19:02 szlend

I looked into fixing the performance implications of this (without touching the lock file format) in #10218.

szlend avatar Mar 14 '24 09:03 szlend

@szlend Thank you for looking into this.

I think the solution in #10218 and in this proposal are somewhat incompatible, because #10218 would require fetching all lock files all the time. It seems that laziness could be recovered with #8025 though, so I'm not too worried.

roberth avatar Mar 21 '24 15:03 roberth

This issue has been mentioned on NixOS Discourse. There might be relevant details there:

https://discourse.nixos.org/t/monorepos-dont-map-to-our-social-structure/44162/18

nixos-discourse avatar Apr 26 '24 19:04 nixos-discourse

This issue has been mentioned on NixOS Discourse. There might be relevant details there:

https://discourse.nixos.org/t/im-not-convinced-flake-inputs-should-exist/50636/24

nixos-discourse avatar Aug 21 '24 09:08 nixos-discourse

This issue has been mentioned on NixOS Discourse. There might be relevant details there:

https://discourse.nixos.org/t/how-are-flake-dependencies-fetched/52638/8

nixos-discourse avatar Sep 25 '24 11:09 nixos-discourse