nix
nix copied to clipboard
Add optional `date` argument to `builtins.fetchGit`
This allows users to specify an absolute or relative date (basically, anything that git accepts as a date specification) when fetching a repository.
The motivation for this change is to enable support for incremental builds for Haskell packages for this Nixpkgs pull request:
https://github.com/NixOS/nixpkgs/pull/204020
… inspired by this blog post:
https://harry.garrood.me/blog/easy-incremental-haskell-ci-builds-with-ghc-9.4/
Keep in mind that this new feature can power incremental builds for other package managers, too. There is not much that is Haskell-specific about this feature.
The basic idea is that instead of Nix doing a full build for a package, we split every build into two builds:
-
A full build at an older point in time
e.g. a daily or weekly time boundary
-
An incremental build relative to the last full build
This incremental build reuses the build products left over from the most recent full build.
In order to do this, though, we need a way to "snap" a package's git source input to an earlier point in time (e.g. a daily boundary or weekly boundary). This would allow multiple incremental builds to share the same full rebuild if they snap to the same time boundary.
The two main approaches I considered were:
-
Approach 1 (this PR)
Patch Nix to add a
dateargument tobuiltins.fetchGit -
Approach 2
-
Patch
nix-prefetch-gitto support a new--dateoption -
Disable the sandbox
-
Run
nix-prefetch-gitat evaluation time using import-from-derivation to fetch and rehash the repository at an older point in time
-
Approach 1 seemed the more desirable of the two.
This pull request has been mentioned on NixOS Discourse. There might be relevant details there:
https://discourse.nixos.org/t/nixpkgs-support-for-incremental-haskell-builds/24115/3
~~Couldn't another approach be to build builtins.unsafeDiscardOutputDependency in a first pass, which presumably would be the inputs, which presumably are less volatile in the target scenario?~~
No. → https://felixspringer.xyz/homepage/blog/incrementalHaskellBuildsWithNix had a concise answer for me.
I'm not sure what is the etiquette or protocol to ping for a review for this repository
go to the nix dev matrix channel, find someone there (at least it was for nixpkgs)
Nix team meeting notes
That is a bit ad-hoc, and highly impure. It's a very clever abuse of Nix, but an abuse nonetheless, so we wouldn't want this to go upstream, especially since it can be worked around with a bit of infrastructure (like setting-up a cron job to update a tag/branch that Nix can pull from)
Also the team reminded me we don't want to give out "false hope" by accepting experimental features we never intend to stabilize, which was what I was suggesting. Fair enough.
@thufschmitt: The corresponding blog post explains the problem with using a cron job to update the lockfile periodically:
However, there are a few issues with this approach:
It only works well for short-lived pull requests
In other words, if you update the revision used for the full build once a day then typically only pull requests that are less than a day old will benefit from incremental builds.
Specifically, what we’d really like is “branch-local” incremental builds. In other words if a longer-lived development branch were to deposit a few commits a day we’d like there to be a full rebuild once a day on that branch so that incremental builds against the tip of that development branch remain snappy.
It pollutes the git history
If you bump the lockfile, say, once per day then that’s one junk commit that you’ve added to your git history every day.
It’s difficult to open source any useful automation around this
If the solution requires out-of-band machinery (e.g. some recurring cron job) to bump the lockfile you can’t provide a great user experience for open source projects. It only really works well for proprietary projects that can tolerate that complexity.
This pull request has been mentioned on NixOS Discourse. There might be relevant details there:
https://discourse.nixos.org/t/2023-05-26-nix-team-meeting-minutes-58/28572/1
highly impure
I also dislike this feature, but I think this assertion is unfair (I think @Gabriella439 already explained this somewhere, but I can't find the comment anymore): rev fixes the entire state of the git repository we want including the commit history and all its associated dates. Given rev and date, fetchGit should then be pure. Why does the Nix team think this is impure? I could not find a specific reason in the meeting minutes unfortunately.
Also, even if this were impure, I don't see how that would argue against including this. builtins.fetchGit already permits impurity. For example, you can specify a branch without a revision, which is impure, and that's currently allowed.
The impurity wasn't about the fetching itself (which is actually pure if you specify an immutable rev as the baseline), but the use-case.
If the solution requires out-of-band machinery (e.g. some recurring cron job) to bump the lockfile you can’t provide a great user experience for open source projects. It only really works well for proprietary projects that can tolerate that complexity.
I don't really buy this argument. Every non-trivial project will have some CI system that could be used to update a tag every so often. It is indeed slightly more complicated, but not to the point of being a blocker if this approach is indeed worth it
We ran into another situation at work where this support would have been really helpful:
We currently use Nix to build a snapshot of our database with all migrations up to that point having been already run. However, this snapshot build is somewhat expensive and people are adding new migrations multiple times a day.
The ideal case would be if we could have this expensive build always "snap" to the last commit before the last UTC midnight so that way it would only be rebuilt once per day so that we'd get a lot of cache reuse from that one build (even as new migrations were still being merged that day). If we were to merge this PR we could implement that functionality using something like this truncate function powered by the extended builtins.fetchGit:
{ lib }:
# "Truncate" a `git` source repository to the nearest time interval (specified
# in seconds). For example `truncate { duration = 60; src = …; }` will
# return the last commit before the last minute boundary.
{ interval
, src
, extraFetchGitArgs ? { }
}:
let
srcAttributes =
if lib.isAttrs src
then src
else { url = src; };
url = srcAttributes.url or null;
name = srcAttributes.name or null;
submodules = srcAttributes.fetchSubmodules or null;
arguments = {
${ if name == null then null else "name" } = name;
${ if url == null then null else "url" } = url;
${ if submodules == null then null else "submodules" } = submodules;
};
startingTime =
if srcAttributes ? rev
&& srcAttributes.rev != "0000000000000000000000000000000000000000"
then
let
startingRepository = builtins.fetchGit (arguments // {
inherit (srcAttributes) rev;
});
in
startingRepository.lastModified
else
builtins.currentTime;
in
builtins.fetchGit (arguments // {
date = "${toString ((startingTime / interval) * interval)}";
} // extraFetchGitArgs)
Then with that truncate function we would achieve the desired daily snapshot behavior like this:
lib.truncate {
src = …;
interval = 24 * 60 * 60;
}
Now, this PR needs to be fixed to work against the latest changes on master, which I'm more than happy to do myself, but I don't want to do that work if upstream's not going to merge this so I'd like to ask them to reconsider if this can be merged.
Alternatively, if they know a better way to accomplish what we're trying to do I would welcome that, but "setting up a cron job that sets a tag or adds a commit every day to our git history" is not really an acceptable solution for the reasons I mentioned in https://github.com/NixOS/nix/pull/7362#issuecomment-1565663427
I should probably also add one more thing that might not be obvious based on the previous discussion:
-
If you have a cron job that updates a
tagthat doesn't work because then stale branches (ones with a merge base preceding the tag) will pull in migrations that they shouldn't have on their branchIn other words, by depending on a moving tag you're depending on an unlocked (impure) source input with all of the issues that entails.
-
You can fix that impurity by adding junk commits daily to the trunk branch (instead of using tags) but that comes with the obvious downside of junk commits
… and also still doesn't completely work for long-lived branches (with commits spanning multiple days), since those branches will be missing branch-local commits to update their snapshots. The only way to get a junk-commit-based solution that has parity with the
dateargument tobuiltins.fetchGitis to add daily junk commits to all development branches (which will generate merge conflicts!).
The date argument gives the best of both worlds because it preserves purity but still doesn't require setting up machinery to add noisy commits to every long-lived development branch.
Incremental builds would be a game changer for every CI running Nix, saving time and resources. As discussed above, this PR does not add any more impurities that are not already there. I'm echoing Gabriella's request to please reconsider merging this.
I wonder if this would be easier to support if it only applies when you already have a rev:
builtins.fetchGit {
url = "some-url";
rev = "abb08192ed875ef73fa66029994aa2f6700befd0"
date = "1 day before";
}
It does mean that (builtins.fetchGit {rev = some-rev; ...}).rev != some-rev, and then date is seen more like a deterministic modifier than as an identifier that changes.
The current PR uses ref to first obtain a rev, and then modify that to roll back time.
builtins.fetchGit {
url = "some-url";
ref = "SOMEBRANCH"
date = "1 day before";
}
I leaning toward accepting either as-is, or via restricting the behavior to when rev is provided.
Timezones
Another random question: is the timestamp specification sensitive to things like timezones and locales? If the someone's location impacted this, it would be much harder to consider the date to be a deterministic modifier.
Hrm....
[tom@tframe:~/nix]$ ( export TZ=CET ; git rev-list --before 'noon' -1 HEAD )
b7d80d002f8a3885e9f98b5944e283c2b7f861e6
[tom@tframe:~/nix]$ ( export TZ=UTC ; git rev-list --before 'noon' -1 HEAD )
c45859864742dea6cacf84b83b577c3be0744bf1
[tom@tframe:~/nix]$ ( export TZ=WET ; git rev-list --before 'noon' -1 HEAD )
2ab93fd5fda3f61f6b1560db7da21a34dbd13b7d
If we do this, we need to either ensure a consistent TZ or not allow some of the special forms (https://github.com/git/git/blob/master/date.c#L1197-L1204). "yesterday" is just an offset, but "tea" is a specific time of day most conducive to hot beverages, very location dependent.
I'd also be okay with always requiring the --impure flag to use this feature
~~What about: warn if such forms are used in pure eval, and assume UTC; subject to local time in impure eval~~
No, such keywords should be forbidden in pure eval
This pull request has been mentioned on NixOS Discourse. There might be relevant details there:
https://discourse.nixos.org/t/2024-08-21-nix-team-meeting-minutes-171/50950/1
What is the current status of this feature?
From the 2024-08-21 nix team meeting minutes :
What would be the right primitives for this?
Idea: perhaps
fetchGit/fetchTreecould support this in a pure way.inputs.a = { type = "git"; ref = "main"; } # lock: rev = ... inputs.a-base = { type = "git"; rev = inputs.a.rev; roundDown = "1 day"; }In this example, the roundDown attribute would implement the desired behavior and could be implemented
... which gives hope?