nix fetchTree: shallow git fetching by default

Motivation: make git fetching more efficient for most repos by default

Feb 17 '24 13:02 DavHau

Team discussion:

Idea approved.
This needs a release notes entry.
The VM tests should be changed to a functional test since those are much faster.
Are there any potential performance issues with switching to shallow fetching? E.g. is there a possibility that incremental fetching doesn't work as well if the client doesn't have the whole history?

Feb 23 '24 14:02 edolstra

This pull request has been mentioned on NixOS Discourse. There might be relevant details there:

https://discourse.nixos.org/t/2024-02-03-nix-team-meeting-127/40309/1

Feb 26 '24 08:02 nixos-discourse

> This needs a release notes entry.

Done

> The VM tests should be changed to a functional test since those are much faster.

This wouldn't make much sense here, as basically all code paths related to shallow cloning and caching are not triggered when a local git repository is fetched. The test only makes sense for remote repos, but that would be hard to integrate with the functional tests, as it needs a git server, which is not trivial to set up without NixOS modules.

> Are there any potential performance issues with switching to shallow fetching? E.g. is there a possibility that incremental fetching doesn't work as well if the client doesn't have the whole history?

Yes, there are scenarios where the lack of incrementality in shallow fetching leads to more overall network traffic and disk I/O, for example when fetching many different revisions of the same repo. It is worth mentioning, though, that:

- The network traffic for shallow fetching always scales linearly with the number of different revisions fetched, which seems bearable, while full cloning can get totally out of control on repos with a large history. In some ecosystems it is common for the repository history to be hundreds of times larger than a single checkout, and some repositories have a history of several GB. Using shallow cloning as the default strategy avoids regularly hitting these worst-case scenarios.
- Non-incremental fetching already seems to be the preferred choice overall. There is probably a reason why GitHub flake inputs are fetched via the tarball API rather than by full git cloning. The performance of fetching the full nixpkgs repo would probably not be great on a Raspberry Pi.
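
To make the trade-off concrete, here is a rough back-of-the-envelope model in Python. It is only a sketch: the function names and all sizes are made up for illustration, and it assumes that each shallow fetch transfers roughly one checkout while a full clone pays for the whole history once and then only for small deltas.

```python
import math
from typing import Optional


def shallow_traffic(checkout_mb: float, revisions: int) -> float:
    """Traffic when every revision is fetched shallowly (nothing is reused)."""
    return checkout_mb * revisions


def full_clone_traffic(history_mb: float, delta_mb: float, revisions: int) -> float:
    """Traffic for one full clone plus an incremental delta per later revision."""
    if revisions == 0:
        return 0.0
    return history_mb + delta_mb * (revisions - 1)


def break_even_revisions(
    checkout_mb: float, history_mb: float, delta_mb: float
) -> Optional[int]:
    """Smallest revision count at which shallow fetching costs at least as much
    as full cloning, or None if shallow fetching never catches up."""
    if checkout_mb <= delta_mb:
        return None
    # Solve checkout * n >= history + delta * (n - 1) for n.
    n = math.ceil((history_mb - delta_mb) / (checkout_mb - delta_mb))
    return max(1, n)


# Hypothetical repo: 50 MB checkout, 5 GB history, 1 MB per incremental fetch.
print(shallow_traffic(50, 10))            # traffic for 10 shallow fetches
print(full_clone_traffic(5000, 1, 10))    # traffic for full clone + 9 deltas
print(break_even_revisions(50, 5000, 1))  # revisions needed to break even
```

Under these assumed numbers, shallow fetching only becomes the more expensive option after roughly a hundred distinct revisions of the same repo have been fetched, and the real break-even point depends entirely on the repo.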

I believe this is the better default for now, but in the long term it might be better if Nix determined the fetching strategy automatically, with the possibility of overriding the behavior via nix.conf. The Nix expression doesn't seem to be the best place for these options, as the author usually doesn't have enough information to make the best decision.

Feb 28 '24 06:02 DavHau

Generally I believe this is ready.

@edolstra If I'm missing something that would allow me to add a meaningful functional test instead of the nixos tests, please let me know.

Right now, I don't see a good way, as fetchTree doesn't return any attributes that would indicate whether a repo was fetched shallowly (which is great, BTW, as it simplifies future upgrade paths). So the only remaining way to determine whether a clone was shallow is to look at the cached git tree, but there is no cache when fetching a local repo.
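
For reference, checking a cached git tree for shallowness could look roughly like the Python sketch below. It relies on git recording the shallow boundary in a file named `shallow` inside the git directory. The helper name `is_shallow` is hypothetical, and the path of the cache is left to the caller, since this sketch makes no assumption about where Nix stores it.

```python
from pathlib import Path


def is_shallow(git_dir: str) -> bool:
    """Return True if the repository at git_dir looks like a shallow clone.

    git_dir must be the git directory itself: the repo path for a bare
    repository, or the `.git` directory of a regular checkout.
    """
    # git writes the boundary commits of a shallow clone to `<git_dir>/shallow`.
    return (Path(git_dir) / "shallow").is_file()
```

When git itself is available, `git rev-parse --is-shallow-repository` gives the same answer without touching the repository layout directly.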

Mar 09 '24 06:03 DavHau

This pull request has been mentioned on NixOS Discourse. There might be relevant details there:

https://discourse.nixos.org/t/2024-06-03-nix-team-meeting-minutes-149/46582/1

Jun 06 '24 08:06 nixos-discourse

This pull request has been mentioned on NixOS Discourse. There might be relevant details there:

https://discourse.nixos.org/t/handling-git-submodules-in-flakes-from-nix-2-18-to-2-22-nar-hash-mismatch-issues/45118/5

Jul 18 '24 05:07 nixos-discourse