Allow specifying settings like lfs.fetchexclude for LFS submodules
Is your feature request related to a problem?
We build software (nimbus-eth2) that depends on big LFS repositories but doesn't need most of the files in them.
Currently we use lfs.fetchexclude to exclude such big files:
# We don't need these `vendor/holesky` and `vendor/hoodi` files but
# fetching them may trigger 'This repository is over its data quota' from GitHub
GIT_SUBMODULE_CONFIG := -c lfs.fetchexclude=/public-keys/all.txt,/metadata/genesis.ssz,/parsed/parsedConsensusGenesis.json
https://github.com/status-im/nimbus-eth2/blob/v25.3.1/Makefile#L116-L118
Proposed solution
Provide some way to specify GIT_SUBMODULE_CONFIG (i.e. Git config options such as lfs.fetchexclude) for fetched submodules.
Alternative solutions
I have no other ideas.
Checklist
- [x] checked latest Nix manual (source)
- [x] checked open feature issues and pull requests for possible duplicates
Mhm. Not quite sure how to implement this in a clean way, but if you just want to package this repository, maybe you could use the fetchgit fetcher from nixpkgs instead. I assume it has a hook to specify GIT_SUBMODULE_CONFIG (maybe postHook).
That's an interesting suggestion, I see it has a fetchLFS flag that is set to false by default, which I wasn't aware of:
https://github.com/NixOS/nixpkgs/blob/e5bf9b83fcce6960dccea6c87124eaeffd5c68a7/pkgs/build-support/fetchgit/default.nix#L45
Not sure how postHook would work tho.
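For reference, a minimal sketch of what using nixpkgs' fetchgit with LFS disabled could look like. fetchSubmodules and fetchLFS are real fetchgit arguments; the URL comes from this issue, while the rev value and the fakeHash placeholder are stand-ins you would have to fill in. As far as I can tell, fetchLFS is an all-or-nothing switch, so it cannot exclude individual paths the way lfs.fetchexclude does, and it only helps for sources fetched this way, not for a flake's own submodules fetched via ?submodules=1.

```nix
{ pkgs ? import <nixpkgs> { } }:

# Sketch: fetch the repository and its submodules without Git LFS content.
pkgs.fetchgit {
  url = "https://github.com/status-im/nimbus-eth2";
  rev = "<commit-sha>";       # placeholder: commit of the release you want
  fetchSubmodules = true;     # clone submodules recursively
  fetchLFS = false;           # the default; LFS content should not be pulled
  hash = pkgs.lib.fakeHash;   # placeholder: replace with the hash Nix reports
}
```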
But what I don't understand is, if someone is building our Flake with ?submodules=1# like so:
nix build 'github:status-im/nimbus-eth2?submodules=1#'
Wouldn't the repo itself be cloned with submodules regardless of at which point I add an extra fetchgit call?
Unless you're suggesting I don't use ?submodules=1 at all, but that doesn't work since we use a lot of submodules.
In a perfect world we could do:
nix build 'github:status-im/nimbus-eth2?submodules=1&lfs=0#'
But in an even more perfect world this could be set in the flake itself.
That's an interesting suggestion, I see it has a fetchLFS flag that is set to false by default, which I wasn't aware of: https://github.com/NixOS/nixpkgs/blob/e5bf9b83fcce6960dccea6c87124eaeffd5c68a7/pkgs/build-support/fetchgit/default.nix#L45 Not sure how postHook would work tho.
postHook might actually not be the right solution. However, it's also possible to just use stdenv.mkDerivation and set a fixed output hash:
outputHashAlgo = "sha256";
outputHashMode = "recursive";
outputHash = "<your-hash>";
And then do git clone with all the flags you want inside installPhase and copy the result to $out. It might then be necessary to not have the LFS repository as a submodule, but your developers might also appreciate that, because they wouldn't need to download the whole code either.
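A rough sketch of that idea, assuming nixpkgs' git, git-lfs and cacert packages and the v25.3.1 tag from the Makefile link; the derivation name is made up and the hash is a placeholder. A fixed-output derivation is allowed network access, so git can clone inside installPhase. The -c lfs.fetchexclude value is the one from the Makefile, and this assumes (as the Makefile setup does) that git forwards -c options to the submodule clones.

```nix
{ pkgs ? import <nixpkgs> { } }:

# Hypothetical sketch: a fixed-output derivation that clones a repo with
# submodules while telling git-lfs to skip selected paths.
pkgs.stdenv.mkDerivation {
  name = "nimbus-eth2-source";  # hypothetical name

  nativeBuildInputs = [ pkgs.git pkgs.git-lfs pkgs.cacert ];

  # Nothing to unpack or build; all work happens in installPhase.
  dontUnpack = true;
  dontConfigure = true;
  dontBuild = true;
  dontFixup = true;  # don't patch shebangs etc. in the fetched sources

  # Fixed output: network access is allowed because the result is pinned.
  outputHashAlgo = "sha256";
  outputHashMode = "recursive";
  outputHash = pkgs.lib.fakeHash;  # placeholder: replace after first build

  installPhase = ''
    export HOME=$TMPDIR
    export GIT_SSL_CAINFO=${pkgs.cacert}/etc/ssl/certs/ca-bundle.crt
    git lfs install
    git -c lfs.fetchexclude=/public-keys/all.txt,/metadata/genesis.ssz,/parsed/parsedConsensusGenesis.json \
      clone --recurse-submodules --depth 1 --branch v25.3.1 \
      https://github.com/status-im/nimbus-eth2 "$out"
    # .git directories are not reproducible, so remove them from the output.
    find "$out" -name .git -prune -exec rm -rf {} +
  '';
}
```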
Our integration with Nix is not tight enough for that to work. I won't be able to remove the submodule from the repo. This means that if the flake is being built then the submodule will be fetched as a whole.
I need to find a way to prevent the LFS files from being downloaded when the Nix flake is built.
I think ultimately we want to solve this by making all access to these source accessors lazy, but that will only work reliably when we have
- https://github.com/NixOS/nix/issues/10689
That's a bit of a stretch for now.
Would it help to have something like inputs.self.submodule."foo/bar".lfs = false;?
That seems like something that could be generally useful anyway in combination with potentially all git related flags.
Having something like inputs.self.submodule."foo/bar".lfs = false; would indeed be ideal. Though does that also assume that there would also be something like inputs.self.submodules = true;? Because having to tell users to add ?submodules=1# to URLs every time they get errors about missing submodules gets tiresome quickly :D.
something like inputs.self.submodules = true;?
Yes, that is possible since 2.27 and I would expect that to be set. I think we should reject or warn about self.submodule without self.submodules = true;.
having to tell users to add ?submodules=1#
That would be weird, and that's the more bearable of the two. I'm not even sure that we should have URL support for per-submodule settings, because it would be complicated.
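To make the syntax under discussion concrete, a minimal flake.nix sketch, assuming Nix 2.27 or later. inputs.self.submodules and inputs.self.lfs are the settings added in 2.27 (LFS is opt-in there); the per-submodule form is only the proposal from this issue and does not exist, so it is left commented out. Whether the self settings take effect for every invocation is debated further down the thread; this only shows where they go.

```nix
{
  description = "example flake that needs its own submodules";

  inputs.nixpkgs.url = "github:NixOS/nixpkgs/nixos-24.11";

  # Since Nix 2.27: fetch this flake's own Git submodules.
  inputs.self.submodules = true;

  # Since Nix 2.27: Git LFS for self is opt-in; false is spelled out here.
  inputs.self.lfs = false;

  # Proposed in this issue, NOT implemented: per-submodule overrides.
  # inputs.self.submodule."foo/bar".lfs = false;

  outputs = { self, nixpkgs }: { };
}
```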
Oh really? inputs.self.submodules is available since 2.27? That's amazing, I must have missed reading release notes for that one.
I'm not even sure that we should have URL support for per-submodule settings, because it would be complicated.
No no, that's not what I was suggesting at all, and that indeed would complicate everything. I was simply referring to ensuring submodules are downloaded for the flake itself, or as it's called, self.
I see, it even introduces inputs.self.lfs = true;:
https://discourse.nixos.org/t/nix-2-27-0-released/62003
But does this setting only affect the root repo or also all submodules when inputs.self.submodules is true? I'll test it.
Okay, I'm a bit confused. I tried testing this new self.submodules key using Nix 2.28 and as far as I can tell it does not work.
I have created a force-submodules branch that includes the inputs.self.submodules = true setting:
https://github.com/jakubgs/nix-submodule-bug-repro/compare/force-submodules
But when I try to use that branch it fails the same way as it does on older versions of Nix:
> nix --version
nix (Nix) 2.28.1
> nix build 'github:jakubgs/nix-submodule-bug-repro'
> cp: cannot stat '/nix/store/9w...xr-source/dummy-submodule/README.md': No such file or directory
> nix build 'github:jakubgs/nix-submodule-bug-repro/force-submodules'
> cp: cannot stat '/nix/store/8z...iz-source/dummy-submodule/README.md': No such file or directory
Am I dumb or is this just bugged?
@roberth could you take a look? I'm not sure if I'm just doing something wrong or if it's a bug. If it is, I'll open a separate issue.
Can anyone take a look? It seems to me like the inputs.self.submodules setting has no effect.
@kip93 , @lucia3e8 any ideas or thoughts?
This pull request has been mentioned on NixOS Discourse. There might be relevant details there:
https://discourse.nixos.org/t/2025-05-04-nix-team-meeting-minutes-230/65206/1
not an expert, just lending a hand.
inputs.self.submodules has functional (aka E2E) test coverage, so it's very unlikely that it's bugged
nonetheless, I cloned @jakubgs's repro locally with
git clone https://github.com/jakubgs/nix-submodule-bug-repro.git
git fetch origin && git pull origin force-submodules --recurse-submodules
git submodule update --init --recursive
then ran nix run github:NixOS/nix/2.28.0 -- build --show-trace (slightly confusing: this uses my system nix to pull and run nix 2.28.0)
with these 2 versions of the flake.nix (had to slightly edit flake.nix to run on my aarch64-linux machine)
{
  description = "nix-submodule-bug-repro";

  inputs.nixpkgs.url = "github:NixOS/nixpkgs/nixos-24.11";

  inputs.self.submodules = true;

  outputs = { self, nixpkgs }: let
    pkgs = import nixpkgs { system = "x86_64-linux"; };
    pkgsAarch64 = import nixpkgs { system = "aarch64-linux"; };
  in {
    packages."x86_64-linux".default = pkgs.stdenv.mkDerivation {
      name = "nix-submodule-bug-repro";
      src = self;
      builder = ./builder.sh;
    };
    packages."aarch64-linux".default = pkgsAarch64.stdenv.mkDerivation {
      name = "nix-submodule-bug-repro";
      src = self;
      builder = ./builder.sh;
    };
  };
}
with this file the package builds. If I change line 6 (the inputs.self.submodules line) to inputs.self.submodules = false;, the same command returns
nix run github:NixOS/nix/2.28.0 -- build --show-trace
warning: Git tree '/host/bmc/sources/nix-submodule-bug-repro' is dirty
warning: Git tree '/host/bmc/sources/nix-submodule-bug-repro' is dirty
error: builder for '/nix/store/8l3lm2cb8r520x346hvssdgk6pr8183n-nix-submodule-bug-repro.drv' failed with exit code 1;
last 1 log lines:
> cp: cannot stat '/nix/store/01k7f88i4l08imzmkgh5a5w4rzvmav7m-source/dummy-submodule/README.md': No such file or directory
For full logs, run 'nix log /nix/store/8l3lm2cb8r520x346hvssdgk6pr8183n-nix-submodule-bug-repro.drv'
as far as I understand this is expected behavior
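As an aside, the per-system duplication in the repro flake above can be avoided with nixpkgs.lib.genAttrs. This is a sketch that keeps the repro's builder.sh (whose contents aren't shown) and otherwise behaves the same for both systems; the forAllSystems name is just a local helper, not an API.

```nix
{
  description = "nix-submodule-bug-repro";

  inputs.nixpkgs.url = "github:NixOS/nixpkgs/nixos-24.11";

  inputs.self.submodules = true;

  outputs = { self, nixpkgs }:
    let
      # Systems to build for; extend as needed.
      systems = [ "x86_64-linux" "aarch64-linux" ];
      # Local helper: builds an attrset keyed by system name.
      forAllSystems = nixpkgs.lib.genAttrs systems;
    in {
      packages = forAllSystems (system:
        let pkgs = nixpkgs.legacyPackages.${system};
        in {
          default = pkgs.stdenv.mkDerivation {
            name = "nix-submodule-bug-repro";
            src = self;
            builder = ./builder.sh;
          };
        });
    };
}
```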
Okay, so you're able to confirm that for you also it fails when inputs.self.submodules = true is set. Thanks for checking.
as far as I understand this is expected behavior
Then I'm still confused, what's the point of inputs.self.submodules = true if not to fix this exactly?
I also don't know much about how inputs.self really works; I just used what was already there for submodules and only added a small change that also allowed inputs.self.lfs to be configured. Still, here's my 2 cents:
- inputs.self.submodules = true will do an all or nothing submodule clone, currently you can't filter what is pulled. Same for inputs.self.lfs, or even if you set these as query params in your consumer.
- I believe given the implementation that LFS fetching should work recursively on all submodules (though I don't think we added a test for this specific scenario). But as your issue seems to be that we're pulling too many things, I think this works.
- For testing you need to make sure to use a remote repo as input, if you use a local clone then these inputs.self arguments are ignored and only what is already cloned locally is used, so if you don't have all submodules, or you did not pull LFS files, then these won't be available.
- The inputs.self.submodule."foo/bar".lfs = false; input format may look nice, but I don't think it has much real value, since it's probably only consumers that know which parts they don't want to pull, not the flake itself.
- I'm not even sure that we should have URL support for per-submodule settings, because it would be complicated.
  This is I think the "proper" solution really, but also it sounds like an actual pita to implement and to have such a long URL for consumers is also far from ideal. Maybe a bit more manageable with the attribute set input representation, which I've rarely seen in the wild, but might make this somewhat usable? So instead of a long horizontal URL line we have a long vertical list of submodules/LFS configurations.
So, we could have an input like:
inputs.foobar = {
  type = "git";
  url = "https://git.example.com/foo/bar";
  submodules = true;
  lfs = true;
  submodule."a".clone = false;
  submodule."b/c".lfs = false;
};
This still needs to be representable in a URL format (e.g., git+https://git.example.com/foo/bar?submodules=1&lfs=1&submodule_clone_a=0&submodule_lfs_b%2Fc=0), but users don't need to use that syntax if they don't want to.
Okay, so you're able to confirm that for you also it fails when inputs.self.submodules = true is set. Thanks for checking.
as far as I understand this is expected behavior
Then I'm still confused, what's the point of inputs.self.submodules = true if not to fix this exactly?
ah I'm afraid I wasn't clear. with inputs.self.submodules = true the package does build
@lucia3e8 it does? Weird, because for me it does not with 2.28.3:
> nix --version
nix (Nix) 2.28.3
~/soft/nix-submodule-bug-repro force-submodules
> nix run github:NixOS/nix/2.28.0 -- build --show-trace .
error: builder for '/nix/store/66676knwjjkxd5jc40jj6p68j7fajcnv-nix-submodule-bug-repro.drv' failed with exit code 1;
last 1 log lines:
> cp: cannot stat '/nix/store/8z6yr8dcq3h4yf4rpsahvlyj8g6yd7iz-source/dummy-submodule/README.md': No such file or directory
For full logs, run:
nix log /nix/store/66676knwjjkxd5jc40jj6p68j7fajcnv-nix-submodule-bug-repro.drv
~/soft/nix-submodule-bug-repro force-submodules 11s
> nix run github:NixOS/nix/2.28.0 -- build --show-trace '.?submodules=1'
~/soft/nix-submodule-bug-repro force-submodules
> echo $?
0
It requires the ?submodules=1 trick despite inputs.self.submodules = true being set. To me it seems broken.
inputs.self.submodules = true will do an all or nothing submodule clone, currently you can't filter what is pulled. Same for inputs.self.lfs, or even if you set these as query params in your consumer.
Yes, I tested this and it is as you say.
I believe given the implementation that LFS fetching should work recursively on all submodules (though I don't think we added a test for this specific scenario). But as your issue seems to be that we're pulling too many things, I think this works.
Indeed, we are exhausting our GitHub LFS quota by fetching big LFS files from these repos that are not necessary for builds.
For testing you need to make sure to use a remote repo as input, if you use a local clone then these inputs.self arguments are ignored and only what is already cloned locally is used, so if you don't have all submodules, or you did not pull LFS files, then these won't be available.
Correct as well. For local builds there needs to be other tooling or documentation explaining to developers that local submodules need to be fetched or updated, but that's fine. The issue we care about is when nix build or nix run is used on our Flake with the repo URL as argument, not the local build.
This is I think the "proper" solution really, but also it sounds like an actual pita to implement and to have such a long URL for consumers is also far from ideal.
Yeah, that URL would be massive and unreadable, but I doubt it is the majority use case. I think the whole point is to set things up so that the user does not have to think about any of this, and can use the repo with nix build or nix run just by passing the basic URL of the repo.
I found this issue after pulling my hair out for hours.
Here is my flake: https://github.com/realbogart/nvim/blob/master/flake.nix
It doesn't get the submodules. Here are the commands that I have tried:
nix run github:realbogart/nvim --refresh --no-eval-cache
nix run 'github:realbogart/nvim?submodules=1' --refresh --no-eval-cache
nix run 'github:realbogart/nvim?submodules=1#' --refresh --no-eval-cache
❯ nix --version
nix (Nix) 2.28.3
Funnily enough, cloning the repo and running this locally works:
nix run '.?submodules=1' --refresh --no-eval-cache
However, I've been iterating a lot and I'm not sure whether the result here is cached in some way.
Any help is appreciated. Is this the same issue?
I think you're on the right track; the behavior does seem semi-random and erratic. The whole issue of submodules with Nix flakes has been a major pain point for us while trying to adopt Nix in our repos.