Support for Git LFS in private repositories
Nixpkgs PRs https://github.com/NixOS/nixpkgs/pull/105998 and https://github.com/NixOS/nixpkgs/pull/113580 added Git LFS support to the Nixpkgs fetchgit function. The problem with fetchgit, however, is that it does not properly support private repositories. Nix's builtins.fetchGit does support private repositories, but it does not seem to support Git LFS.
Currently, when trying to builtins.fetchGit a repository with LFS, the following happens:
nix-repl> builtins.fetchGit {url = "[email protected]:my_company/private-lfs-repo.git"; rev = "some_rev";}
Downloading some/lfs/file (123 KB)
Error downloading object: some/lfs/file (a123456): Smudge error: Error downloading some/lfs/file (some_rev): batch request: missing protocol: ""
Errors logged to /home/my-user/nix/gitv2/xxx/lfs/logs/20210309T095658.11111111.log
Use `git lfs logs last` to view the log.
error: external filter 'git-lfs filter-process' failed
error: program 'git' failed with exit code 128
Ideally, it should be possible to builtins.fetchGit the repo either with or without downloading the LFS files. In one use case, the LFS files are used for non-vital things, like tests or documentation. The nix derivations do not depend on those files. Not downloading the LFS files would save space. In another use case, the LFS files are needed to build the derivations, and should therefore be downloaded.
It is possible to export GIT_LFS_SKIP_SMUDGE=1 to accomplish the first use case (i.e. fetch a private LFS repository without actually downloading the LFS files), but it would be much nicer to have it as an option of the builtins.fetchGit function.
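For anyone who would rather not export the variable for the whole session, here is a minimal sketch that sets it for a single command only. The `without_lfs` name is made up for illustration and is not part of Nix or Git LFS. It relies on the behaviour reported above, namely that the git process started by builtins.fetchGit sees the variable.

```shell
# Hypothetical helper: run one command with LFS smudging disabled,
# without exporting GIT_LFS_SKIP_SMUDGE into the rest of the shell session.
without_lfs() {
  env GIT_LFS_SKIP_SMUDGE=1 "$@"
}

# Example usage (needs network access and repository credentials):
#   without_lfs nix-build
#   without_lfs nix repl
```

LFS-tracked files then arrive as small pointer files rather than their real contents, so this only helps when nothing in the build reads them.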
#4635 has the potential to fix the first use case by default
> the LFS files are used for non-vital things, like tests or documentation.
Did you configure LFS globally in your git user config? I now realize git global user config may affect more places than what I've found with my testing.
> Did you configure LFS globally in your git user config?
Yes, the following section is present in ~/.gitconfig:
[filter "lfs"]
clean = git-lfs clean -- %f
smudge = git-lfs smudge -- %f
process = git-lfs filter-process
required = true
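To answer the same question on another machine, a quick check along these lines may help. This is a sketch, not something from the thread: the `lfs_filter_settings` name is made up, and the optional file argument is only there so a specific config file can be inspected instead of the global one.

```shell
# Hypothetical helper: print the filter.lfs.* settings from the global Git
# config, or from the config file given as the first argument.
# git exits non-zero when no matching keys exist.
lfs_filter_settings() {
  if [ "$#" -gt 0 ]; then
    git config --file "$1" --get-regexp '^filter\.lfs\.'
  else
    git config --global --get-regexp '^filter\.lfs\.'
  fi
}

# Example usage:
#   lfs_filter_settings              # global user config
#   lfs_filter_settings ~/.gitconfig # one specific file
```

If this prints `smudge`, `clean` and `process` entries like the ones above, the LFS filter is active for every checkout that git performs for that user.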
I marked this as stale due to inactivity.
For some projects I am working on LFS is crucial. I hope this gets solved soon.
I marked this as stale due to inactivity.
Still relevant.
This error also pops up when using a git repository that uses LFS as a flake input, or seemingly even just by having a flake in a repository with LFS (cf. https://github.com/NixOS/nixpkgs/issues/137998).
I didn't expect it, but export GIT_LFS_SKIP_SMUDGE=1 also seems to work around the problem with flakes, as long as you don't care about the LFS files.
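If it is unclear whether a flake's repository is affected at all, one rough check is to look for LFS rules in its `.gitattributes` files, since `git lfs track` records tracked patterns there with `filter=lfs`. A small sketch, with a made-up function name:

```shell
# Hypothetical helper: list the .gitattributes files under a directory
# (default: the current one) that route at least one pattern through the
# LFS filter. The .git directory itself is skipped.
lfs_attribute_files() {
  find "${1:-.}" -name .git -prune -o \
    -name .gitattributes -type f -exec grep -l 'filter=lfs' {} +
}

# Example usage, from the root of the flake's repository:
#   lfs_attribute_files .
```

Empty output suggests that nothing in the working tree is tracked by LFS, so the smudge error likely has another cause.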
but .... what if you do care about the LFS files .... 🥲 🥲 🥲 🥲 🥲 🥲 🥲
I think the plan for this would be
- [ ] Merge the libfetchers changes from #6530
- [ ] Implement LFS support in the fetcher. Could be smudge filter-based + whitelist of smudge filters, or something more hardcoded. (We don't want to allow general smudge support because that's impure, but could be a convenient implementation strategy - or not)
- [ ] Add a parameter to the git fetcher. I think we'll eventually want three modes:
  - Lazy LFS: fetch any LFS file when it is needed. This will tend to be sequential. Most versatile mode, and a sensible default.
  - Eager LFS: fetch all LFS files simultaneously. This will be faster when you know you need all LFS files.
  - No LFS: quick, even if we're copying the whole flake, which we may have to do until the libexpr part of #6530 is figured out. Alternatively, this mode could be a filter of which files to ignore / fetch eagerly / fetch lazily.
- [ ] Implement the double fetching protocol where we fetch and load flake.nix once to figure out the fetch parameters, and then fetch and load again if needed
GitLab now forces free users to use LFS in many cases, so I guess this will become a lot more relevant.
AFAIU, this is not specific to private repos:
builtins.fetchGit {
url = "https://huggingface.co/openlm-research/open_llama_3b";
rev = "141067009124b9c0aea62c76b3eb952174864057";
};
...fails in the same way:
...
Downloading pytorch_model.bin (6.9 GB)
Error downloading object: pytorch_model.bin (9ffd42d): Smudge error: Error downloading pytorch_model.bin (9ffd42dc58c4f49154e98bc7796306fde40febef278e99636a240a731d626a4a): batch request: missing protocol: ""
Errors logged to '/home/.../.cache/nix/gitv3/14avjqj1kcsaj6025lqgbr5r4yz680zmj1xzppc13cgxx12i8dj3/lfs/logs/20231227T021723.995860432.log'.
Use `git lfs logs last` to view the log.
error: external filter 'git-lfs filter-process' failed
fatal: pytorch_model.bin: smudge filter lfs failed
error:
… while calling the 'fetchGit' builtin
...
@SomeoneSerge for huggingface this worked for me:
fetchgit { # from `pkgs`, not `builtins`, may not matter?
url = "https://huggingface.co/mistralai/Mistral-7B-Instruct-v0.2";
rev = "b70aa86578567ba3301b21c8a27bea4e8f6d6d61";
hash = "sha256-IAe/tHFB7yqFRF5aRojkNCD8TbKj8XQMt6eEyPmr4HU=";
fetchLFS = true;
}
This issue has been mentioned on NixOS Discourse. There might be relevant details there:
https://discourse.nixos.org/t/flake-lfs-input/40184/2
Is there currently a workaround for fetching nix flakes input with lfs?