niv
Fetching a github repo with submodules and checksum
Hi,
I have the following use case, but it seems like niv (currently) does not support it.
Here is the scenario:
- I want to add https://github.com/darktable-org/darktable to niv
- but only adding it as type=tarball does not allow me to fetch submodules, therefore the sha256 is incorrect (differs from the one with submodules)
- using type=git and the unstable nix version as described in #58 would work, but as I understand it, that means that niv update is a no-op on the json and every time I import from sources.nix it will download the repo again (which takes a lot of time)
What I want is a json entry that also has the sha256 set, such that the evaluation of sources.nix does not download the huge git repo every time.
As far as I see, the problem is that the builtins.fetchgit does not support adding a sha256, so this would require fetchFromGitHub which is not builtin. I could change my sources.nix to add this, which is nice, but what I don't get is that niv update works and updates the package...
Can you confirm my observations? What would be a good way to add this behavior to the code? It seems like we could add a case on type in:
https://github.com/nmattia/niv/blob/f73bf8d584148677b01859677a63191c31911eae/src/Niv/Cli.hs#L346-L349
and for example use nix-prefetch-git for a github type?
i just commented on a similar matter here: https://github.com/nmattia/niv/issues/214
The whole thing with submodules support would also be very interesting for me.
(Sorry for the late reply, I was out and still catching up on notifications)
> that means that niv update is a no-op on the json
No, it should still update the rev if I understand the setup correctly. It's treated as a Git repo which should work fine.
> it will download the repo again (which takes a lot time)
What exactly downloads the repo?
> the problem is that the builtins.fetchgit does not support adding a sha256, so this would require fetchFromGitHub which is not builtin.
~That's a problem, because fetchFromGitHub does not support submodules I think~ Looks like fetchFromGitHub supports submodules. What is the problem with it not being a built-in? What fetcher would you use to fetch submodules while still having a sha256?
hey @nmattia, i wrote my comments a bit hastily.
it somehow seems that builtins.fetchgit does not get stuff from the store even if it's available there.
on an offline machine with everything that my repo gets from niv precached in the store (which i did by calculating the closure of a niv attrset and storing them all in an iso file), nix still tries to download stuff from the internet.
> (Sorry for the late reply, I was out and still catching up on notifications)
No worries!
> > that means that niv update is a no-op on the json
> No, it should still update the rev if I understand the setup correctly. It's treated as a Git repo which should work fine.
Hmm, at least as far as I remember, it did not update.
> > it will download the repo again (which takes a lot time)
> What exactly downloads the repo?
Using the sources.nix attribute of the dependency (darktable in the example above) in e.g. my NixOS config.
> > the problem is that the builtins.fetchgit does not support adding a sha256, so this would require fetchFromGitHub which is not builtin.
> ~That's a problem, because fetchFromGitHub does not support submodules I think~ Looks like fetchFromGitHub supports submodules. What is the problem with it not being a built-in? What fetcher would you use to fetch submodules while still having a sha256?
I think I didn't put that in the right words :) Using fetchFromGitHub does work indeed, but then we would also need to change the type=git fetcher away from fetchgit or introduce another type?
Hey guys, quick update. I didn't drop the ball, but I opened a pretty big can of worms when I started working on https://github.com/nmattia/niv/issues/111 (implementation here: https://github.com/nmattia/niv/pull/258).
NOTE: ok this is longer than I thought but writing this down made it a bit clearer in my head. Feedback very much welcome.
I'll start with a quick recap of how niv works and what it does; then I'll give a quick overview of potential solutions.
There are two sides of niv: one is the "Nix evaluation" that's provided with sources.nix and the other one is the update, with niv update. The "Nix evaluation" tries to pick the best fetcher possible (for instance, fetchGit should be used for private repos because fetchFromGitHub just won't work (in any way practical)). The update part hits the GitHub API, pings git repos and calls nix-prefetch-url to find information about the sources like: the default branch (if none is provided), the latest revision on the branch and potentially the sha256.
Now, let's focus on git repositories (including GitHub projects). What's the best fetcher? Well, that depends on three factors: Is the repository public on GitHub? Does the repository require (SSH-)authentication for a git clone? Does the repo have submodules? Let's have a look:
note: I'll talk about fetchgit, fetchGit and fetchzip because fetchFromGitHub uses fetchgit with submodules and fetchzip without. The fetchzip variant works by downloading a tarball from GitHub.
- The repository is hosted on GitHub, is public, and has no submodules: Any fetcher will work (fetchgit, fetchGit, fetchzip). Both fetchgit and fetchzip are good because they are fast (fixed-output derivation) and run at build-time. fetchGit will work but (1) it will regularly ping the upstream repo to check for changes and (2) will need extra settings when run inside a restrict-eval evaluation.
- The repository is hosted on GitHub, is public, and has submodules: fetchzip is out of the question because GitHub does not offer tarballs that include submodules. fetchgit will work fine; fetchGit will work in recent versions of Nix with the same caveats as above (regular polling + eval-time considerations).
- The repository is private: whether on GitHub or not, fetchzip won't work (without leaking the GITHUB_TOKEN which would be a pain). The fetchgit way of cloning repos is so not user friendly that I'll just say "it doesn't work". That leaves us with fetchGit; same caveats as above (regular polling + eval-time considerations) and, in case the repo has submodules, it must use a recent version of Nix.
- The repository is public but not hosted on GitHub: Both fetchgit and fetchGit will work, but fetchzip won't (because there's no one providing a tarball). Note the caveats mentioned above for fetchGit (regular polling + eval-time considerations + recent Nix for submodules).
So basically, if your repo is public and on GitHub, fetchzip and fetchgit are best; fetchzip is a bit cleaner (just a tarball download) but for consistency fetchgit may be better (it also works if your repo has submodules). If your repo is not on GitHub but is public, use fetchgit. If your repo is private, use fetchGit, but mind the caveats (regular polling + eval-time considerations + recent Nix for submodules). Here I'd like to point out that most users just don't care about the fetchGit caveats, so maybe niv should just use fetchGit by default with an option to fall back to fetchgit for public repos.
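To make the case analysis above easier to check, here is a minimal Python sketch of the same decision table. The Repo class and candidate_fetchers function are hypothetical names used only for illustration (niv itself is written in Haskell and has no such function), and putting fetchzip ahead of fetchgit in the first case is an arbitrary choice, since the discussion leaves that preference open.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Repo:
    """Hypothetical description of a git source; not niv's actual data model."""
    on_github: bool
    public: bool
    has_submodules: bool


def candidate_fetchers(repo: Repo) -> List[str]:
    """Return the fetchers that work for `repo`, most preferred first."""
    if not repo.public:
        # Private repos: fetchzip would leak GITHUB_TOKEN and fetchgit is
        # impractical, so only fetchGit remains (with its polling and
        # eval-time caveats).
        return ["fetchGit"]
    if repo.on_github and not repo.has_submodules:
        # Public GitHub repo without submodules: every fetcher works.
        # The order of the first two is an assumption of this sketch.
        return ["fetchzip", "fetchgit", "fetchGit"]
    # Public repo with submodules (GitHub tarballs omit them) or hosted
    # outside GitHub (nobody provides a tarball): fetchzip is out.
    return ["fetchgit", "fetchGit"]
```

For the darktable case from the original question (public, on GitHub, with submodules), candidate_fetchers(Repo(True, True, True)) leaves fetchgit and fetchGit, which matches the second bullet above.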
Ok, now let's figure out how the update part of niv can figure out the latest rev and default branch. There are basically three ways: for one, you can query the GitHub API. Alternatively, you can use git ls-remote or git clone. Using git ls-remote is always preferable to git clone because it contains the info we need (latest revision and/or default branch) but doesn't involve copying any more info (super slow for big repos). So the situation is like this:
- If the repository is a public GitHub project, then both the GitHub API and git ls-remote will work fine; however GitHub does some rate limiting if you're not authenticated (i.e. no GITHUB_TOKEN) so let's just say that git ls-remote is better.
- If the repository is a private GitHub project, then the GitHub API will work, but you'll need to be authenticated (i.e. have a GITHUB_TOKEN). git ls-remote will work, so... git ls-remote is preferable.
- If the repository is not on GitHub, then... use git ls-remote.
In some cases where you need to clone the repo anyway (see below) then you might as well use git clone directly, but that's just complicating an already complicated story. So instead, let's just say niv should always fetch the latest revision and default branch with git ls-remote.
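As a rough illustration of the git ls-remote approach, the Python sketch below reads the default branch and its latest revision from the output of git ls-remote --symref <url> HEAD. The function names are made up for this example and this is not how niv implements it. It assumes a git client that supports --symref and a server that advertises the symbolic ref; when the server does not, the branch comes back as None.

```python
import subprocess
from typing import Optional, Tuple


def parse_ls_remote_head(output: str) -> Tuple[Optional[str], Optional[str]]:
    """Extract (default_branch, head_rev) from `git ls-remote --symref <url> HEAD`.

    Expected lines are tab-separated, for example:
        ref: refs/heads/master<TAB>HEAD
        <40-hex-sha><TAB>HEAD
    """
    branch: Optional[str] = None
    rev: Optional[str] = None
    for line in output.splitlines():
        left, _, name = line.partition("\t")
        if name != "HEAD":
            continue
        if left.startswith("ref: "):
            target = left[len("ref: "):]
            prefix = "refs/heads/"
            # Strip the refs/heads/ prefix to get a plain branch name.
            branch = target[len(prefix):] if target.startswith(prefix) else target
        else:
            rev = left
    return branch, rev


def latest_head(url: str) -> Tuple[Optional[str], Optional[str]]:
    """Ask the remote for its HEAD without cloning anything."""
    result = subprocess.run(
        ["git", "ls-remote", "--symref", url, "HEAD"],
        check=True,
        capture_output=True,
        text=True,
    )
    return parse_ls_remote_head(result.stdout)
```

Calling latest_head("https://github.com/darktable-org/darktable") would need network access; the parsing step can be exercised offline by passing captured output to parse_ls_remote_head.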
Finally, in the "fetcher" section above I said that fetchgit needs a sha256. So the question is: how does one get the sha of a repo? This also relates to https://github.com/nmattia/niv/issues/111 because whenever we get the sha of a repo, we can always get the last commit date.
- Repository is public and on GitHub and doesn't have submodules: Two solutions, using the GitHub API (downloading tarball for sha256 and querying commit info for the date) or performing a git clone. A clone can take a long, long time (try getting a shallow clone of https://github.com/torvalds/linux) so the GitHub API is preferable.
- Repository is private and on GitHub and doesn't have submodules: Same as above, although the user needs to be authenticated (GITHUB_TOKEN) for using the GitHub API. So here the best way is probably to start the git clone and instruct the user to set a GITHUB_TOKEN if it's taking too long.
- Repository is not on GitHub: a clone is needed here.
Basically: use the GitHub API as much as possible, but fall back to git clone when you can't. When the clone is taking too long, tell the user they could use GitHub instead (or, if only the date is needed, then do something like niv update --no-date).
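The rule of thumb above can be written down as a small Python sketch. The function name, the strategy labels and the hint text are hypothetical. The submodule branch is an inference from the earlier remark that GitHub tarballs do not include submodules, since the bullets above only cover repositories without them.

```python
from typing import Optional, Tuple


def sha_strategy(
    on_github: bool,
    private: bool,
    has_submodules: bool,
    has_token: bool,
) -> Tuple[str, Optional[str]]:
    """Return (strategy, hint) for computing a repo's sha256 and commit date.

    strategy is "github-api" or "git-clone"; hint is an optional message
    that could be shown to the user if the clone turns out to be slow.
    """
    if not on_github:
        # No API to ask: a clone is the only option.
        return "git-clone", None
    if has_submodules:
        # Assumption: the tarball served by GitHub lacks submodules, so its
        # hash would not match a checkout that includes them.
        return "git-clone", None
    if private and not has_token:
        # The API needs authentication; start cloning, suggest a token.
        return "git-clone", "set GITHUB_TOKEN to use the GitHub API instead"
    return "github-api", None
```

A public GitHub repository without submodules resolves to "github-api", while anything hosted elsewhere always resolves to "git-clone".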
There are a few other details (how does niv figure out if a repo is on GitHub, private, has submodules?) but that's more of a niv add question (then we can just store the info in sources.json). Still should spend some time thinking about it.