
Fetching a github repo with submodules and checksum

Open markus1189 opened this issue 4 years ago • 5 comments

Hi,

I have the following use case, but it seems like niv (currently) does not support it.

Here is the scenario:

  • I want to add https://github.com/darktable-org/darktable to niv
    • but only adding it as type=tarball does not allow me to fetch submodules, therefore the sha256 is incorrect (differs from the one with submodules)
    • using type=git and the unstable nix version as described in #58 would work, but as I understand it, that means that niv update is a no-op on the json and every time I import from sources.nix it will download the repo again (which takes a lot of time)

What I want is a json entry that also has the sha256 set, such that the evaluation of sources.nix does not download the huge git repo every time.

As far as I see, the problem is that the builtins.fetchgit does not support adding a sha256, so this would require fetchFromGitHub which is not builtin. I could change my sources.nix to add this, which is nice, but what I don't get is that niv update works and updates the package...

Can you confirm my observations? What would be a good way to add this behavior to the code? It seems like we could add a case on type in:

https://github.com/nmattia/niv/blob/f73bf8d584148677b01859677a63191c31911eae/src/Niv/Cli.hs#L346-L349

and for example use nix-prefetch-git for a github type?

markus1189 avatar Jun 16 '20 08:06 markus1189

i just commented on a similar matter here: https://github.com/nmattia/niv/issues/214

The whole thing with submodules support would also be very interesting for me.

tfc avatar Jul 01 '20 13:07 tfc

(Sorry for the late reply, I was out and still catching up on notifications)

that means that niv update is a no-op on the json

No, it should still update the rev if I understand the setup correctly. It's treated as a Git repo which should work fine.

it will download the repo again (which takes a lot of time)

What exactly downloads the repo?

the problem is that the builtins.fetchgit does not support adding a sha256, so this would require fetchFromGitHub which is not builtin.

~That's a problem, because fetchFromGitHub does not support submodules I think~ Looks like fetchFromGitHub supports submodules. What is the problem with it not being a built-in? What fetcher would you use to fetch submodules while still having a sha256?

nmattia avatar Jul 27 '20 09:07 nmattia

hey @nmattia, i wrote my comments a bit hastily.

it somehow seems that builtins.fetchgit does not get stuff from the store even if it's available there. on an offline machine with everything that my repo gets from niv precached in the store (which i did by calculating the closure of a niv attrset and storing them all in an iso file), nix still tries to download stuff from the internet.

tfc avatar Jul 27 '20 09:07 tfc

(Sorry for the late reply, I was out and still catching up on notifications)

No worries!

that means that niv update is a no-op on the json

No, it should still update the rev if I understand the setup correctly. It's treated as a Git repo which should work fine.

Hmm, at least as far as I remember, it did not update.

it will download the repo again (which takes a lot of time)

What exactly downloads the repo?

Using the sources.nix attribute of the dependency (darktable in the example above) in e.g. my NixOS config

the problem is that the builtins.fetchgit does not support adding a sha256, so this would require fetchFromGitHub which is not builtin.

~That's a problem, because fetchFromGitHub does not support submodules I think~ Looks like fetchFromGitHub supports submodules. What is the problem with it not being a built-in? What fetcher would you use to fetch submodules while still having a sha256?

I think I didn't put that in the right words :) Using fetchFromGitHub does work indeed, but then we would also need to change the type=git fetcher away from fetchgit or introduce another type?

markus1189 avatar Jul 28 '20 17:07 markus1189

Hey guys, quick update. I didn't drop the ball, but I opened a pretty big can of worms when I started working on https://github.com/nmattia/niv/issues/111 (implementation here: https://github.com/nmattia/niv/pull/258).

NOTE: ok this is longer than I thought but writing this down made it a bit clearer in my head. Feedback very much welcome.

I'll start with a quick recap of how niv works and what it does; then I'll give a quick overview of potential solutions.


There are two sides of niv: one is the "Nix evaluation" that's provided with sources.nix and the other one is the update, with niv update. The "Nix evaluation" tries to pick the best fetcher possible (for instance, fetchGit should be used for private repos because fetchFromGitHub just won't work in any practical way). The update part hits the GitHub API, pings git repos and calls nix-prefetch-url to find information about the sources like: the default branch (if none is provided), the latest revision on the branch and potentially the sha256.


Now, let's focus on git repositories (including GitHub projects). What's the best fetcher? Well, that depends on three factors: Is the repository public on GitHub? Does the repository require (SSH-)authentication for a git clone? Does the repo have submodules? Let's have a look:

note: I'll talk about fetchgit, fetchGit and fetchzip because fetchFromGitHub uses fetchgit with submodules and fetchzip without. The fetchzip variant works by downloading a tarball from GitHub.

  • The repository is hosted on GitHub, is public, and has no submodules: Any fetcher will work (fetchgit, fetchGit, fetchzip). Both fetchgit and fetchzip are good because they are fast (fixed-output derivation) and run at build-time. fetchGit will work but (1) it will regularly ping the upstream repo to check for changes and (2) will need extra settings when run inside a restrict-eval evaluation.
  • The repository is hosted on GitHub, is public, and has submodules: fetchzip is out of the question because GitHub does not offer tarballs that include submodules. fetchgit will work fine; fetchGit will work in recent versions of Nix with the same caveats as above (regular polling + eval-time considerations).
  • The repository is private: whether on GitHub or not, fetchzip won't work (without leaking the GITHUB_TOKEN which would be a pain). The fetchgit way of cloning repos is so not user friendly that I'll just say "it doesn't work". That leaves us with fetchGit; same caveats as above (regular polling + eval-time considerations) and, in case the repo has submodules, it must use a recent version of Nix.
  • The repository is public but not hosted on GitHub: Both fetchgit and fetchGit will work, but fetchzip won't (because there's no one providing a tarball). Note the caveats mentioned above for fetchGit (regular polling + eval-time considerations + recent Nix for submodules).

So basically, if your repo is public and on GitHub, fetchzip and fetchgit are best; fetchzip is a bit cleaner (just a tarball download) but for consistency fetchgit may be better (it also works if your repo has submodules). If your repo is not on GitHub but is public, use fetchgit. If your repo is private, use fetchGit, keeping the caveats in mind (regular polling + eval-time considerations + recent Nix for submodules). Here I'd like to point out that most users just don't care about the fetchGit caveats, so maybe niv should just use fetchGit by default with an option to fall back to fetchgit for public repos.
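The case analysis above can be written down as a small decision function. This is only a sketch of the reasoning in this comment, not niv's actual code: the `Repo` record, `candidate_fetchers` and `needs_recent_nix` are hypothetical names made up for illustration, and the ordering of the returned list just mirrors the preferences stated above.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Repo:
    """Hypothetical description of a repository (not a niv type)."""
    on_github: bool
    public: bool
    has_submodules: bool


def candidate_fetchers(repo: Repo) -> List[str]:
    """Fetchers that work for this repo, most preferred first."""
    if not repo.public:
        # Private: fetchzip would leak the token, fetchgit is impractical.
        return ["fetchGit"]
    if repo.on_github and not repo.has_submodules:
        # Public GitHub repo without submodules: anything works.
        return ["fetchzip", "fetchgit", "fetchGit"]
    # Submodules (no tarball with submodules) or not on GitHub (no tarball).
    return ["fetchgit", "fetchGit"]


def needs_recent_nix(fetcher: str, repo: Repo) -> bool:
    """fetchGit only handles submodules in recent versions of Nix."""
    return fetcher == "fetchGit" and repo.has_submodules


if __name__ == "__main__":
    darktable = Repo(on_github=True, public=True, has_submodules=True)
    print(candidate_fetchers(darktable))
```

For the darktable case from the original post (public, on GitHub, with submodules) this puts fetchgit first, which is the fetcher that takes a sha256.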


Ok, now let's figure out how the update part of niv can figure out the latest rev and default branch. There are basically three ways: for one, you can query the GitHub API. Alternatively, you can use git ls-remote or git clone. Using git ls-remote is always preferable to git clone because it contains the info we need (latest revision and/or default branch) but doesn't involve copying any more info (super slow for big repos). So the situation is like this:

  • If the repository is a public GitHub project, then both the GitHub API and git ls-remote will work fine; however GitHub does some rate limiting if you're not authenticated (i.e. no GITHUB_TOKEN) so let's just say that git ls-remote is better.
  • If the repository is a private GitHub project, then the GitHub API will work, but you'll need to be authenticated (i.e. have a GITHUB_TOKEN). git ls-remote will work, so... git ls-remote is preferable.
  • If the repository is not on GitHub, then... use git ls-remote.

In some cases where you need to clone the repo anyway (see below) then you might as well use git clone directly, but that's just complicating an already complicated story. So instead, let's just say niv should always fetch the latest revision and default branch with git ls-remote.
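As a rough illustration of the git ls-remote approach, the sketch below asks the remote for HEAD with `git ls-remote --symref` and reads the default branch and latest revision from the output. It is written in Python purely for illustration (niv itself is Haskell), the function names are hypothetical, and it assumes a git recent enough to support `--symref`.

```python
import subprocess
from typing import Optional, Tuple

_REF_PREFIX = "ref: refs/heads/"


def parse_ls_remote(output: str) -> Tuple[Optional[str], Optional[str]]:
    """Extract (default_branch, latest_rev) from `git ls-remote --symref <url> HEAD`.

    The expected output is tab-separated, for example:
        ref: refs/heads/master<TAB>HEAD
        <commit sha><TAB>HEAD
    """
    branch = None
    rev = None
    for line in output.splitlines():
        parts = line.split("\t")
        if len(parts) != 2 or parts[1] != "HEAD":
            continue
        if parts[0].startswith(_REF_PREFIX):
            branch = parts[0][len(_REF_PREFIX):]
        elif not parts[0].startswith("ref:"):
            rev = parts[0]
    return branch, rev


def latest_rev_and_branch(url: str) -> Tuple[Optional[str], Optional[str]]:
    """Run git ls-remote against `url`; nothing is cloned."""
    result = subprocess.run(
        ["git", "ls-remote", "--symref", url, "HEAD"],
        check=True,
        capture_output=True,
        text=True,
    )
    return parse_ls_remote(result.stdout)
```

Calling `latest_rev_and_branch("https://github.com/darktable-org/darktable")` would need network access (and credentials for private repos), which is why the parsing is kept separate from the subprocess call.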


Finally, in the "fetcher" section above I said that fetchgit needs a sha256. So the question is: how does one get the sha of a repo? This also relates to https://github.com/nmattia/niv/issues/111 because whenever we get the sha of a repo, we can always get the last commit date.

  • Repository is public and on GitHub and doesn't have submodules: Two solutions, using the GitHub API (downloading tarball for sha256 and querying commit info for the date) or performing a git clone. A clone can take a long, long time (try getting a shallow clone of https://github.com/torvalds/linux) so the GitHub API is preferable.
  • Repository is private and on GitHub and doesn't have submodules: Same as above, although the user needs to be authenticated (GITHUB_TOKEN) for using the GitHub API. So here the best way is probably to start the git clone and instruct the user to set a GITHUB_TOKEN if it's taking too long.
  • Repository is not on GitHub: a clone is needed here.

Basically: use the GitHub API as much as possible, but fall back to git clone when you can't. When the clone is taking too long, tell the user they could use GitHub instead (or, if only the date is needed, then do something like niv update --no-date).
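The rule of thumb above could be sketched like this. Again, this is a hypothetical illustration rather than niv's implementation: `sha256_strategy` and `slow_clone_hint` are made-up names, and sending repos with submodules to git clone is an assumption drawn from the earlier point that GitHub tarballs do not include submodules.

```python
from typing import Optional


def sha256_strategy(on_github: bool, public: bool,
                    has_submodules: bool, has_token: bool) -> str:
    """Pick how to obtain the sha256 (and commit date): 'github-api' or 'git-clone'."""
    if not on_github:
        return "git-clone"
    if has_submodules:
        # Assumption: the GitHub tarball lacks submodules, so its hash
        # would not match what fetchgit produces.
        return "git-clone"
    if public or has_token:
        return "github-api"
    # Private GitHub repo without a token: only the clone can work.
    return "git-clone"


def slow_clone_hint(on_github: bool, has_submodules: bool,
                    has_token: bool) -> Optional[str]:
    """Message to show when a clone is taking too long, if any applies."""
    if on_github and not has_submodules and not has_token:
        return "Set GITHUB_TOKEN to use the GitHub API instead of cloning."
    return None


if __name__ == "__main__":
    print(sha256_strategy(on_github=True, public=True,
                          has_submodules=False, has_token=False))
```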


There are a few other details (how does niv figure out if a repo is on GitHub, private, has submodules?) but that's more of a niv add question (then we can just store the info in sources.json). I still need to spend some time thinking about it.

nmattia avatar Aug 14 '20 10:08 nmattia