node2nix icon indicating copy to clipboard operation
node2nix copied to clipboard

Avoid cloning dependencies full repo if possible

Open martinetd opened this issue 2 years ago • 1 comments

hi!

cryptpad has recently removed its bower dependency and I tried node2nix on it, and it looks like it generates things as expected. Thanks for this great tool!

One of the dependency is written as a git repo (cryptpad/drawio-npm#npm with full url in lock file) and node2nix will do a full git clone for it.

That repo is fairly big (1.3GB download according to the git clone log message), and just cloning with --depth=1 reduces the download to 56MB (or 50MB for a tar.gz archive) so it's not like it's all useful as we basically just clone to compute the sha256 hash of the dependency (nix-hash on the dir after cleaning up .git etc)?

In practice without package lock we could just clone with depth=1 to get the latest commit as I think that's how the package.json format works; if there's a lock we need to get a specific commit. There's a couple of ways to make this work:

  • if we assume github here (for projects owner/repo automatically expanded to github) then one can use https://github.com/owner/repo/archive/[commit].tar.gz -- this is probably the easiest to implement, but it's a bit sad to only implement github...
  • I wasn't able to make git clone work with a specific commit, but fetch will happily work: mkdir <dir> && git -C <dir> init && git -C <dir> fetch --depth=1 <repo> <commit> && git -C <dir> checkout FETCH_HEAD will "clone" an individual commit, or as well as symbolic versions e.g. commit can be a full rev but refs/pull/4779/head would also work. I remembered this would only work with advertised commits (e.g. tip of a branch or tag) before, but this doesn't seem to be the case anymore, and it also worked for me on other repos I tried off other platforms (tried cgit, gitlab). In case of failure we probably want to fall back to a full clone... A bit annoying.

Hmm now I'm looking there's also submodules to cater for; the later clone's command will get a git directory so submodule update --init --recursive --depth=1 should also work just fine, and only get the latest commit.

What do you think?

I've had a quick look at the code and I don't see any huge blocker as long we can untangle the commit that wants to be fetched, but I'm really not familiar with how node works so a sanity check would be appreciated.

This isn't a priority as everything works, but if I could avoid downloading over 1GB of data everytime I want to update cryptpad, it'll be appreciated in the long run :)

martinetd avatar Aug 19 '23 12:08 martinetd

Nevermind, I ran into a bug after this (it complains cache mode is 'only-if-cached' but no cached response is available which looks like a problem with the package.json/lock file not being up to date?) and was too lazy to look further -- I switched to buildNpmPackage as that seems to work out, so I won't be looking into this further. (... Just have to look at how to configure this thing again now...)

Thanks again / sorry for the noise; the idea of this issue is still valid so I'm leaving this open, but I no longer have any incentive to actually do it now so won't be following up. Feel free to close anytime if not interested.

martinetd avatar Aug 20 '23 11:08 martinetd