
libfetchers/git: Allow Git Remote Helpers

Open lorenzleutgeb opened this issue 1 year ago • 8 comments

Motivation

Use Nix with repositories that are (only) accessible via remote helpers, which are programs that are invoked by Git "under the hood" based on URL schemes.

This opens up lots of possibilities to interoperate with other storage/transport mechanisms.

Example remote helpers (by URL scheme, alphabetically):

Context

The following two changes (both part of this PR) make use of such remote helpers possible in conjunction with Nix:

  1. Relax URL scheme filtering for the Git fetcher, such that unknown URL schemes are not rejected directly. In case there is no corresponding remote helper available, the following output is produced:

    warning: URL scheme 'git+invalid' is non-standard and requires 'git-remote-invalid'.
    git: 'remote-invalid' is not a git command. See 'git --help'.
    error: program 'git' failed with exit code 128
    
  2. Add GIT_DIR to the environment, since this is required by remote helpers.
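As a rough illustration of the naming convention behind the warning above (this is not the code in this PR), the helper name can be derived from the URL scheme by stripping the git+ prefix. The function name helper_for_url and the URL git+invalid://host/repo are made up for this sketch.

```shell
# Hypothetical: map a Nix Git fetcher URL to the remote helper Git would look for.
helper_for_url() {
  scheme=${1%%://*}          # "git+invalid://host/repo" -> "git+invalid"
  transport=${scheme#git+}   # "git+invalid" -> "invalid"
  printf 'git-remote-%s\n' "$transport"
}

helper=$(helper_for_url 'git+invalid://host/repo')

# Git also searches its own exec path, so this PATH lookup is only an approximation.
if command -v "$helper" >/dev/null 2>&1; then
  echo "$helper found on PATH"
else
  echo "$helper not found on PATH"
fi
```

A lookup like this only tells you whether the helper is installed; whether it actually works still depends on what Git passes to it, including GIT_DIR.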

As submitted (ee66ec343) this is a proof of concept, and works for me locally. I do understand that this feature might have to be protected by an experimental flag and docs would have to be added/changed. I am happy to do this if there's positive feedback.

Priorities and Process

Add :+1: to pull requests you find important.

The Nix maintainer team uses a GitHub project board to schedule and track reviews.

lorenzleutgeb avatar Apr 19 '24 21:04 lorenzleutgeb

I want to raise the following question / thought: This is a quite general approach. Can it be expected that all remote helpers can be called without any additional command-line options? And if not, would they better be supplied when calling nix, i.e.,

nix ... --remote-helper-opts '--a-opt a-val' git+helper://...
# or
NIX_REMOTE_HELPER_OPTS=... nix git+helper://...

or via the URL to properly pair the options with the remote helper call (for nested fetches of flake inputs etc.)?

nix ... 'git+helper://...?opts=--a-opt%20a-val'

spacefrogg avatar Apr 20 '24 09:04 spacefrogg

I want to raise the following question / thought: This is a quite general approach. Can it be expected that all remote helpers can be called without any additional command-line options?

Yes. Remote Helpers are invoked by Git and the Git reference documentation on Remote Helpers states:

Remote helper programs are invoked with one or (optionally) two arguments. The first argument specifies a remote repository as in Git; it is either the name of a configured remote or a URL. The second argument specifies a URL; it is usually of the form <transport>://<address>, but any arbitrary string is possible.
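To see these arguments in practice, a stub helper that only records what it is called with can be enough. The sketch below rests on several assumptions: the name git-remote-example, the temporary directory and the log file are all hypothetical, and the stub is called directly rather than through Git, so it never speaks the remote helper protocol.

```shell
# Throwaway directory for the stub and its log (both hypothetical).
helper_dir=$(mktemp -d)
log_file="$helper_dir/args.log"

# Write the stub. "\$@" is escaped so it is expanded by the stub, not here.
cat > "$helper_dir/git-remote-example" <<EOF
#!/bin/sh
# Append each argument on its own line. A real helper would now read
# protocol commands from stdin.
printf '%s\n' "\$@" >> "$log_file"
EOF
chmod +x "$helper_dir/git-remote-example"   # needed once Git should find it on PATH

# Call the stub the way Git would: <remote name or URL> [<URL>].
# It is run through sh here so the sketch does not depend on PATH.
sh "$helper_dir/git-remote-example" origin 'example://test'
cat "$log_file"
```

For Git to pick up such a stub, its directory would have to be on PATH (or in Git's exec path) under the name git-remote-<scheme>.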

Regarding

[...] or [would they better be supplied] via the URL to properly pair the options with the remote helper call (for nested fetches of flake inputs etc.)?

nix ... 'git+helper://...?opts=--a-opt%20a-val'

This is something that Remote Helpers can leverage without Git passing arguments. They control the URL scheme, so during git clone 'git+example://x?a-opt=a-val', the remote helper git-remote-example is invoked with the argument example://x?a-opt=a-val and can interpret the URL.
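A minimal sketch of how a helper might take such a URL apart, assuming it treats everything after the first ? as its own options. The function name parse_helper_url is invented for illustration and is not taken from any existing helper.

```shell
# Hypothetical parsing step inside a remote helper such as git-remote-example.
# $1 is the URL Git hands over, e.g. example://x?a-opt=a-val
parse_helper_url() {
  url=$1
  case $url in
    *\?*) address=${url%%\?*}; query=${url#*\?} ;;   # split at the first "?"
    *)    address=$url; query= ;;                    # no query string present
  esac
  printf 'address=%s\nquery=%s\n' "$address" "$query"
}

parse_helper_url 'example://x?a-opt=a-val'
```

How the query is interpreted afterwards is entirely up to the helper; Git itself does not look at it.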

However, it will be tricky to get a URL with a query string past Nix, since Nix also uses query strings to encode fetching information. With a file called git-remote-example in $PATH that just writes its $@ to a file, I was able to verify that

$ nix ... 'git+example://test?x=y&ref=main'

results in

example://test

and conversely

$ nix ... 'git+example://test%3Fx=y?ref=main'

results in

example://test%3Fx=y

So, Remote Helpers that do this, and require syntax that is otherwise parsed by Nix, will be incompatible. One workaround is the one you mentioned, somehow prefixing arguments that should not be parsed by Nix, e.g. ?pass-x=y. One loses the ability to just copy the URLs that the protocol of the remote helper requires (and maybe add ?rev or so).
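The prefix workaround could look roughly like the following. This is purely hypothetical: neither Nix nor this patch implements a pass- prefix, and the function name split_query is invented here. The sketch only shows how a query string might be divided into parameters consumed by Nix and parameters forwarded to the helper with the prefix removed.

```shell
# Hypothetical: split "a=1&pass-b=2" into Nix parameters and helper parameters.
# The body runs in a subshell, so the IFS and globbing changes do not leak out.
split_query() (
  set -f        # no filename globbing on the unquoted expansion below
  IFS='&'       # split the query string on "&"
  nix_params=
  helper_params=
  for pair in $1; do
    case $pair in
      pass-*) helper_params="${helper_params:+$helper_params&}${pair#pass-}" ;;
      *)      nix_params="${nix_params:+$nix_params&}$pair" ;;
    esac
  done
  printf 'nix: %s\nhelper: %s\n' "$nix_params" "$helper_params"
)

split_query 'pass-x=y&ref=main'
```

The cost mentioned above remains: a URL copied from the helper's own documentation would have to be rewritten with the prefix before Nix accepts it.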

IMO the conflation of fetching arguments and the URL is the issue here, but that's Pandora's box. I think that there still are some interesting use cases. I am successfully using Remote Helpers with this patch.

lorenzleutgeb avatar Apr 20 '24 09:04 lorenzleutgeb

I have no decision-oriented opinion on this, but a few notes:

  • There was movement to migrate away from shelling out to git and instead use libgit2. So we should at least keep on the radar if we can port such a new feature to libgit2.

  • I like that using Git through Nix becomes more transparent and natural that way.

  • This introduces reproducibility risks, since now fetching would essentially require knowing the entire Nix setup including stuff from the environment. This change would lead to https://github.com/NixOS/nix/issues/3533 squared. And even if accepted, it would further move apart the user experience of vanilla Nix from Nix on NixOS, since you'd have to nail down more moving parts. I think we'd rather want expressions that will always work on a sufficiently modern Nix out of the box.

  • Lately, any work on fetchers for me raises the question if they should be part of Nix at all (apart from staying backward compatible). Tvix says no. Things like npins (which uses nix-prefetch-url and builtins.fetchurl underneath) and gridlock (which does not depend on Nix at all) show that we can completely decouple obtaining sources from evaluating expressions. Decoupled fetching could be provided by the distribution of Nix -- one could argue, Nixpkgs is a Nix distribution, NixOS is a Nix distribution, even Flakes are in that sense a Nix distribution.

  • Finally, what we of course should also optimise for is making Nix useful. But there are many ways to do that, such as finding and documenting convenient and scalable usage patterns. I've seen people who don't have a single builtins.fetch* in their code and manage all remote sources through git subtree.

fricklerhandwerk avatar May 21 '24 08:05 fricklerhandwerk

@fricklerhandwerk thanks a lot for thinking about this and providing your insights. Very valuable! 🙇🏻

I have no decision-oriented opinion on this, but a few notes:

  • There was movement to migrate away from shelling out to git and instead use libgit2. So we should at least keep on the radar if we can port such a new feature to libgit2.

Right. libgit2 does not support remote helpers. It looks like it did give some trouble:

  • Introduction (as you mentioned): https://github.com/NixOS/nix/pull/9240
  • Partial revert: https://github.com/NixOS/nix/pull/9806

The revert was done for a somewhat similar reason: libgit2 also does not support credential helpers. And actually, now I realize that this PR goes in the exact opposite direction of

  • #9807

This ties in quite directly to your point about moving fetchers out of Nix.

  • I like that using Git through Nix becomes more transparent and natural that way.

I fail to understand. Which way exactly do you mean by "that way"?

  • Lately, any work on fetchers for me raises the question if they should be part of Nix at all (apart from staying backward compatible). Tvix says no. Things like npins (which uses nix-prefetch-url and builtins.fetchurl underneath) and gridlock (which does not depend on Nix at all) show that we can completely decouple obtaining sources from evaluating expressions.

Yeah, I agree they probably shouldn't. I would love to write my own fetcher that can be reused by various Nix implementations. That requires an interface and spec for fetchers. And of course it introduces a bootstrapping issue (as always): How to fetch fetchers? Quoting from the Git docs on Remote Helpers:

Git comes with a "curl" family of remote helpers, that handle various transport protocols, such as git-remote-http, git-remote-https, git-remote-ftp and git-remote-ftps. They implement the capabilities fetch, option, and push.

But, not surprisingly, there already are fetchers in builtins that would have to be maintained for backward-compatibility anyway! With a nicer interface for "fetching fetchers", it might be feasible to remove the requirement that builtins.fetchgit supports credential helpers (this has obvious r13y issues anyway...). Instead, people that use credential helpers would plug in a fetcher that supports it. The "powerful" Git fetcher.

  • Finally, what we of course should also optimise for is making Nix useful. But there are many ways to do that, such as finding and documenting convenient and scalable usage patterns. I've seen people who don't have a single builtins.fetch* in their code and manage all remote sources through git subtree.

Yup, 💯. My motivation for this PR is that I wanted to have the nice UX of

$ nix run git+helper://project

where "project" is something that you cannot effectively fetch with plain Git; you need git-remote-helper.

In Nix source files, or on the "repo structure" level, there's lots of room to plug in fetchers, but CLIs are much more constrained if you want something usable, i.e., that humans will ever be willing to type or copy and paste.

lorenzleutgeb avatar May 21 '24 09:05 lorenzleutgeb

I like that using Git through Nix becomes more transparent and natural that way.

I fail to understand. Which way exactly do you mean by "that way"?

The way proposed in this change.

My motivation for this PR is that I wanted to have the nice UX of

$ nix run git+helper://project

where "project" is something that you cannot effectively fetch with plain Git; you need git-remote-helper.

Yes, this is a totally legitimate use case to me. It also hints that the new CLI is architecturally a different beast from everything underlying it: you can't make the CLI more convenient in the straightforward way proposed in this PR without breaking what I deem essential properties of the lower layers. For me this reinforces the idea that fetching should be part of distributions (or at least part of the porcelain) rather than the plumbing.

fricklerhandwerk avatar May 21 '24 12:05 fricklerhandwerk

@fricklerhandwerk It is ok for lower layers to bend to the requirements of the higher layers. What is the purpose of lower layers if not serving the highest layer, users? Of course we want to be careful not to make a mess, so I'm interested to know: what are the essential properties of the lower layers that you believe are violated?

roberth avatar May 21 '24 13:05 roberth

In this discussion the lower layer is the Nix language, which contains built-in fetchers. IMO an essential property of Nix expressions is derivation-/store-object-level reproducibility: given an expression and some files, you should always get the same derivations/store objects, everywhere, any time. (Yes, in practice there are heaps of caveats, but that's my personal ideal.) Nix expressions are a central user interface to Nix, and major consumers are Nixpkgs and NixOS.

This particular attempt at making the other user interface, the new CLI, more convenient would degrade reproducibility of the language layer, because it adds space for a lot more moving parts. Therefore I claim these two UIs are architecturally separate, and would suggest treating them as such. Your suggestion to add a conditional depending on who is calling would only keep those concerns entangled for the benefit of a smaller diff, which I think would make for a long-term liability.

Going further, I also claim that fetching file system trees and computing with file system trees are separate concerns with very different constraints, but right now they are entangled in the big "Nix language" component. For an indication that this may indeed be true, check the endless stream of issues and pull requests trying to make fetching behave more like curl and git clone, and how they compete with the aspiration to keep Nix expression evaluation hermetic.

An ideal architecture would produce data flow graphs like this, where the CLI offers means to control any part of the pipeline, and things like git helpers nicely find their place:

graph TD;
    fetching[remote source] --> fso[file system object] --> expression --> derivation
    CLI --> fetching & fso & expression & derivation
    git-helpers -.-> fetching

As opposed to where we currently are, where expressions encode where to get stuff from, and where plugging git helpers will add degrees of freedom that make it hard to isolate expressions (I can't even draw the CLI interactions without making it a ball of yarn):

graph TD;
    fetching[remote source] --> fso[file system object] --> expression --> derivation
    git-helpers -.-> fetching & expression
    expression --> fetching

fricklerhandwerk avatar May 21 '24 13:05 fricklerhandwerk

@fricklerhandwerk Thank you for your elaboration.

I believe there's a lot of merit in Nix's ability to manage the fetching of expression files. For this purpose, we have to pick between

  • built-in fetchers
  • fairly benevolent import from derivation on fixed output derivations (deps will have been cached, almost always)
  • submodules and/or user scripts, neither of which scale

While I wouldn't have minded relying on IFD/FOD fetching until the built-in fetchers were actually good, that's not the path that was chosen.

For indication that this may indeed be true check the endless stream of issues and pull requests trying to make fetching behave more like curl and git clone,

This is a problem with the process that led to Nix's reproducibility being committed to the behavior of a program, git, that wasn't properly studied, presumably because people felt familiar with it, or there wasn't sufficient review, or they were pressured into delivering. I don't know the full history, but this has caused undue reputational damage to the feature, which is very unfortunate, because it is a very good feature when executed well, specifically when it comes to performance and usability (part standardization, no reliance on derivation system which may be unknowable, and probably more; haven't memorized it all).

We still have an opportunity to fix it. It is behind a feature flag, which we can lift when it behaves well. (Which is all we can do with it, fwiw)

It seems that you're not interested in developing fetchers. Perhaps you'd prefer to delegate this?

(Also I'm not sure if it's fair to "hijack" a PR to have a radical architectural conversation. How do we feel about this?)

roberth avatar May 21 '24 16:05 roberth

Hey, has there been any further discussion about remote helpers / custom fetchers somewhere (since it was briefly mentioned during the conversation here)? I'm currently stuck in a situation where my use case requires that I have access to a feature like this, so I would like to offer my help in any shape or form which might be helpful to get a feature like this out the door. :)

Popax21 avatar Nov 20 '24 22:11 Popax21

This feature would be valuable for supporting modern Git workflows and alternative hosting solutions like Radicle. While I understand the desire to get the architecture right first, I wonder if we could take an incremental approach by adding remote helper support behind an experimental flag (similar to other Nix features). This would allow the community to experiment with and validate real-world use cases, while the longer-term architectural work continues.

My specific use case involves using private Radicle repositories as flake inputs in my NixOS configurations. Currently this does not work, but with remote helper support it could work seamlessly with the standard Nix tooling. This would make it easier to use Nix with newer decentralized source control solutions.

blurgyy avatar Jan 23 '25 05:01 blurgyy