Support for gitremote-helpers urls (e.g., `transport::opaque-transport-uri`)

Open demosdemon opened this issue 5 months ago • 9 comments

Summary 💡

Gitoxide currently fails to parse the URIs for any remote using a gitremote helper that accepts the second form of remote URI (e.g., git-remote-codecommit).

A URL of the form <transport>::<address> explicitly instructs Git to invoke git remote-<transport> with <address> as the second argument. If such a URL is encountered directly on the command line, the first argument is <address>, and if it is encountered in a configured remote, the first argument is the name of that remote.

Ideally Gitoxide would eventually support invoking one of these helpers (https://github.com/GitoxideLabs/gitoxide/issues/1666); however, until such time, it would be nice to be able to invoke find_remote without it bailing on parsing the URI.

I can work around this by handling the err as such:

fn repo_info(local_repo: &gix::Repository) -> Result<Info, Error> {
    let default_remote = local_repo
        .remote_default_name(Fetch)
        .ok_or(Error::UnknownDefaultRemoteForFetch)?;
    trace!(?default_remote, "default remote for fetch");

    match local_repo.find_remote(&*default_remote) {
        Ok(remote) => repo_info_for_remote(default_remote, remote),
        Err(Find(Url {
            source:
                KeyError {
                    source: Some(RelativeUrl { url }),
                    ..
                },
            ..
        })) => {
            trace!(
                ?url,
                "found url potentially specifying a remote transport type"
            );

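            // The second form is `<transport>::<address>`: everything after
            // the first `::` is opaque and meant for the helper.
            let Some((transport, url)) = url.split_once("::") else {
                return Err(Error::MissingTransportTypeSeparator {
                    remote_name: default_remote.into_owned(),
                    url: url.to_owned(),
                });
            };

            // Parsing failed, so there is no `Remote<'_>` to continue with here.
            todo!();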
        }
        Err(err) => Err(Error::InvalidRemote {
            remote_name: default_remote.into_owned(),
            source: err,
        }),
    }
}

but then I lose the ability to use the Remote<'_>.


Using try_find_remote_without_url_rewrite does not help as the error from parsing the URL occurs before any rewriting:

https://github.com/GitoxideLabs/gitoxide/blob/c7af04db9b6bb1204e0f4c436d1db8f48a491e86/gix/src/repository/remote.rs#L190-L209

Motivation 🔦

I have a lot of repos using AWS CodeCommit, and that service closed to new customers last year, prompting me to migrate my repos elsewhere. I want to use gix in the tooling I write to do that.

demosdemon avatar Jul 06 '25 16:07 demosdemon

Thanks a lot for reporting!

It's strange that it fails to parse the URL as for me it seems to get well past this stage. Here I have changed https://… to codecommit://:

❯ gix remotes ref-map
 19:08:39 remote-refs Connecting to "codecommit://github.com/GitoxideLabs/gitoxide"
Error: Protocol Ext("codecommit") of url "codecommit://github.com/GitoxideLabs/gitoxide" is denied per configuration

After fixing this with protocol.codecommit.allow = always I get:

❯ gix remotes ref-map
 19:12:01 remote-refs Connecting to "codecommit://github.com/GitoxideLabs/gitoxide"
Error: The 'codecommit' protocol is currently unsupported

which is what happens when trying to connect, well beyond the parsing stage.

A RelativeUrlError is produced here which doesn't make much sense to me unless the original URL is also posted here.

Also, I don't know where transport::opaque-transport-uri is from or what it is supposed to mean.

Independently of that, getting remote helpers to work will be a huge topic. Unfortunately, and despite https being implemented as a helper in Git, I don't think the concept maps very well to the codebase as it stands today. From what I can tell, helpers use their own protocol, while the codebase expects to use a Transport (trait) implementation. Maybe it's possible to implement a Transport that calls a remote helper under the hood, and for all I can tell, that would be the preferred way of implementing this.

But of course, that depends very much on the capabilities of the remote helper protocol, and would require quite an analysis to be sure.

Byron avatar Jul 06 '25 17:07 Byron

It parses the first form of the git remote url where you replace the scheme with the transport name.

I'm referring to the second form as described in the gitremote-helper docs where you prepend the string <helper>:: before an opaque uri that is passed to the helper.

E.g., git-remote-codecommit urls also take the form of codecommit::us-east-1://profile@repo where git invokes git remote-codecommit us-east-1://profile@repo to handle the transport.

The second form is what does not parse.

It's an opaque URI because git places no constraints on the address when in this form.
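To make the second form concrete, here is a minimal sketch of splitting such a URL. It assumes the helper name is limited to URL-scheme characters (a letter, then letters, digits, `+`, `-` or `.`), which is how I understand Git's own check. `split_helper_url` is a hypothetical name and not something that exists in gix-url:

```rust
/// Split `<helper>::<address>` into `(helper, address)`.
/// Returns `None` when the input is not of the second form.
/// `split_helper_url` is a hypothetical name, not part of gix-url.
fn split_helper_url(input: &str) -> Option<(&str, &str)> {
    // Only the first `::` matters; the address may contain more of them.
    let (helper, address) = input.split_once("::")?;
    let mut chars = helper.chars();
    // Assumption: the helper name has to look like a URL scheme, i.e. a
    // letter followed by letters, digits, `+`, `-` or `.`.
    let first_ok = chars.next().is_some_and(|c| c.is_ascii_alphabetic());
    let rest_ok = chars.all(|c| c.is_ascii_alphanumeric() || matches!(c, '+' | '-' | '.'));
    // The address is returned untouched, as nothing constrains its content.
    (first_ok && rest_ok).then_some((helper, address))
}
```

With this, `codecommit::us-east-1://profile@repo` splits into `codecommit` and `us-east-1://profile@repo`, while a first-form URL such as `codecommit://github.com/GitoxideLabs/gitoxide` yields `None` and can go through regular URL parsing.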

demosdemon avatar Jul 06 '25 19:07 demosdemon

Oh, I see now: the full example was all that was needed. Despite everything having been described before (even in code), I simply couldn't fathom this kind of URL 😅.

Reading it correctly this time, this issue asks not for supporting remote helpers, but only for parsing their URLs. I think that is quite doable, maybe even with the same 'trick' that was applied in the presented code, but directly in the gix-url code, along with an extra field to keep the first argument. That way, gix-url::Url could also serialise itself correctly.

@rickprice might be interested to pick this up.

Byron avatar Jul 07 '25 02:07 Byron

Excellent! Sorry, I realize now I didn't provide an example url.

Yeah, actually invoking the remote helper is out of scope, especially since there is the other task that's been acknowledged.

I'm not sure how much extra overhead it would be, but something like this would be "most" correct:

enum MaybeUrl {
    Url(gix_url::Url),
    Opaque(BString),
}

enum RemoteUri {
    Url(gix_url::Url),
    Helper {
        helper: BString,
        maybe_url: MaybeUrl,
    },
}

The most common case of these kinds of URLs is just another URL, so it would make sense to provide a pre-parsed Url struct for the common case while still providing a fallback.

demosdemon avatar Jul 07 '25 03:07 demosdemon

It's interesting that you'd be separating the opaque URL from the 'normal' one. The reason I wouldn't do that is that gix-url::Url is already supposed to be capturing all of what Git considers a descriptor to point to a repository. It already includes raw filesystem paths, as well as SCP like URLs.

So opaque URLs would just be another specialty, and I'd prefer it to handle these so downstream can keep it simple with just one way to point at a repository.

Probably I am missing something, but ideally the Url used to connect could have enough fields to help the connector to figure out what to do.

Byron avatar Jul 07 '25 03:07 Byron

Yeah, Url could be used to contain all of that. What I mainly want to convey is that anything after the :: should be considered valid, and it may not actually be parsable as a Url.

E.g., hypothetical::6bf320f3-d340-4339-97be-fcb3377f7a95 would be a valid transport URI with this scheme.

demosdemon avatar Jul 07 '25 03:07 demosdemon

I see, it really is its very own thing and putting it into a URL is forcing it a bit.

Nonetheless, I guess transport can be the scheme and is a good fit. This leaves mapping opaque-transport-uri (or address, as Git calls it in its docs) to something in the URL that allows everything.

From the docs

A URL of the form <transport>::<address> explicitly instructs Git to invoke git remote-<transport> with <address> as the second argument. If such a URL is encountered directly on the command line, the first argument is <address>, and if it is encountered in a configured remote, the first argument is the name of that remote.

Also interesting that the remote itself can be configured to use a transport.

From the docs

Additionally, when a configured remote has remote.<name>.vcs set to <transport>, Git explicitly invokes git remote-<transport> with <name> as the first argument. If set, the second argument is remote.<name>.url; otherwise, the second argument is omitted.
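The two rules quoted from the docs can be sketched as one function that builds the helper's command line. This is only an illustration of the argument order. `Origin` and `helper_argv` are hypothetical names, and the case where both remote.<name>.vcs and a `<transport>::` prefix are present is deliberately not handled:

```rust
/// Where the URL was found. Hypothetical type for illustration only.
enum Origin<'a> {
    /// Passed directly on the command line.
    CommandLine,
    /// Read from the remote called `name`; `vcs` is `remote.<name>.vcs`.
    Configured { name: &'a str, vcs: Option<&'a str> },
}

/// Build the helper command line, or `None` if no helper is involved.
fn helper_argv(origin: Origin<'_>, url: Option<&str>) -> Option<Vec<String>> {
    let argv = match origin {
        // `remote.<name>.vcs` is set: first argument is the remote name,
        // the second is the configured URL if there is one.
        Origin::Configured { name, vcs: Some(transport) } => {
            let mut argv = vec!["git".to_owned(), format!("remote-{transport}"), name.to_owned()];
            argv.extend(url.map(str::to_owned));
            argv
        }
        // `<transport>::<address>` in a configured remote: remote name, then address.
        Origin::Configured { name, vcs: None } => {
            let (transport, address) = url?.split_once("::")?;
            vec!["git".to_owned(), format!("remote-{transport}"), name.to_owned(), address.to_owned()]
        }
        // `<transport>::<address>` on the command line: the address is passed twice.
        Origin::CommandLine => {
            let (transport, address) = url?.split_once("::")?;
            vec!["git".to_owned(), format!("remote-{transport}"), address.to_owned(), address.to_owned()]
        }
    };
    Some(argv)
}
```

For a remote named origin with the URL `codecommit::us-east-1://profile@repo`, this produces `git remote-codecommit origin us-east-1://profile@repo`.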

Given how flexible and special all this is, I think it's fine to keep it 'special' and massage this into gix-url::Url a bit, so as not to add even more complexity in the type-system and to have this one type, Url, that is able to point to any repository, anywhere, with any transport or helper.

To sum it up: transport can reasonably be the scheme portion of the URL and be represented as such. The address (as per the docs) can be mapped to anything flexible, and the Url::path field would certainly do, at least as long as the URL can also be configured to serialize back to its original form that it was parsed from.
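As a rough sketch of that mapping, assuming a simplified stand-in type: `SketchUrl` below is hypothetical and is not the real gix-url type, and it uses `String` where real code would probably want byte strings, since the address need not be UTF-8. The transport goes into the scheme, the address goes into the path verbatim, and a flag remembers the original form so that serialization round-trips:

```rust
use std::fmt;

/// Simplified stand-in for a URL type; NOT the real gix-url `Url`.
#[derive(Debug, PartialEq)]
struct SketchUrl {
    /// The `<transport>` part, i.e. the helper name.
    scheme: String,
    /// The `<address>` part, stored verbatim without any validation.
    path: String,
    /// Remembers that the input was written as `<transport>::<address>`.
    helper_form: bool,
}

impl SketchUrl {
    /// Parse only the `<transport>::<address>` form; anything else is `None`.
    fn parse_helper_form(input: &str) -> Option<Self> {
        let (transport, address) = input.split_once("::")?;
        Some(Self {
            scheme: transport.to_owned(),
            path: address.to_owned(),
            helper_form: true,
        })
    }
}

impl fmt::Display for SketchUrl {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        if self.helper_form {
            // Reproduce the original spelling exactly.
            write!(f, "{}::{}", self.scheme, self.path)
        } else {
            // Simplification: real URLs have more components than this.
            write!(f, "{}://{}", self.scheme, self.path)
        }
    }
}
```

Parsing `hypothetical::6bf320f3-d340-4339-97be-fcb3377f7a95` and formatting the result gives back the same string, which is the round-trip property mentioned above.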

Byron avatar Jul 07 '25 04:07 Byron

Ah yes, the third form.

I have seen this used in the wild before, and it is a bit of a chaotic option. To illustrate how opaque the url is:

#!/usr/bin/env bash
# filename: /usr/local/bin/git-remote-bash

eval "$2"

$ git config remote.origin.vcs bash
$ git config remote.origin.url 'exec git-remote-https "$1" "$(/usr/local/bin/generate-git-https-url)"'

Can also be:

$ git remote set-url origin bash::'exec git-remote-https "$1" "$(/usr/local/bin/generate-git-https-url)"'

demosdemon avatar Jul 07 '25 05:07 demosdemon

That is fantastic, thank you.

I see now how ridiculously flexible this system can be, and would hope that for starters such URLs can soon be parsed. Gems like bash::'exec git-remote-https "$1" "$(/usr/local/bin/generate-git-https-url)"' should definitely be part of the test-suite.

Byron avatar Jul 07 '25 11:07 Byron