gitoxide icon indicating copy to clipboard operation
gitoxide copied to clipboard

Non-additive transport features are annoying to deal with

Open djc opened this issue 3 months ago • 16 comments

Current behavior 😯

Cargo features are intended to be additive and non-additive features are making it substantially harder to deal with the gix crates. It would be very nice if the curl and reqwest backends could be rewired so that the caller must pick a backend and the crates can still be compiled if both are enabled.

(Noticed this again while working on https://github.com/rustsec/rustsec/pull/1408.)

Expected behavior 🤔

Compile with --all-features without this:

error[E0428]: the name `Impl` is defined multiple times
   --> /home/runner/.cargo/registry/src/index.crates.io-1949cf8c6b5b557f/gix-transport-0.48.0/src/client/blocking_io/http/mod.rs:220:1
    |
217 | pub type Impl = curl::Curl;
    | --------------------------- previous definition of the type `Impl` here
...
220 | pub type Impl = reqwest::Remote;
    | ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ `Impl` redefined here
    |
    = note: `Impl` must be defined only once in the type namespace of this module

error: Cannot set both 'http-client-reqwest' and 'http-client-curl' features as they are mutually exclusive
  --> /home/runner/.cargo/registry/src/index.crates.io-1949cf8c6b5b557f/gix-transport-0.48.0/src/client/blocking_io/http/mod.rs:26:1
   |
26 | compile_error!("Cannot set both 'http-client-reqwest' and 'http-client-curl' features as they are mutually exclusive");
   | ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

For more information about this error, try `rustc --explain E0428`.

djc avatar Sep 05 '25 09:09 djc

Thanks for reporting, and putting it onto the map.

When implementing it like this, mutually exclusive features that should be additive, I didn't have a solution as the goals are competing. How can a rewiring be done given such constraints?

If no feature is selected, that's fine I think as there doesn't have to be support for transports at all (and the local transport will always work if I remember correctly). So what's left is to decide which feature takes precedence if multiple are selected, but that looks like a footgun that is rather be avoided: ambiguity, while silently making a choice for the user.

Right now, the constraint is violated to not have to make such a choice.

I'd love it if we could come up with a solution that is better for most users, and then implement it. Something that would help is the workflow that led to it being substantially harder to deal with gix crates.

Byron avatar Sep 06 '25 03:09 Byron

I'd suggest having an enum Transport { Curl, Reqwest } (guarding the variants of the enum with Cargo features), a value of which must be passed into the API when constructing a repository or doing something else that requires a transport.

djc avatar Sep 08 '25 09:09 djc

Won't it be better to a dyn trait object instead?

Then we could share apply layers for retrying/cookie/proxy etc, reuse reqwest client with async code

And that would allow arbitrary http client impl

AFAIK lowlevel API of gix already uses trait for http client, using trait object would also avoid generic instantiation

NobodyXu avatar Sep 08 '25 11:09 NobodyXu

That should also work.

djc avatar Sep 08 '25 11:09 djc

In the end it comes down to a Transport implementation which, combined with a Remote becomes a connection. The linked method already takes a trait object.

This kind of setting also needs to be per remote, as these hold the URLs of resources to connect to, for which gix-transport hold a universal connect() function that helps to connect to all the known resources out there.

Of course, this is also the crate that ultimately receives a bunch of non-additive cargo features.

Now that I think of it - the only way this can be solved if these are made additive, and maybe there can be some enum selector that mirrors these features to disambiguate the ambiguous ones. Oh, and I seem to remember now that this was also about async vs sync, and the Transport trait turns async in case an async HTTP implementation is selected, making a sync implementation impossible.

I don't see a simple solution there, unless of course Rust gets better at async one day. The reason for the code being as it is is the need to re-use code and tests, otherwise I'd have to duplicate everything which isn't maintainable.

Byron avatar Sep 08 '25 12:09 Byron

Oh, and I seem to remember now that this was also about async vs sync, and the Transport trait turns async in case an async HTTP implementation is selected, making a sync implementation impossible.

Given that majority of gix is blocking as it deals with disk I/O, would it makes sense to have the transport trait wraps all possibly async methods into blocking methods that is suitable for internals of gix?

Edit:

Of course in the future gix might utilise io-uring and thus pickup some async code, though io-uring doesn't necessarily need async and it'd be easy to change them to async

NobodyXu avatar Sep 08 '25 12:09 NobodyXu

TBH, doing this would be ideal and it would be great if that would be possible. The downstream would have to decide that though as the async versions of the traits are in production for all I know.

It's the fine folks at Radicle, I just don't feel like I want to deal with this just now. Another, the current, strategy, is to sit it out and hope either non-additive features in Cargo to land, or an async implementation that can interop with Sync code more nicely. Maybe there could be Transport implementations for both Sync and async then, so it even curl would be available through the async version of the Transport trait, or vice-versa with reqwest from the sync version of it. Something like this might be feasible already with the maybe_async macro that turns async code into sync code by feature toggle. For curl this could work I think, but not for request which needs some blocking invocation for each actual future, so I guess that's where that idea turns out to be infeasible.

In any case, it doesn't look like there is a straightforward solution beyond ditching async, something I'd love to do. Of course, there was always this dream of implementing a server that is async for receiving various packs at once, building the data structure to index the pack, but thinking about it, it will be much safer to not do that work within the server process and instead have a process per connection. So… not having async would certainly be preferable here.

Proposed solution 1

  • ditch async
  • make existing features additive
  • allow making a choice if a scheme can be handled by multiple implementations
    • here, I like having non-additive features as well as they'd prevent putting more into a binary than is needed, see Proposed Solution 2

Proposed solution 2

  • wait until there are non-additive Cargo features and use them
    • the idea here would probably be that the application can set features further down the tree, or resolve ambiguity themselves.

Interestingly, I feel like PS1 and PS2 together would make for the best experience, and far simpler code in gix-transport.

Byron avatar Sep 08 '25 13:09 Byron

TBH, doing this would be ideal and it would be great if that would be possible. The downstream would have to decide that though as the async versions of the traits are in production for all I know.

It's the fine folks at Radicle, I just don't feel like I want to deal with this just now. Another, the current, strategy, is to sit it out and hope either non-additive features in Cargo to land, or an async implementation that can interop with Sync code more nicely. Maybe there could be Transport implementations for both Sync and async then, so it even curl would be available through the async version of the Transport trait, or vice-versa with reqwest from the sync version of it.

I'm not sure I understand the requirements here. TBH for an idiomatic Rust implementation it seems backwards to avoid async code, and it seems pretty unlikely we're going to get non-additive Cargo features anytime soon.

djc avatar Sep 08 '25 13:09 djc

I feel like you could achieve both sync git client and async git server by doing:

  • sans-io low level transport, having a sans-io state machine is clearly easier to test and can work with async/sync without feature burdens
  • for client code, all blocking, though not sure if gix would adopt io-uring in the future if disk i/o become bottleneck
  • for server code, I think io-uring would be a go-to for any project that want high-performance disk I/O, so gix-server might end up using tokio-uring stuff

NobodyXu avatar Sep 08 '25 14:09 NobodyXu

Hmm just realize that with io-uring, maybe all disk I/O will become async as it really provides exceptional performance, though it doesn't have to be in async {} block to utilise

NobodyXu avatar Sep 08 '25 14:09 NobodyXu

@djc It definitely is also me who doesn't like async and the Pin handling whenever one has to implement a trait. It's just not nice compared to Sync code, so I don't like it. Also, I don't like to maintain two implementations, or two copies of tests, so maybe_async is crucial to keep tests deduplicated. The Transport trait is also needed to taper over opt-in async, but the details I already forgot. Another note about async: I don't think it's useful in compute-heavy code, or outside shuffling bytes around during IO. I'd also like to use it in application code, particularly anything with user interactions, but that doesn't apply to gitoxide. Async has been quite useless here and it's there because it was sponsored, and to me is just added complexity. Maybe there could have been an implementation of async that would make this the default, but I don't think that's the case for libraries like this one. This was mentioned just to provide insight, not to affect the way you think about it.

@NobodyXu Doing sans-io is a nice idea and I think it's done here at the core, but since this can't be used on the call-site, I don't think it can ever protect fully from dealing with the async/sync world split. Maybe improvements can be made in that regard, I wouldn't think I managed to do this optimally here, but I also wouldn't expect wonders.

Byron avatar Sep 09 '25 07:09 Byron

Maybe you could expand the Transport trait to cover every network-related operation gix ever made, and so make sure the Transport trait is always sync, thus removing the need for maybe_async?

So async code would only exists within the implementation of Transport trait but not outside of it, only one version of the Transport trait regardless of which feature is enabled

NobodyXu avatar Sep 09 '25 08:09 NobodyXu

That would basically be the proposed solution 1: ditch async, and async impls could be used under the hood. It's all possible, but I think this must be driven by someone or something. Removing async right now also isn't an option, after all there is still one at least one user who is also the one who sponsored it.

Byron avatar Sep 09 '25 15:09 Byron

If I may just add my opinion here: Go for sans I/O as much as possible, pushing I/O as much outwards as you can. Potentially even have a separate crate (gix-transport-proto? confusion with gix-protocol?) that models transports which is sans I/O (just works on buffers in memory), and separate crates to do I/O. You would potentially have one crate that provides interfaces for async I/O and another one for sync I/O. Now of course you have only shifted the problem to the next dependent of gix-transport, so you recursively repeat this process. There's a chance that this might clear up the picture. There are a few other projects that do this, for example the QUIC implementation quinn. The hope would be that you only have to select sync vs. async at the very outer layer, in the binary crate, i.e. application (a Git client or server).

And then regarding these:

@NobodyXu wrote in https://github.com/GitoxideLabs/gitoxide/issues/2152#issuecomment-3266122352:

Given that majority of gix is blocking as it deals with disk I/O, would it makes sense to have the transport trait wraps all possibly async methods into blocking methods that is suitable for internals of gix?

Of course in the future gix might utilise io-uring and thus pickup some async code, though io-uring doesn't necessarily need async and it'd be easy to change them to async

@NobodyXu wrote in https://github.com/GitoxideLabs/gitoxide/issues/2152#issuecomment-3266690915:

Hmm just realize that with io-uring, maybe all disk I/O will become async as it really provides exceptional performance, though it doesn't have to be in async {} block to utilise

The further out you can push I/O, the less this is a concern. However, you're in trouble if you assume that some particular application/use-case will prefer one kind of I/O over the other. Just don't do that. Let the application decide.

lorenzleutgeb avatar Oct 12 '25 12:10 lorenzleutgeb

That sounds very reasonable and I'd love to get a chance to clean up this age-old tech-debt. Internally there already is a clear separation, so it's more of a refactoring task (which of course has its own issues I am sure, but it should work).

Would have to do some analysis to see how it would look like if the choice of async/async comes with the crate that is pulled in, instead of a feature toggle.

Byron avatar Oct 12 '25 12:10 Byron

Just for completeness, here is another comment, which is reproduced here for convenience:

In the gitoxide Cargo workspace, there are a number of features that violate the requirement that feature flags must be additive.

There is #2152 about gix-transport, but there are others (rg --context=5 --iglob '**/Cargo.toml' '(exclusive|mutual)'). Particularly worrying is the [features] section of gix/Cargo.toml, the main entrypoint crate gix.

As you can see, mutually exclusive features proliferate along the dependency tree. A crate that depends on gix with its mutually exclusive features, and on another crate with mutually exclusive features, will see an even greater explosion in complexity handling feature flags.

Please be a good citizen and make all features additive. In particular I urge you to resolve this before releasing a crate at version 1 with mutually exclusive features.

The intent of this issue is to track the whole workspace/repo.

Byron avatar Oct 15 '25 06:10 Byron