go-libp2p
go-libp2p copied to clipboard
Connect should wait for an acceptable connection
Currently, it waits for any connection, then returns. Unfortunately, this means that if the user calls host.NewStream(peer) to try to get a new stream to a non-connected peer, we'll fail every single time if we try to create a stream behind a NAT.
The difficulty here is that we aren't "in control", and don't know if the remote peer will actually directly connect to us. It's up to the receiver to establish a direct connection.
I think the solution here is to wait a bit after we establish a transient connection to see if a non-transient connection comes through. The tricky part will be actually implementing that...
Notes after syncing with @Stebalien:
- The dial worker loop currently only cares about addresses we’re dialing, not incoming connections. Hole punched connections are incoming connections though, so we might end up waiting for any dial to finish, while we already have a perfectly usable direct connection.
- The dial worker loop might be simplified by uncoupling the dial result from the specific dial call. Maybe by having an
Add(ma.Multiaddr)function that joins a new dial job, and thenGet() <-chan network.Connfunction that returns a channel with all successfully established functions (maybe pass a filter function toGet).
@marten-seemann : should this be opened or closed? I wasn't sure given the PR was merged but it was reopened.
@marten-seemann @p-shahi : is this still an issue? (I'm asking because I'm trying to figure out if can close https://github.com/ipfs/kubo/issues/9751 ). Thanks.
Yes, we haven't fixed this yet.
Gentle bump, just for the record https://github.com/ipfs/kubo/issues/9751 is still a problem.
I'll submit a repro later this week.
The PR I've raised changes host.NewStream and swarm.NewStream to wait for a direct connection.
Why to do this in NewStream as opposed to DialPeer.
- I don't want to change the current behaviour of DialPeer returning a relayed connection when
network.WithUseTransientis not used. Maybe this the intention of the implementation was to not return a relayed connection but this behaviour has been there for a while and we'll end up subtly breaking those clients. - This PR with #2547 provides us with reasonable behaviour of DialPeer depending on
network.ForceDirectDialand NewStream depending onnetwork.WithUseTransient. - I want to avoid modifying the dial worker loop to wait for direct connections. In future I see us changing the dial worker loop to take a list of address as input and providing a connection on one of those addresses. This would help us provide an option like this https://github.com/libp2p/go-libp2p/issues/2412. This is specifically for not modifying the worker loop as suggested here
The dial worker loop might be simplified by uncoupling the dial result from the specific dial call. Maybe by having an Add(ma.Multiaddr) function that joins a new dial job, and then Get() <-chan network.Conn function that returns a channel with all successfully established functions (maybe pass a filter function to Get).
Need to merge https://github.com/libp2p/go-libp2p/pull/2542 to close this.