go-libp2p icon indicating copy to clipboard operation
go-libp2p copied to clipboard

Connect should wait for an acceptable connection

Open Stebalien opened this issue 3 years ago • 2 comments

Currently, it waits for any connection, then returns. Unfortunately, this means that if the user calls host.NewStream(peer) to try to get a new stream to a non-connected peer, we'll fail every single time if we try to create a stream behind a NAT.

The difficulty here is that we aren't "in control", and don't know if the remote peer will actually directly connect to us. It's up to the receiver to establish a direct connection.

I think the solution here is to wait a bit after we establish a transient connection to see if a non-transient connection comes through. The tricky part will be actually implementing that...

Stebalien avatar Jun 17 '22 23:06 Stebalien

Notes after syncing with @Stebalien:

  • The dial worker loop currently only cares about addresses we’re dialing, not incoming connections. Hole punched connections are incoming connections though, so we might end up waiting for any dial to finish, while we already have a perfectly usable direct connection.
  • The dial worker loop might be simplified by uncoupling the dial result from the specific dial call. Maybe by having an Add(ma.Multiaddr) function that joins a new dial job, and then Get() <-chan network.Conn function that returns a channel with all successfully established functions (maybe pass a filter function to Get).

marten-seemann avatar Jun 20 '22 19:06 marten-seemann

@marten-seemann : should this be opened or closed? I wasn't sure given the PR was merged but it was reopened.

BigLep avatar Aug 26 '22 22:08 BigLep

@marten-seemann @p-shahi : is this still an issue? (I'm asking because I'm trying to figure out if can close https://github.com/ipfs/kubo/issues/9751 ). Thanks.

BigLep avatar May 11 '23 15:05 BigLep

Yes, we haven't fixed this yet.

marten-seemann avatar May 15 '23 13:05 marten-seemann

Gentle bump, just for the record https://github.com/ipfs/kubo/issues/9751 is still a problem.

lidel avatar Jul 24 '23 13:07 lidel

I'll submit a repro later this week.

Jorropo avatar Jul 24 '23 15:07 Jorropo

The PR I've raised changes host.NewStream and swarm.NewStream to wait for a direct connection.

Why to do this in NewStream as opposed to DialPeer.

  • I don't want to change the current behaviour of DialPeer returning a relayed connection when network.WithUseTransient is not used. Maybe this the intention of the implementation was to not return a relayed connection but this behaviour has been there for a while and we'll end up subtly breaking those clients.
  • This PR with #2547 provides us with reasonable behaviour of DialPeer depending on network.ForceDirectDial and NewStream depending on network.WithUseTransient.
  • I want to avoid modifying the dial worker loop to wait for direct connections. In future I see us changing the dial worker loop to take a list of address as input and providing a connection on one of those addresses. This would help us provide an option like this https://github.com/libp2p/go-libp2p/issues/2412. This is specifically for not modifying the worker loop as suggested here

The dial worker loop might be simplified by uncoupling the dial result from the specific dial call. Maybe by having an Add(ma.Multiaddr) function that joins a new dial job, and then Get() <-chan network.Conn function that returns a channel with all successfully established functions (maybe pass a filter function to Get).

sukunrt avatar Sep 01 '23 09:09 sukunrt

Need to merge https://github.com/libp2p/go-libp2p/pull/2542 to close this.

sukunrt avatar Sep 18 '23 15:09 sukunrt