go-libp2p

Multiple connection pruning

Open vyzo opened this issue 6 years ago • 7 comments

We have observed in the relay infrastructure that some peers open multiple connections to the relay, many of them simultaneously. This creates a problem as we have lingering sockets that simply take away resources and can potentially linger for a long time.

We currently don't have any logic to deduplicate multiple connections: we just keep them all open and use an arbitrary one (usually the first one). We need a component that selects a single connection and prunes the duplicate connections. The component can live in the swarm itself, be part of the connection manager, or be a separate independent component.

cc @raulk @Stebalien

vyzo avatar May 15 '19 08:05 vyzo

We should also consider that sometimes duplicate connections are legitimate events:

  • simultaneous connect by two peers, which would manifest as an incoming and outgoing connection at each peer
  • disconnect and immediate reconnect by the remote peer, while we still haven't detected the disconnect
  • multiple dialable addresses were available to the remote peer, which naturally dialed us multiple times.

We need to take care to select the right connections to prune.

vyzo avatar May 15 '19 09:05 vyzo

It seems the most convenient place to implement this is the connection manager, as it already has the apparatus for tracking connections and the trimming logic.

vyzo avatar May 15 '19 10:05 vyzo

I propose the following heuristic strategy for deduplicating conns. The only real requirement is that the process is deterministic and two peers running it simultaneously converge to the same connection to keep.

Strategy:

  1. If we have both relayed and direct conns, keep the direct conns and drop the relay conns.
  2. If we have conns with streams and idle ones, drop the idle conns.
  3. Check for simultaneous connect: If we have both inbound and outbound conns, keep the ones initiated by the peer with the lowest peer ID.
  4. Keep the conns with the most streams and drop the other ones
  5. Break ties in the remaining conns by selecting the last conn, to match the swarm's behaviour in best connection selection.
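
A minimal Go sketch of the strategy above, for discussion only. The `Conn` struct, the `filter` helper and `SelectConn` are hypothetical names and not part of go-libp2p; peer IDs are compared as plain strings, and the input slice is assumed to be ordered by establishment time so that "last" means "most recent". Each step narrows the candidate set only when it would leave at least one conn.

```go
package main

// Conn is a hypothetical, simplified view of a connection.
// It is NOT the go-libp2p network.Conn interface.
type Conn struct {
	ID       int  // opaque identifier, used here only to tell conns apart
	Relayed  bool // true if the conn goes through a relay
	Outbound bool // true if the local peer dialed this conn
	Streams  int  // number of open streams
}

// filter returns the conns for which keep is true. If none match, it
// returns the input unchanged so a step never empties the candidate set.
func filter(conns []Conn, keep func(Conn) bool) []Conn {
	var out []Conn
	for _, c := range conns {
		if keep(c) {
			out = append(out, c)
		}
	}
	if len(out) == 0 {
		return conns
	}
	return out
}

// SelectConn picks the single conn to keep among the conns to one remote
// peer. It returns false if conns is empty. local and remote are the peer
// IDs, compared lexicographically (an assumption of this sketch).
func SelectConn(local, remote string, conns []Conn) (Conn, bool) {
	if len(conns) == 0 {
		return Conn{}, false
	}

	// 1. Prefer direct conns over relayed ones.
	conns = filter(conns, func(c Conn) bool { return !c.Relayed })

	// 2. Prefer conns with streams over idle ones.
	conns = filter(conns, func(c Conn) bool { return c.Streams > 0 })

	// 3. Simultaneous connect: keep the conns initiated by the peer with
	//    the lowest ID. Outbound conns were initiated by the local peer.
	keepOutbound := local < remote
	conns = filter(conns, func(c Conn) bool { return c.Outbound == keepOutbound })

	// 4. Keep the conns with the most streams.
	most := 0
	for _, c := range conns {
		if c.Streams > most {
			most = c.Streams
		}
	}
	conns = filter(conns, func(c Conn) bool { return c.Streams == most })

	// 5. Break remaining ties by taking the last conn.
	return conns[len(conns)-1], true
}
```

Calling `SelectConn(localID, remoteID, conns)` yields the conn to keep; everything else in `conns` would be a candidate for pruning. Because each step is a separate filter, changing the order of the steps is a matter of swapping two statements.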

vyzo avatar May 16 '19 11:05 vyzo

Note that maybe we should reverse 2 and 3, so that we always handle simultaneous connect in a principled manner, regardless of number of streams.

vyzo avatar May 16 '19 11:05 vyzo

@raulk @Stebalien thoughts?

vyzo avatar May 16 '19 11:05 vyzo

I'm seeing this in the wild too.

I think closing connections when they already have streams may be too late as the application has already started sending data and pulling the rug out would be disruptive.

It sounds like this could be solved by stream migration, in that you would just accept the incoming connection, migrate any subsequently opened streams to whichever is considered "better" and close the "worse" one?

achingbrain avatar Jun 01 '22 16:06 achingbrain

Yep! Stream migration would help in the race edge case where both peers open up connections at roughly the same time. And it has the added benefit of not needing both nodes to agree on which one is best. The node driving the stream migration picks one (see the current draft spec: https://github.com/libp2p/specs/pull/406/files).

Are you seeing this outside of the race edge cases as well?

MarcoPolo avatar Jun 07 '22 19:06 MarcoPolo