lnd icon indicating copy to clipboard operation
lnd copied to clipboard

Handle race condition in peer connection

Open yyforyongyu opened this issue 3 years ago • 0 comments

The Issue

When Alice and Bob each makes a connection to the other side at the same time, it's likely neither connection could succeed. For instance, if Alice restarts while having a channel with Bob, the following will happen,

  1. Bob notices the connection is broken, and will attempt to connect, hence making an outbound connection. Once the attempt is made, any inbound connection will be terminated.
  2. Once Alice finishes restarting, she will make a connection to Bob, hence making an outbound connection, and any inbound connection will be terminated.
  3. Alice will cancel Bob's connection attempt since she has an outbound connection, and,
  4. Bob will cancel Alice's connection attempt since he has an outbound connection.
  5. Neither side will succeed, and will both retry after a few seconds. If unlucky, the connection could remain broken.

Here's the pruned logs taken from itest build, where Carol restarts and Bob fails the connection.

Proposed Solutions

It'd be nice to have the specs specifying who should be the initiator of the connection, or when to drop a connection. For instance, we could have,

  1. it's always the channel initiator makes the connection. This makes sense since if one side is offline, the other side won't be able to make a connection. And if both are online, channel initiator can make the connection to avoid the above race. However, this doesn't solve persistant connections, where nodes are connected manually with a channel open.
  2. when in conflict, always prefer the connection from the node with the larger pub key(lexi order). We could apply this preference to connection initiator too - whoever has the larger pub key initiates the connection request. This should solve the above connection race.

To properly fix it, we may need to flatten some of the logic used in peer connection so we can understand what's going on more easily. Ideally we'd refactor the server.go to make the peer connection its own service.

yyforyongyu avatar Aug 02 '22 11:08 yyforyongyu