js-libp2p icon indicating copy to clipboard operation
js-libp2p copied to clipboard

fix(@libp2p/tcp): race condition in onSocket

Open tabcat opened this issue 1 year ago • 3 comments

Title

fix(@libp2p/tcp): race condition in onSocket

Description

This fixes a race condition in the onSocket listener method. onSocket gets a net.Socket parameter that needs to be closed later before the tcp server can close.

The net.Socket is used to create a MultiaddrConnection but the connection is not tracked with this.connections until after it has been upgraded.

If the tcp listener.close is called while a the MultiaddrConnection is waiting to be upgraded, then the MultiaddrConnection socket cannot be closed as it does not exist in this.connections.

Instead of adding the MultiaddrConnection to this.connections on creation I decided to create a separate property named this.maConnections. I decided to track the non-upgraded connections separately because I was unsure how this.connections must be used with other things like metrics.

Notes & open questions

Closes #2760

Change checklist

  • [ ] I have performed a self-review of my own code
  • [ ] I have made corresponding changes to the documentation if necessary (this includes comments as well)
  • [ ] I have added tests that prove my fix is effective or that my feature works

tabcat avatar Oct 11 '24 12:10 tabcat

Actually server.close says:

Stops the server from accepting new connections and keeps existing connections.

So yes, it is necessary to close un-upgraded connections separately, but we should probably just close them immediately if they fail to upgrade, if we're not doing that already.

achingbrain avatar Oct 11 '24 14:10 achingbrain

Yes, they are aborted inside of the catch block of the upgrade call currently.

The problem is that maConn is added to this.connections inside of the then block of the upgrade. If close is called before the then block is ran then the connection cannot be closed and this.server.close never resolves.

tabcat avatar Oct 11 '24 16:10 tabcat

It's possible there's something missing though - could you please add a test to show the problem you're seeing?

I can add a test. The test that I have been using exists in https://github.com/tabcat/delay5 which I should be able to replicate in a separate PR.

tabcat avatar Oct 11 '24 16:10 tabcat

Sorry for jumping in, I thought I'd just drag this across the line.

This has shown up a slightly different problem in that we currently close the socket server but wait for the "close" event via the callback passed to this.server.close - the "close" event is only emitted once all connections have closed - if the connections don't close the promise will never resolve which could be what's causing https://github.com/ChainSafe/lodestar/issues/6053

The callback was added in https://github.com/libp2p/js-libp2p/pull/2058 it was not in the original PR that added the "pause" functionality - https://github.com/libp2p/js-libp2p-tcp/pull/218

achingbrain avatar Oct 23 '24 11:10 achingbrain