js-libp2p
js-libp2p copied to clipboard
fix(@libp2p/tcp): race condition in onSocket
Title
fix(@libp2p/tcp): race condition in onSocket
Description
This fixes a race condition in the onSocket listener method. onSocket gets a net.Socket parameter that needs to be closed later before the tcp server can close.
The net.Socket is used to create a MultiaddrConnection but the connection is not tracked with this.connections until after it has been upgraded.
If the tcp listener.close is called while a the MultiaddrConnection is waiting to be upgraded, then the MultiaddrConnection socket cannot be closed as it does not exist in this.connections.
Instead of adding the MultiaddrConnection to this.connections on creation I decided to create a separate property named this.maConnections. I decided to track the non-upgraded connections separately because I was unsure how this.connections must be used with other things like metrics.
Notes & open questions
Closes #2760
Change checklist
- [ ] I have performed a self-review of my own code
- [ ] I have made corresponding changes to the documentation if necessary (this includes comments as well)
- [ ] I have added tests that prove my fix is effective or that my feature works
Actually server.close says:
Stops the server from accepting new connections and keeps existing connections.
So yes, it is necessary to close un-upgraded connections separately, but we should probably just close them immediately if they fail to upgrade, if we're not doing that already.
Yes, they are aborted inside of the catch block of the upgrade call currently.
The problem is that maConn is added to this.connections inside of the then block of the upgrade. If close is called before the then block is ran then the connection cannot be closed and this.server.close never resolves.
It's possible there's something missing though - could you please add a test to show the problem you're seeing?
I can add a test. The test that I have been using exists in https://github.com/tabcat/delay5 which I should be able to replicate in a separate PR.
Sorry for jumping in, I thought I'd just drag this across the line.
This has shown up a slightly different problem in that we currently close the socket server but wait for the "close" event via the callback passed to this.server.close - the "close" event is only emitted once all connections have closed - if the connections don't close the promise will never resolve which could be what's causing https://github.com/ChainSafe/lodestar/issues/6053
The callback was added in https://github.com/libp2p/js-libp2p/pull/2058 it was not in the original PR that added the "pause" functionality - https://github.com/libp2p/js-libp2p-tcp/pull/218