go-libp2p icon indicating copy to clipboard operation
go-libp2p copied to clipboard

"disconnect" protocol/message

Open whyrusleeping opened this issue 8 years ago • 15 comments

As pointed out by @Kubuxu, it would be really useful to have a way for nodes to tell eachother to disconnect. This way nodes don't assume it was an unintentional disconnection and immediately try to reconnect, among other scenarios.

This could use some discussion.

cc @Kubuxu @vyzo @diasdavid @lgierth @dignifiedquire @magik6k @Stebalien

whyrusleeping avatar Oct 15 '17 14:10 whyrusleeping

quick thought: this could be made to be part of the identify protocol.

whyrusleeping avatar Oct 15 '17 14:10 whyrusleeping

Maybe?

Where do we reconnect to peers on disconnect? We obviously do it but I don't think we're doing it intentionally. I think this is mostly a symptom of how we track connections (we re-dial disconnected peers before we realize we're no longer connected to them). Ideally, sending a FIN packet (for TCP at least) would be enough (don't have to wait).

Now, it might be useful to have a "forget me" message but, IMO, that's more useful when shutting down and moving (to invalidate old peerinfo records).

Stebalien avatar Oct 17 '17 01:10 Stebalien

I think this would be very useful to ensure that peers understand the difference between a connection drop and an intentional close from the other side.

I don't think we should rely on the transport having FIN packets (as not all of them do). But I could see a scenario where those are used to implement this if available.

dignifiedquire avatar Oct 17 '17 10:10 dignifiedquire

The benefit of a higher-level message (vs a FIN packet) is that we can model a "disconnect reason", useful for debugging and future extensibility. For example, the devp2p stack in Ethereum models this enum:

0x00 Disconnect requested;
0x01 TCP sub-system error;
0x02 Breach of protocol, e.g. a malformed message, bad RLP, incorrect magic number &c.;
0x03 Useless peer;
0x04 Too many peers;
0x05 Already connected;
0x06 Incompatible P2P protocol version;
0x07 Null node identity received - this is automatically invalid;
0x08 Client quitting;
0x09 Unexpected identity (i.e. a different identity to a previous connection/what a trusted peer told us).
0x0a Identity is the same as this node (i.e. connected to itself);
0x0b Timeout on receiving a message (i.e. nothing received since sending last ping);
0x10 Some other reason specific to a subprotocol.

It's also useful for higher level aspects, like reputation, peer scoring, quality of service, etc. Peers will want to know if they're being disconnected because their reputation has decreased, or due to packet loss on unreliable transports (e.g. UDP/QUIC), etc.

raulk avatar Sep 20 '18 22:09 raulk

It seems odd to place these control messages in a protocol called "identify". Other applications call this "wire protocol" or "control protocol". Maybe a more generic protocol could subsume identify?

raulk avatar Sep 20 '18 22:09 raulk

Yeah, it doesnt need to be part of identify. Was just thinking it would be convenient to put it there since identify is required to be in the loop for every connection, and could help in making decisions about allowing new incoming connections.

whyrusleeping avatar Sep 20 '18 22:09 whyrusleeping

(email backlog)

So, I've been leaning more and more towards many tiny protocols. Really, I'd just define an entirely new connection "management" protocol.

Stebalien avatar Nov 29 '18 01:11 Stebalien

@Stebalien those feel juxtaposed. Many tiny protocols != one (coarse-grained) connection management protocol, right? May boil down to terminology, though.

Medium-term, I do see a /p2p/wire/nnn protocol that subsumes:

  • existing protocols: IDENTIFY, PING.
  • new messages:
    • DISCONNECT: "I'm closing the conn, and this is why".
    • GOAWAY: warning that peer A is about to close a connection with peer B.
    • GOAWAY_ACK: peer B telling peer A that closure is OK, or pleading to keep running.
    • some kind of STATUS/STATS message that serves as a basis for adaptable networks and load-balancing.

That way a single stream would be responsible for all connection control messages.

Some features in the wire protocol may be optional, e.g. STATS. IDENTIFY could advertise a map of optional features supported by the peer.

FWIW, this series of requirements is captured in the libp2p roadmap under the Polite peering and wire protocol goal.

raulk avatar Nov 29 '18 10:11 raulk

those feel juxtaposed.

By "connection management", I meant connman stuff. That is, telling users that we're overloaded, telling users to disconnect, etc. Although, really, we could just have one per message

That way a single stream would be responsible for all connection control messages.

So, this is important for multisteam 1.0 but not for multistream 2.0. That is, there's no reason to keep a stream open in mutlistream 2.0 (unless you just like wasting memory).

Some features in the wire protocol may be optional, e.g. STATS. IDENTIFY could advertise a map of optional features supported by the peer.

This is really why I'd like to push smaller endpoints. Instead of advertising feature maps for protocols, we can just have one "feature" per endpoint. Basically, think of every endpoint as a function. Instead of advertising a function do_something(something, argument) and then having to advertise a list of supported values for something, we can just advertise do_x, do_y, do_z. In libp2p land, these would be /my-protocol/x, /my-protocol/y, /my-protocol/z.

Basically, I'm trying to reduce the amount of state we track per peer per service; it's annoying to manage and takes memory/cpu time. Currently:

  1. On connect, (almost) every service says hello and then keeps some state to track what features the peer supports. This means every service has to implement some hello protocol, every service needs to register a connection (not just a stream) listener, etc.
  2. On disconnect, every service must atomically forget this state (to avoid leaking memory). The service also has to make sure that it doesn't forget the state if the peer ended up reconnecting before it got around to handling the disconnect event.
  3. All this per-peer state adds up. With UDP based protocols, we should be able to keep open a ton of connections (not limited by file descriptors). I'd like to keep per-peer state to a minimum.

Stebalien avatar Nov 29 '18 23:11 Stebalien

IMO we will need a control stream, and it should be the swarm that opens that stream before handing over the connection to the libp2p stack.

Reason: by introducing connection gating the swarm can decide to immediately close the connection. If we don't open a control stream, we have no way of informing the other party of the reason for that disconnection (e.g. "over capacity").

This would also solve the whole "identify is async but a crucial part of everything else", which has bit us in the butt many times.

At this point I would prefer if we don't think of HELLO and GOODBYE messages as separate sharded protocols, but rather two messages on a control/wire protocol.

raulk avatar Dec 11 '19 11:12 raulk

I like the suggested control protocol as part of libp2p for all the reasons presented in this thread. It would also create a convenient space for any future control messages to be added. Identify can become just one of its control messages.

daviddias avatar Dec 11 '19 12:12 daviddias

please cover https://github.com/libp2p/go-libp2p/issues/374 in the control protocol if possible.

jkassis avatar Aug 05 '21 22:08 jkassis

That's an unrelated issue, unfortunately. This issue is about introducing a disconnect message to politely ask another node to disconnect from us (see the motivation).

The other issue is a bug somewhere.

Stebalien avatar Aug 06 '21 01:08 Stebalien

Huh, how did this issue get closed?

Looks like it happened when I merged go-libp2p-quic-transport into this repo (https://github.com/libp2p/go-libp2p/pull/1424), and GitHub applied the comment referring to the fix of https://github.com/libp2p/go-libp2p-quic-transport/issues/238 to this repo. This is bad. I wonder how many other issues got closed that way...

marten-seemann avatar Apr 25 '22 09:04 marten-seemann

I wonder how many other issues got closed that way...

As far as I could tell, none.

Stebalien avatar Apr 25 '22 10:04 Stebalien