dkg-substrate icon indicating copy to clipboard operation
dkg-substrate copied to clipboard

[SPEC] Retain Known Peers even after they disconnect, for `N` ms

Open shekohex opened this issue 3 years ago • 0 comments

Overview

Currently when a Peer disconnect from our DKG Gossip Engine, we do cleanup the state of that peer, which contains the known messages, and other metadata. Which is good, but not until we are running a real-world and we have the networking issues.

https://github.com/webb-tools/dkg-substrate/blob/792d04551e86415e7f1d9e2d0c1efff22de33730/dkg-gadget/src/gossip_engine/network.rs#L500-L525

That would result in a clean up immediately after the peer closes the stream with us, if the same peer connect again after a few secs, they will miss any messages that got should have sent to them.

Research

What we want instead is that, first of all, we should keep the peers after they disconnect for constant period of time, before we clean up it's state. Second thing is to detect when a peer is disconnected and mark that peer as disconnected for now, and all messages that should go for that peer should be stored somewhere in a queue; so that when they connect again we send them the messages they have missed. That would make sure that they did not miss any messages.

Examples

  1. In the Peer information struct, we will add two things:
/// Peer information
#[derive(Debug)]
struct Peer<B: Block> {
	/// Holds a set of messages known to this peer.
	known_messages: LruHashSet<B::Hash>,
	/// a counter of the messages that are received from this peer.
	///
	/// Implemented as a HashMap/LruHashMap with the message hash as the key,
	/// This is used to track the frequency of the messages received from this peer.
	/// If the same message is received from this peer more than
	/// `MAX_DUPLICATED_MESSAGES_PER_PEER`, we will flag this peer as malicious.
	message_counter: LruHashMap<B::Hash, usize>,

	/// Keeps track of the Peer state.
	state: PeerState,
   /// Holds a queue of messages that needs to be sent to that peer, if the did miss any messages.
	message_queue: VecDeque<SignedDKGMessage<AuthorityId>>,
}

enum PeerState {
	Connected,
	Disconnected,
}
  1. When we get a Event::NotificationStreamClosed we mark that peer state as PeerState::Disconnected, same goes for Event::NotificationStreamOpened if we still have that peer, we change the state to be PeerState::Connected.
fn gossip_message(&self, message: SignedDKGMessage<AuthorityId>) {
	// ...
	// before we call `self.service.write_notification(...)` we check the peer state,
	// if they are disconnected, we enqueue the message to their message queue.
	// if not, we send it normally.
	//
	// in a better implementation, we should use:
	// https://docs.rs/sc-network/latest/sc_network/struct.NetworkService.html#method.notification_sender  
}
  1. when we get Event::NotificationStreamOpened we send out the messages in their queue, if any.
  2. A background task that will Run every N ms to peek into the peers, find the disconnected ones, and clean them up.

Questions/Issues

  1. What a good value for N? a 5min is good?

shekohex avatar Dec 07 '22 13:12 shekohex