[SPEC] Retain Known Peers even after they disconnect, for `N` ms
Overview
Currently when a Peer disconnect from our DKG Gossip Engine, we do cleanup the state of that peer, which contains the known messages, and other metadata. Which is good, but not until we are running a real-world and we have the networking issues.
https://github.com/webb-tools/dkg-substrate/blob/792d04551e86415e7f1d9e2d0c1efff22de33730/dkg-gadget/src/gossip_engine/network.rs#L500-L525
That would result in a clean up immediately after the peer closes the stream with us, if the same peer connect again after a few secs, they will miss any messages that got should have sent to them.
Research
What we want instead is that, first of all, we should keep the peers after they disconnect for constant period of time, before we clean up it's state. Second thing is to detect when a peer is disconnected and mark that peer as disconnected for now, and all messages that should go for that peer should be stored somewhere in a queue; so that when they connect again we send them the messages they have missed. That would make sure that they did not miss any messages.
Examples
- In the Peer information struct, we will add two things:
/// Peer information
#[derive(Debug)]
struct Peer<B: Block> {
/// Holds a set of messages known to this peer.
known_messages: LruHashSet<B::Hash>,
/// a counter of the messages that are received from this peer.
///
/// Implemented as a HashMap/LruHashMap with the message hash as the key,
/// This is used to track the frequency of the messages received from this peer.
/// If the same message is received from this peer more than
/// `MAX_DUPLICATED_MESSAGES_PER_PEER`, we will flag this peer as malicious.
message_counter: LruHashMap<B::Hash, usize>,
/// Keeps track of the Peer state.
state: PeerState,
/// Holds a queue of messages that needs to be sent to that peer, if the did miss any messages.
message_queue: VecDeque<SignedDKGMessage<AuthorityId>>,
}
enum PeerState {
Connected,
Disconnected,
}
- When we get a
Event::NotificationStreamClosedwe mark that peer state asPeerState::Disconnected, same goes forEvent::NotificationStreamOpenedif we still have that peer, we change the state to bePeerState::Connected.
fn gossip_message(&self, message: SignedDKGMessage<AuthorityId>) {
// ...
// before we call `self.service.write_notification(...)` we check the peer state,
// if they are disconnected, we enqueue the message to their message queue.
// if not, we send it normally.
//
// in a better implementation, we should use:
// https://docs.rs/sc-network/latest/sc_network/struct.NetworkService.html#method.notification_sender
}
- when we get
Event::NotificationStreamOpenedwe send out the messages in their queue, if any. - A background task that will Run every
Nms to peek into the peers, find the disconnected ones, and clean them up.
Questions/Issues
- What a good value for
N? a 5min is good?