[SPEC] Improve our Network Engine to be more resilient and intelligent about message passing

Open shekohex opened this issue 2 years ago • 2 comments

Overview

As of now, the current Gossip Network Engine does not guarantee any of the following:

  1. The Recipient is actually ready to process the message (note that processing the message is different than receiving it.)
  2. The Recipient received the message correctly.
  3. If a party goes offline for a bit, any messages it missed are re-delivered to it once it is back.

Research

There could be several ways to address the issues above; one of them is the following:

  1. For each known peer, we will maintain a buffer (message queue) where we enqueue messages and dequeue them once that peer is able to process them.
  2. For each known peer, we will store which round they are currently in, so that we do not send them a message from a different round that they cannot process at the moment.
  3. Each round, a peer will announce which round they are in so that other peers can store it.
  4. If a peer disconnects, do not clear its storage immediately; the storage will have a TTL (time to live) and will be cleared automatically if the peer does not reconnect within T seconds.
  5. A peer can ask any other peer to resend any messages held for them in that peer's message queue.
  6. A peer sends an ACK message to the other peers, containing the message hash, to confirm that they processed that message correctly.
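
The six items above could be sketched roughly as follows. This is only a minimal in-memory sketch under stated assumptions, not the engine's actual code: `Outbox`, `PeerState`, `Message`, the `u32` peer id, and the `DefaultHasher`-based `msg_hash` are all hypothetical stand-ins (a real engine would presumably use the libp2p peer id and a cryptographic hash). The current time is passed in explicitly so that TTL expiry is easy to reason about.

```rust
use std::collections::hash_map::DefaultHasher;
use std::collections::{HashMap, VecDeque};
use std::hash::{Hash, Hasher};
use std::time::{Duration, Instant};

// Hypothetical identifiers, chosen only for this sketch.
type PeerId = u32;
type MsgHash = u64;

#[derive(Clone, Debug, PartialEq)]
struct Message {
    round: u64,
    payload: Vec<u8>,
}

// Stand-in hash for the ACK; assumption: a real engine would use a cryptographic hash.
fn msg_hash(msg: &Message) -> MsgHash {
    let mut hasher = DefaultHasher::new();
    msg.round.hash(&mut hasher);
    msg.payload.hash(&mut hasher);
    hasher.finish()
}

struct PeerState {
    announced_round: Option<u64>,     // items 2-3: last round the peer announced
    queue: VecDeque<Message>,         // item 1: messages not yet ACKed
    disconnected_at: Option<Instant>, // item 4: set while the peer is offline
}

struct Outbox {
    ttl: Duration,
    peers: HashMap<PeerId, PeerState>,
}

impl Outbox {
    fn new(ttl: Duration) -> Self {
        Outbox { ttl, peers: HashMap::new() }
    }

    fn state(&mut self, peer: PeerId) -> &mut PeerState {
        self.peers.entry(peer).or_insert_with(|| PeerState {
            announced_round: None,
            queue: VecDeque::new(),
            disconnected_at: None,
        })
    }

    // Item 1: buffer a message for a peer.
    fn enqueue(&mut self, peer: PeerId, msg: Message) {
        self.state(peer).queue.push_back(msg);
    }

    // Items 2-3: remember the round the peer says it is in.
    fn on_round_announced(&mut self, peer: PeerId, round: u64) {
        self.state(peer).announced_round = Some(round);
    }

    // Items 2 and 5: messages matching the peer's announced round. They stay
    // queued until ACKed, so calling this again acts as a resend.
    fn deliverable(&self, peer: PeerId) -> Vec<Message> {
        self.peers
            .get(&peer)
            .map(|s| {
                s.queue
                    .iter()
                    .filter(|m| Some(m.round) == s.announced_round)
                    .cloned()
                    .collect()
            })
            .unwrap_or_default()
    }

    // Item 6: the peer ACKed a message by hash, so drop it from the queue.
    fn on_ack(&mut self, peer: PeerId, hash: MsgHash) {
        if let Some(s) = self.peers.get_mut(&peer) {
            s.queue.retain(|m| msg_hash(m) != hash);
        }
    }

    // Item 4: keep the storage on disconnect and start the TTL clock.
    fn on_disconnect(&mut self, peer: PeerId, now: Instant) {
        if let Some(s) = self.peers.get_mut(&peer) {
            s.disconnected_at = Some(now);
        }
    }

    // Item 4: a reconnect before the TTL elapses cancels the expiry.
    fn on_reconnect(&mut self, peer: PeerId) {
        if let Some(s) = self.peers.get_mut(&peer) {
            s.disconnected_at = None;
        }
    }

    // Item 4: drop the storage of peers that stayed offline for at least `ttl`.
    fn expire(&mut self, now: Instant) {
        let ttl = self.ttl;
        self.peers.retain(|_, s| match s.disconnected_at {
            Some(since) => now.duration_since(since) < ttl,
            None => true,
        });
    }
}
```

Every message costs at least one extra ACK round trip in this design, which is the overhead the question below is asking about.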

Questions/Issues

  1. How much would this slow down our network, considering the extra round trips and the coordination between peers?

shekohex avatar Sep 01 '22 11:09 shekohex

Comments on line items

  1. This is not needed, nor is it smart to do, imo. Peers should not need to know what round other peers are on. Also, messages can be received out of order, as per the recent discussion in the Telegram chat with Denys.
  2. I don't think this should be necessary; it opens up another attack vector, since peers could maliciously send each other invalid rounds.
  3. I would be curious to know whether the underlying libp2p transport we use already guarantees this. We should ask this on the Substrate Stack Exchange.

Thoughts on message delivery

It is worth considering that, by design, if a peer is not around to receive messages, then it is considered offline and should be reported if it is supposed to be participating in the DKG protocol. Therefore, I think there is a fine line to walk in how much additional logic we add to this process. At the very least, we need to ensure that peers that are connected and online cache the messages sent over the network and process them once they have prepared their round handles.
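
As a rough illustration of caching messages until the round handle is ready, the receiving side could look something like the sketch below. `PendingMessages` and its methods are hypothetical names invented for this example. Dropping cached messages for rounds older than the one that just became ready is an assumption of the sketch, not something decided in this thread.

```rust
use std::collections::BTreeMap;

type Round = u64;

// Hypothetical receiver-side cache, keyed by round so that messages arriving
// out of order (or before the round handle exists) are not lost.
struct PendingMessages {
    active_round: Option<Round>,
    cached: BTreeMap<Round, Vec<Vec<u8>>>,
}

impl PendingMessages {
    fn new() -> Self {
        PendingMessages { active_round: None, cached: BTreeMap::new() }
    }

    // Returns the payload if it can be processed right away, otherwise caches
    // it under its round and returns None.
    fn on_message(&mut self, round: Round, payload: Vec<u8>) -> Option<Vec<u8>> {
        if self.active_round == Some(round) {
            Some(payload)
        } else {
            self.cached.entry(round).or_default().push(payload);
            None
        }
    }

    // Called once the round handle for `round` is prepared. Returns the
    // messages cached for that round in arrival order.
    fn on_round_ready(&mut self, round: Round) -> Vec<Vec<u8>> {
        self.active_round = Some(round);
        // Assumption: anything cached for rounds before `round` is stale.
        self.cached = self.cached.split_off(&round);
        self.cached.remove(&round).unwrap_or_default()
    }
}
```

This keeps the logic local to the receiver, so no peer has to track or trust another peer's round.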

drewstone avatar Sep 01 '22 11:09 drewstone

Another thing to take into account. Say there are 3 authorities (A, B, C).

If authorities A and B manage to make it to round i+1 after round i, I think it is safe to say they received the messages they needed from peers in round i. Therefore, we can discard those messages from our failsafe cache. If we see that we have not made it to round i+1, then it might make sense to re-gossip the messages from round i periodically.

I think keeping this in mind, we would want the gossip network to store the messages from the last successfully completed round. Then, we can periodically check when to re-gossip them and discard them if we successfully proceeded.
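
A failsafe cache along these lines might be sketched as follows. This is a hedged sketch only: `RegossipCache`, its fields, and the re-gossip `interval` are hypothetical, and it models the behaviour described above, where round i's messages are kept until we see progress to a later round and are returned for re-gossip whenever the interval elapses without progress.

```rust
use std::time::{Duration, Instant};

// Hypothetical failsafe cache holding the messages we sent in one round.
struct RegossipCache {
    round: u64,              // round whose messages are currently cached
    messages: Vec<Vec<u8>>,  // payloads sent during that round
    interval: Duration,      // how long to wait before re-gossiping
    last_gossip: Instant,    // last time we sent or re-gossiped
}

impl RegossipCache {
    fn new(round: u64, interval: Duration, now: Instant) -> Self {
        RegossipCache { round, messages: Vec::new(), interval, last_gossip: now }
    }

    // Remember a message sent in the cached round; other rounds are ignored.
    fn record_sent(&mut self, round: u64, payload: Vec<u8>) {
        if round == self.round {
            self.messages.push(payload);
        }
    }

    // We made it to a later round, so the old messages were evidently
    // received and can be discarded.
    fn on_round_advanced(&mut self, new_round: u64, now: Instant) {
        if new_round > self.round {
            self.round = new_round;
            self.messages.clear();
            self.last_gossip = now;
        }
    }

    // Periodic check: returns the messages to re-gossip if the interval has
    // elapsed without progress, otherwise an empty list.
    fn tick(&mut self, now: Instant) -> Vec<Vec<u8>> {
        if !self.messages.is_empty() && now.duration_since(self.last_gossip) >= self.interval {
            self.last_gossip = now;
            self.messages.clone()
        } else {
            Vec::new()
        }
    }
}
```

Because progress to round i+1 is used as the implicit acknowledgement, this variant needs no explicit ACK messages.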

drewstone avatar Sep 01 '22 12:09 drewstone

Closed by https://github.com/webb-tools/dkg-substrate/pull/510

1xstj avatar Mar 10 '23 14:03 1xstj