Support and test for elections during private recovery

Open jumaffre opened this issue 3 years ago • 3 comments

Unearthed by https://github.com/microsoft/CCF/issues/3752

When the private ledger is being recovered, the service should also be resilient to elections. While we expect this to work in some cases, I believe it may not work in all of them, e.g. if all the nodes are in the Candidate state when private recovery completes (since the end of the private recovery procedure automatically issues a new transaction that has to be committed on the primary node).

Most of the work here is writing thorough end-to-end tests that exercise this behaviour.

jumaffre, Apr 08 '22 13:04

This is in general really hard to test. It may be possible to test some of it by combining recovery with network partitions, e.g. isolate the primary, then wait for a given amount of time and count the number of elections that are triggered. It might also be helpful to just keep issuing private transactions for some set amount of time, so that we have an idea of how long the recovery is expected to take.
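
A minimal sketch of the election-counting step, assuming the test can poll a node's current view at intervals and that each view increment corresponds to one election attempt. The helper name `count_view_changes` and the sample data are hypothetical and not part of the CCF test infrastructure.

```python
from typing import Iterable, Optional


def count_view_changes(samples: Iterable[int]) -> int:
    """Sum the view increases seen across a sequence of polled views.

    Assumption: every view increment stands for one election attempt, so a
    jump of two views between consecutive samples counts as two elections.
    Samples lower than the highest view seen so far are ignored.
    """
    elections = 0
    highest: Optional[int] = None
    for view in samples:
        if highest is not None and view > highest:
            elections += view - highest
        highest = view if highest is None else max(highest, view)
    return elections


# Hypothetical views polled from one backup while the primary is isolated.
observed = [2, 2, 3, 3, 5, 5]
print(count_view_changes(observed))  # 3
```

Polling too slowly does not lose elections under this assumption, because the count is derived from the view delta rather than from the number of samples that changed.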

wintersteiger, Apr 08 '22 14:04

It seems like isolating the nodes from each other right after posting the recovery shares should do the trick, unless I'm missing something?

achamayou, Apr 11 '22 16:04

Another, simpler thing to do is to kill the primary shortly after the beginning of a private recovery, and check that it finishes correctly on two nodes.
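
The shape of such a test could look roughly like the toy model below. It is only a sketch: `FakeNode` and `kill_primary_during_recovery` are hypothetical stand-ins rather than CCF test infrastructure, and the election is reduced to "lowest surviving id wins" purely to keep the example self-contained. The part that would carry over to a real end-to-end test is the set of assertions at the end.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class FakeNode:
    """Hypothetical stand-in for a node handle in a test harness."""
    node_id: int
    is_primary: bool = False
    alive: bool = True
    recovered: bool = False


def kill_primary_during_recovery(nodes: List[FakeNode]) -> List[FakeNode]:
    """Kill the current primary, then let the survivors finish recovery."""
    primary = next(n for n in nodes if n.is_primary)
    primary.alive = False
    primary.is_primary = False

    survivors = [n for n in nodes if n.alive]
    # Toy election: only possible if the survivors still form a majority.
    if len(survivors) * 2 > len(nodes):
        min(survivors, key=lambda n: n.node_id).is_primary = True
        # With a primary in place, recovery can complete on every survivor.
        for node in survivors:
            node.recovered = True
    return survivors


nodes = [FakeNode(0, is_primary=True), FakeNode(1), FakeNode(2)]
survivors = kill_primary_during_recovery(nodes)

# Checks a real test would make against live nodes.
assert len(survivors) == 2
assert sum(n.is_primary for n in survivors) == 1
assert all(n.recovered for n in survivors)
```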

achamayou, May 24 '22 09:05