stacks-core icon indicating copy to clipboard operation
stacks-core copied to clipboard

feat: stacks signer able to save multiple dkg shares and load it where appropriate

Open netrome opened this issue 1 year ago • 3 comments

Description

This PR extends the signer persistence to allow for storing shares for multiple concurrent DKG rounds, identified by the aggregate public key for that round. On startup, the signer looks up the approved aggregate key in the contract and loads the corresponding shares.

To ensure the Nakamoto sign coordinator is able to pick up loaded DKG shares, the signers broadcast DkgResults messages when loaded with their polynomial commitments. The sign coordinator, which previously assumed that any DkgResults message contained the complete set of polynomial commitments, has been extended to also allow individual polynomial commitments in these messages - which it will then aggregate for all signers to form the complete set of commitments.

Applicable issues

  • closes #4654

Additional info

To ensure a bounded size stored in StackerDB, we limit the implementation to only hold two sets of shares concurrently to handle the scenario described in #4654.

netrome avatar Apr 22 '24 11:04 netrome

Everything LGTM! There are a few instances where it would be helpful to pluralize function names that now return states instead of one state.

Good point. I've updated get_signer as you noted, and also load_encrypted_signer_state which should be pluralized.

netrome avatar Apr 29 '24 21:04 netrome

The structure of this code is giving me a lot of pause. It seems that there are several places where the signer state can just be willy-nilly reset from saved state, but without any consideration to how it will affect the execution of the signer at that point in the reload. I fear this will lead to hard-to-reproduce bugs and state corruption, because it is hard to see in advance what the downstream consequences can be when changing this state mid-way through a main loop pass.

Instead, can this code be factored to make a state reload happen only at the top-level of the main loop? A reload should cancel any in-progress activities and force the signer to restart whatever it was doing anew.

I'm not too happy about the current structure either. While I have attempted to be careful with how and when we load state from the signer, I do share your concern. We've already seen plenty of hard-to-reproduce bugs in the signer so far, and I definitely don't want to add to it, especially since the intention of this change is to make the signer more robust towards bugs caused by network delays and timing issues.

I don't fully understand right now how exactly we'd be able to make the state reloads only happen at the top-level of the main loop (assuming you mean within theSignerRunLoop::run_one_pass function and not the top-level provided SignerRunLoop::main_loop function), but I will look into it with fresh eyes on Monday and see what I can do.

netrome avatar May 03 '24 20:05 netrome

I've updated now so that state reloads only happen in signer::update_approved_aggregate_key which is only called in signer::refresh_dkg, which in turn is only called from the main loop if no approved aggregate key is set. This eliminates some of the unnecessary boilerplate and allows us to instantiate the signer without doing yet another state reload after the signer has been constructed.

This feels much cleaner than the previous solution. Thank you for bringing this up @jcnelson. Let me know if you have further opinions on how to structure this. For example, if you want state reloads to be explicit in the main loop I can do some further refactors to achieve this - but I think the solution we have now fits better with the current structure of the signer.

netrome avatar May 06 '24 08:05 netrome

Closing this PR due to us going in a different direction with the signer.

netrome avatar May 14 '24 13:05 netrome