Event observer recoverability in event of unclean stacks-node shutdown

Open obycode opened this issue 1 year ago • 2 comments

Problem

During the restart to upgrade naka-4 to rc3, we witnessed this situation:

  1. stacks-node processes a block
  2. stacks-node is shut down before successfully sending the new block event to event observers (API in this case)
  3. stacks-node is restarted
  4. Because the last block was successfully processed, the node does not know that it never successfully sent the block to the event observers, so it proceeds with the next block
  5. API observer errors when it receives the next block, since it never received its parent block
  6. stacks-node is unable to proceed since it does not receive a successful response for the new block event

Proposed solution

  • Create a new database to store outstanding events
  • Before attempting to send an event to observers, record the event in this new database
  • For each event in the database:
    • Send the event to all observers
    • Delete the event from the database
  • Proceed after all events have been successfully sent
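
The steps above could be sketched roughly as follows. This is a minimal illustration in Python with SQLite, not the stacks-node implementation: the table name `pending_events`, the function names, and the convention that an observer is a callable that raises on failure are all assumptions made for the example.

```python
import json
import sqlite3
from typing import Callable, Iterable


def open_pending_db(path: str = ":memory:") -> sqlite3.Connection:
    """Open (or create) the hypothetical database of outstanding events."""
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS pending_events ("
        "id INTEGER PRIMARY KEY AUTOINCREMENT, payload TEXT NOT NULL)"
    )
    conn.commit()
    return conn


def record_event(conn: sqlite3.Connection, payload: dict) -> int:
    """Persist the event before any attempt to send it."""
    cur = conn.execute(
        "INSERT INTO pending_events (payload) VALUES (?)", (json.dumps(payload),)
    )
    conn.commit()
    return cur.lastrowid


def flush_pending(
    conn: sqlite3.Connection, observers: Iterable[Callable[[dict], None]]
) -> int:
    """Send every stored event, oldest first, deleting each one only after
    all observers accepted it. An observer signals failure by raising, which
    leaves the row in place for the next attempt."""
    observers = list(observers)
    rows = conn.execute(
        "SELECT id, payload FROM pending_events ORDER BY id"
    ).fetchall()
    sent = 0
    for event_id, raw in rows:
        payload = json.loads(raw)
        for send in observers:
            send(payload)
        conn.execute("DELETE FROM pending_events WHERE id = ?", (event_id,))
        conn.commit()
        sent += 1
    return sent


def dispatch(
    conn: sqlite3.Connection,
    payload: dict,
    observers: Iterable[Callable[[dict], None]],
) -> None:
    """Record the new event, then drain the whole backlog in order."""
    record_event(conn, payload)
    flush_pending(conn, observers)
```

Because the backlog is drained in insertion order, an event that was recorded but never delivered before a shutdown is sent ahead of the next block's event, so the observer sees the parent block first. In this sketch an event may be delivered more than once if the node stops between a send and the delete, so observers would need to tolerate duplicates.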

obycode avatar Oct 07 '24 18:10 obycode

The most obvious place to implement this change is directly in EventObserver::send_payload. If a node has multiple observers, this duplicates information in the database, but it reduces the amount of refactoring required and gives finer-grained information about which observers need events rebroadcast: only observers that did not confirm the event last time receive it again, instead of every observer. In the majority of cases, a node probably has 0 or 1 observers, so there is likely no real difference in practice.
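
To illustrate the per-observer variant, here is a rough Python sketch in which each observer records its own copy of the payload before sending and deletes it after a successful send. Everything named here is hypothetical (the `observer_outbox` table, the `Observer` class, the `transport` callable that raises on failure); it only mirrors the idea of doing the bookkeeping inside a `send_payload`-style method and is not the code from stacks-core.

```python
import json
import sqlite3
from typing import Callable


def open_observer_db(path: str = ":memory:") -> sqlite3.Connection:
    """Open (or create) a hypothetical per-observer outbox table."""
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS observer_outbox ("
        "id INTEGER PRIMARY KEY AUTOINCREMENT, "
        "endpoint TEXT NOT NULL, payload TEXT NOT NULL)"
    )
    conn.commit()
    return conn


def pending_count(conn: sqlite3.Connection) -> int:
    """Number of payloads not yet confirmed by their observer."""
    return conn.execute("SELECT COUNT(*) FROM observer_outbox").fetchone()[0]


class Observer:
    def __init__(self, endpoint: str, transport: Callable[[str, dict], None]):
        self.endpoint = endpoint
        self.transport = transport  # raises if the observer does not confirm

    def send_payload(self, conn: sqlite3.Connection, payload: dict) -> None:
        # One row per (observer, payload): duplicated data with several
        # observers, but it records exactly who still needs the event.
        cur = conn.execute(
            "INSERT INTO observer_outbox (endpoint, payload) VALUES (?, ?)",
            (self.endpoint, json.dumps(payload)),
        )
        conn.commit()
        self.transport(self.endpoint, payload)
        conn.execute("DELETE FROM observer_outbox WHERE id = ?", (cur.lastrowid,))
        conn.commit()

    def resend_pending(self, conn: sqlite3.Connection) -> int:
        """On startup, rebroadcast only this observer's unconfirmed payloads."""
        rows = conn.execute(
            "SELECT id, payload FROM observer_outbox WHERE endpoint = ? ORDER BY id",
            (self.endpoint,),
        ).fetchall()
        for row_id, raw in rows:
            self.transport(self.endpoint, json.loads(raw))
            conn.execute("DELETE FROM observer_outbox WHERE id = ?", (row_id,))
            conn.commit()
        return len(rows)
```

With two observers where only one missed an event, calling `resend_pending` on each after a restart resends the event to the observer that missed it and does nothing for the other.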

obycode avatar Oct 07 '24 20:10 obycode

This is addressed in #5289.

obycode avatar Oct 09 '24 15:10 obycode

Merged.

obycode avatar Oct 21 '24 13:10 obycode