HotShot icon indicating copy to clipboard operation
HotShot copied to clipboard

General task architecture refactor

Open ss-es opened this issue 9 months ago • 0 comments

Closes #3208

The general goal of this PR is to make it possible to halt tasks, retrieve their state and restart them.

I think this would be nice for a number of reasons, but in particular it would allow us to very easily upgrade NodeType at runtime.

This PR:

  • Simplifies the TaskState trait: implementations must now only specify an associated Event type, the handle_event function and a cancel_subtasks function to cleanup any spawned subtasks.
  • Tasks that receive their events from the network have had their TaskState implementations removed, and will be subsumed under a new NetworkTaskState in a future PR.
  • Each task returns its state as a boxed TaskState object via a JoinHandle. These JoinHandles are collected in a ConsensusTaskRegistry.
  • Network tasks return a JoinHandle<()> are collected in a NetworkTaskRegistry, and will be dealt with in a future PR.
  • Test tasks no longer implement TaskState, because they fundamentally do not listen on a single receiver in the same way that consensus tasks do.
  • TestTaskState requires a check() function that returns the pass/fail result of the test.

Minor:

  • The RwLock and Arc indirection in task registries is removed, and these now simply contain a Vec of JoinHandles. As a consequence, SystemContextHandle is not clonable; this is a mildly dangerous operation anyway, since a HotShot handle is a unique object and cloning it does not create a new one, and makes ownership harder to reason about, especially around mutable operations like shut_down. An application that wishes to hold multiple handles should put the handle it receives into a Arc<RwLock<...>> and synchronize calls to the handle itself.
  • shut_down is no longer a BoxSynFuture, because shutting down the consensus registry is not Sync (it must await on the JoinHandles it holds)
  • Various tasks no longer receive a complete cloned SystemContextHandle; they only needed 1-2 fields, so we simply pass these directly to avoid additional the additional ConsensusApi constraint.

This PR does not:

There should not be any external, observable changes in runtime behavior.

Key places to review:

This PR is quite large because of the scope of the changes, but the vast majority of it is rewiring. I think the most significant intentional changes are in:

  • task/src/task.rs: the main trait changes. I believe it is easier to simply read the final file, rather than the diff.
  • testing/src/test_task.rs: the test task changes. Note that TestTask and the TestTaskState trait were previously located in task/src/task.rs, but these were essentially rewritten from scratch. But it's definitely worth looking at other files if you can. If there's anything you suspect might be an unintentional or incorrect logic change, please flag it.

ss-es avatar May 08 '24 18:05 ss-es