avalanchego

Dynamic State Sync: Delay Readiness Fix

Open: abi87 opened this issue 1 year ago • 1 comment

Why this should be merged

Fixing https://github.com/ava-labs/avalanchego/issues/1296 requires that a VM's state depend on the state of the whole Subnet. Specifically, we want to declare a Subnet synced iff (if and only if):

  1. All of its chains have bootstrapped.
  2. All of its chains that have started state sync have completed it.
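The two conditions above can be read as a single predicate over the chains of a Subnet. The following is a minimal sketch under stated assumptions: `ChainStatus` and `subnetSynced` are hypothetical names introduced only for illustration and are not avalanchego types or functions.

```go
package main

// ChainStatus is a hypothetical per-chain summary used only for this
// illustration; it is not an avalanchego type.
type ChainStatus struct {
	Bootstrapped       bool // the chain has finished bootstrapping
	StateSyncStarted   bool // the chain started state sync (static or dynamic)
	StateSyncCompleted bool // that state sync has finished
}

// subnetSynced returns true iff every chain has bootstrapped and every chain
// that started state sync has completed it. A chain that never started state
// sync only needs to have bootstrapped.
func subnetSynced(chains []ChainStatus) bool {
	for _, c := range chains {
		if !c.Bootstrapped {
			return false
		}
		if c.StateSyncStarted && !c.StateSyncCompleted {
			return false
		}
	}
	return true
}
```

Note that a chain with a pending dynamic state sync blocks the whole Subnet even if it has already bootstrapped, which is exactly the case the second condition exists for.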

The VM state flow will therefore change as follows:

  1. A VM will start either state syncing or bootstrapping. State sync may be static, i.e. it completes before the VM moves into bootstrapping, or dynamic, i.e. it continues asynchronously while the node transitions into the next states.
  2. A VM will move to bootstrapping and then to the extending frontier state (previously called normal operations), even if dynamic state sync has not completed. Note that while extending the frontier, the VM won't necessarily be able to fully verify incoming transactions, especially cross-chain ones that require other chains in the Subnet to be synced. Block verification is effectively offloaded to the network: the VM will rely on consensus and Accept blocks as if it had fully validated them. In this state, API calls won't be served, and outgoing chits messages will report the latest accepted container rather than the current preferred one (which may be only partially validated).
  3. Once all chains in a Subnet have completed both bootstrapping and state sync, the whole Subnet will be marked as synced and all its VMs will move to the SubnetSynced state. Only in SubnetSynced will API requests finally be served, and only then will the node be able to fully verify incoming transactions (especially cross-chain ones) and create blocks.
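The state flow and the per-state behaviour described above can be sketched as a small state machine. This is a hedged illustration, not avalanchego code: the `VMState` type, the `vm` struct, and all method names are hypothetical, and the assumption that chits report the preferred block again once the Subnet is synced is inferred from the description rather than stated in it.

```go
package main

// VMState mirrors the states named in this PR description; the type and
// constant names are illustrative, not avalanchego's.
type VMState int

const (
	StateSyncing VMState = iota
	Bootstrapping
	ExtendingFrontier // previously "normal operations"
	SubnetSynced
)

// vm is a hypothetical, minimal stand-in for a chain's VM.
type vm struct {
	state        VMState
	lastAccepted string // ID of the last accepted block
	preferred    string // ID of the preferred block (may be only partially verified)
}

// startBootstrap moves the VM into Bootstrapping. With static state sync this
// happens after sync completes; with dynamic state sync, sync keeps running
// in the background while the VM moves on.
func (v *vm) startBootstrap() {
	if v.state == StateSyncing {
		v.state = Bootstrapping
	}
}

// onBootstrapDone enters ExtendingFrontier even if dynamic state sync has not
// completed yet.
func (v *vm) onBootstrapDone() {
	if v.state == Bootstrapping {
		v.state = ExtendingFrontier
	}
}

// onSubnetSynced is called once the whole Subnet has been declared synced.
func (v *vm) onSubnetSynced() {
	if v.state == ExtendingFrontier {
		v.state = SubnetSynced
	}
}

// canServeAPIs and canBuildBlocks: both are allowed only in SubnetSynced.
func (v *vm) canServeAPIs() bool   { return v.state == SubnetSynced }
func (v *vm) canBuildBlocks() bool { return v.state == SubnetSynced }

// chitsContainer returns the container reported in outgoing chits: the last
// accepted one until the Subnet is synced, then (by assumption) the preferred one.
func (v *vm) chitsContainer() string {
	if v.state == SubnetSynced {
		return v.preferred
	}
	return v.lastAccepted
}
```

Gating API serving, block building, and chits on one state value keeps the "not yet fully verified" window explicit instead of scattering separate flags across components.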

How this works

We currently track each VM's state in its Context and the whole Subnet's state in common.BootstrapTracker. Note that:

  1. VM state is set to StateSyncing, Bootstrapping or NormalOps when stateSyncer.Start, bootstrapper.Start or engine.Start, respectively, is called.
  2. A chain is declared bootstrapped as soon as bootstrapped.finish completes, even if engine.Start has not been called yet.

This PR changes these objects so that the Subnet's state and its VMs' states are tracked in a single place. It removes the Context.State attribute and moves Subnet and VM state tracking into common.BootstrapTracker, which I renamed to snow.SubnetStateTracker. snow.SubnetStateTracker also implements the new logic for declaring a Subnet fully synced.
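A tracker that keeps per-chain progress and the Subnet-level verdict together might look like the sketch below. This is only an illustration of the idea in the spirit of snow.SubnetStateTracker: the `subnetStateTracker` type and its methods (`StartStateSync`, `CompleteStateSync`, `MarkBootstrapped`, `OnSynced`, `IsSynced`) are hypothetical names, not the PR's actual API.

```go
package main

import "sync"

// chainProgress is hypothetical bookkeeping for one chain.
type chainProgress struct {
	bootstrapped       bool
	stateSyncStarted   bool
	stateSyncCompleted bool
}

// subnetStateTracker is a hypothetical single place that tracks both each
// chain's progress and the Subnet-level "synced" verdict.
type subnetStateTracker struct {
	mu       sync.Mutex
	chains   map[string]*chainProgress
	synced   bool
	onSynced []func() // fired once, on the first transition to synced
}

func newSubnetStateTracker(chainIDs []string) *subnetStateTracker {
	t := &subnetStateTracker{chains: make(map[string]*chainProgress, len(chainIDs))}
	for _, id := range chainIDs {
		t.chains[id] = &chainProgress{}
	}
	return t
}

// OnSynced registers a callback; callbacks registered after the Subnet is
// already synced are not invoked in this sketch.
func (t *subnetStateTracker) OnSynced(f func()) {
	t.mu.Lock()
	defer t.mu.Unlock()
	t.onSynced = append(t.onSynced, f)
}

func (t *subnetStateTracker) StartStateSync(id string) {
	t.update(id, func(c *chainProgress) { c.stateSyncStarted = true })
}

func (t *subnetStateTracker) CompleteStateSync(id string) {
	t.update(id, func(c *chainProgress) { c.stateSyncCompleted = true })
}

func (t *subnetStateTracker) MarkBootstrapped(id string) {
	t.update(id, func(c *chainProgress) { c.bootstrapped = true })
}

func (t *subnetStateTracker) IsSynced() bool {
	t.mu.Lock()
	defer t.mu.Unlock()
	return t.synced
}

// update applies f to one chain, re-evaluates the Subnet condition and, on the
// first transition to synced, runs the callbacks outside the lock.
func (t *subnetStateTracker) update(id string, f func(*chainProgress)) {
	t.mu.Lock()
	c, ok := t.chains[id]
	if !ok {
		t.mu.Unlock()
		return // unknown chain: ignored in this sketch
	}
	f(c)
	var toFire []func()
	if !t.synced && t.allSyncedLocked() {
		t.synced = true
		toFire = t.onSynced
	}
	t.mu.Unlock()
	for _, cb := range toFire {
		cb()
	}
}

// allSyncedLocked checks both conditions for every chain; t.mu must be held.
func (t *subnetStateTracker) allSyncedLocked() bool {
	for _, c := range t.chains {
		if !c.bootstrapped || (c.stateSyncStarted && !c.stateSyncCompleted) {
			return false
		}
	}
	return true
}
```

In this design the VMs would register a callback (for example one that moves them from ExtendingFrontier to SubnetSynced) instead of each polling the other chains.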

Note also that we won't restart a VM's bootstrap if other chains in the same Subnet have not completed bootstrapping themselves. Instead, the VM will go straight into ExtendingFrontier.
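The behaviour change in the note above boils down to one decision taken when a chain finishes its own bootstrap. The sketch below contrasts the previous behaviour (restarting bootstrap while other chains lag behind, as the note implies) with the new one; `nextStep`, `afterBootstrap` and the `newBehaviour` switch are hypothetical names used only for this comparison.

```go
package main

// nextStep is a hypothetical label for what a chain does after its own
// bootstrap finishes.
type nextStep int

const (
	restartBootstrap nextStep = iota // previous behaviour while the Subnet lags
	extendFrontier                   // ExtendingFrontier, formerly normal operations
)

// afterBootstrap decides the next step for a chain whose bootstrap just
// finished. subnetBootstrapped reports whether every other chain in the Subnet
// has bootstrapped too; newBehaviour selects the logic proposed in this PR.
func afterBootstrap(subnetBootstrapped, newBehaviour bool) nextStep {
	if newBehaviour || subnetBootstrapped {
		// With this PR the chain never waits on its siblings here; readiness
		// is instead gated later by the Subnet-wide SubnetSynced state.
		return extendFrontier
	}
	return restartBootstrap
}
```

Dropping the restart is safe in this model because the guarantees that the restart used to protect (serving APIs, building blocks) are deferred until SubnetSynced anyway.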

How this was tested

CI, plus extra unit tests documenting the new Subnet tracker behaviour. I am also working on bringing up a Fuji validator.

abi87, Apr 05 '23 16:04

This PR has become stale because it has been open for 30 days with no activity. Adding the lifecycle/frozen label will cause this PR to ignore lifecycle events.

github-actions[bot], Mar 31 '24 00:03