sui: introduce state-sync process

Open bmwill opened this issue 2 years ago • 3 comments

Module comment providing a high-level overview of state-sync:

Peer-to-peer data synchronization of checkpoints.

This StateSync module is responsible for the synchronization and dissemination of checkpoints and of the transactions (and their effects) contained within them. This module is not responsible for executing the transactions included in a checkpoint; that process is left to another component in the system.

High-level Overview of StateSync

StateSync discovers new checkpoints via a few different sources:

  1. If this node is a Validator, checkpoints will be produced via consensus at which point consensus can notify state-sync of the new checkpoint via [Handle::send_checkpoint].
  2. A peer notifies us of the latest checkpoint that it has synchronized. StateSync will also periodically query its peers to discover what their latest checkpoint is.
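A minimal sketch of how the two discovery paths could feed a single event queue, assuming a std `mpsc` channel as that queue. Only the role of `Handle::send_checkpoint` comes from the text above; its real signature likely takes a full checkpoint rather than a bare sequence number. `Discovery`, `new_state_sync`, `peer_reported`, and `highest_discovered` are hypothetical names used for illustration.

```rust
use std::sync::mpsc::{channel, Receiver, Sender};

/// Hypothetical sequence-number type; the real code uses Sui's checkpoint types.
pub type CheckpointSeq = u64;

/// Where a newly discovered checkpoint came from (hypothetical enum).
#[derive(Debug, Clone, PartialEq)]
pub enum Discovery {
    /// Produced locally via consensus (validators only).
    FromConsensus(CheckpointSeq),
    /// A peer pushed a notification or answered our periodic query.
    FromPeer { peer: String, seq: CheckpointSeq },
}

/// Cloneable handle that other components use to feed state-sync.
#[derive(Clone)]
pub struct Handle {
    sender: Sender<Discovery>,
}

impl Handle {
    /// Plays the role of `Handle::send_checkpoint` described above (simplified).
    pub fn send_checkpoint(&self, seq: CheckpointSeq) {
        // A send error only means the event loop has shut down, so it is ignored.
        let _ = self.sender.send(Discovery::FromConsensus(seq));
    }

    /// Hypothetical hook for the networking layer when a peer reports its height.
    pub fn peer_reported(&self, peer: &str, seq: CheckpointSeq) {
        let _ = self.sender.send(Discovery::FromPeer {
            peer: peer.to_string(),
            seq,
        });
    }
}

/// Create the handle plus the receiving end that the state-sync loop would own.
pub fn new_state_sync() -> (Handle, Receiver<Discovery>) {
    let (sender, receiver) = channel();
    (Handle { sender }, receiver)
}

/// Drain all pending events without blocking and return the highest checkpoint seen.
pub fn highest_discovered(events: &Receiver<Discovery>) -> Option<CheckpointSeq> {
    events
        .try_iter()
        .map(|event| match event {
            Discovery::FromConsensus(seq) => seq,
            Discovery::FromPeer { seq, .. } => seq,
        })
        .max()
}
```

Funnelling both sources through one queue means the rest of the state-sync loop does not need to care whether a checkpoint came from local consensus or from a peer.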

We keep track of two different watermarks:

  • highest_trusted_checkpoint - This is the highest checkpoint header that we've locally verified. This indicates that we have in our persistent store (and have verified) all checkpoint headers up to and including this value.
  • highest_synced_checkpoint - This is the highest checkpoint that we've fully synchronized, meaning we've downloaded and have in our persistent stores all of the transactions, and their effects (but not the objects), for all checkpoints up to and including this point. This is the watermark that is shared with other peers, either via notification or when they query for our latest checkpoint, and is intended to be used as a guarantee of data availability.
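The two watermarks can be sketched as a small struct. This assumes, as the overview implies but does not state outright, that `highest_synced_checkpoint` can never exceed `highest_trusted_checkpoint` (contents are only fetched for checkpoints whose headers are already verified). The `Watermarks` type and its methods are hypothetical, and starting both at 0 (genesis) is also an assumption.

```rust
use std::ops::RangeInclusive;

/// Hypothetical container for the two watermarks described above.
/// Both start at 0, i.e. this sketch assumes genesis is trusted and synced.
#[derive(Debug, Default, Clone, Copy, PartialEq, Eq)]
pub struct Watermarks {
    /// All checkpoint headers up to and including this are stored and verified.
    pub highest_trusted_checkpoint: u64,
    /// All transactions and effects up to and including this are stored.
    pub highest_synced_checkpoint: u64,
}

impl Watermarks {
    /// Ratchet the trusted watermark; it never moves backwards.
    pub fn advance_trusted(&mut self, seq: u64) {
        self.highest_trusted_checkpoint = self.highest_trusted_checkpoint.max(seq);
    }

    /// Ratchet the synced watermark, refusing to pass the trusted one
    /// (an assumed invariant: contents are synced only for verified headers).
    pub fn advance_synced(&mut self, seq: u64) -> Result<(), String> {
        if seq > self.highest_trusted_checkpoint {
            return Err(format!(
                "cannot mark {seq} synced: headers only trusted up to {}",
                self.highest_trusted_checkpoint
            ));
        }
        self.highest_synced_checkpoint = self.highest_synced_checkpoint.max(seq);
        Ok(())
    }

    /// Checkpoints whose contents still need downloading. Starts one past the
    /// synced watermark because that checkpoint itself is already complete.
    pub fn pending_contents(&self) -> RangeInclusive<u64> {
        self.highest_synced_checkpoint + 1..=self.highest_trusted_checkpoint
    }
}
```

When the node is fully caught up, `pending_contents` returns an empty range, so a sync task driven by it naturally does nothing.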

The [PeerHeights] struct is used to track the highest_synced_checkpoint watermark for all of our peers.
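A minimal sketch of what a `PeerHeights`-style tracker could look like. The name comes from the text above, but the fields, the `String` peer id, and the methods `update`, `peers_with`, and `highest` are assumptions for illustration, not the actual struct.

```rust
use std::collections::HashMap;

/// Hypothetical peer identifier; the real code would use the network layer's id type.
pub type PeerId = String;

/// Tracks the highest_synced_checkpoint each peer has advertised to us.
#[derive(Debug, Default)]
pub struct PeerHeights {
    heights: HashMap<PeerId, u64>,
}

impl PeerHeights {
    /// Record an advertised height. Heights only ratchet upward, so a stale or
    /// reordered notification cannot lower what we believe a peer has.
    pub fn update(&mut self, peer: &str, height: u64) {
        let entry = self.heights.entry(peer.to_string()).or_insert(height);
        *entry = (*entry).max(height);
    }

    /// Peers that advertise having the full contents of checkpoint `seq`.
    /// Sorted for a deterministic example; a real client might shuffle instead.
    pub fn peers_with(&self, seq: u64) -> Vec<PeerId> {
        let mut peers: Vec<PeerId> = self
            .heights
            .iter()
            .filter(|(_, height)| **height >= seq)
            .map(|(peer, _)| peer.clone())
            .collect();
        peers.sort();
        peers
    }

    /// The highest checkpoint any peer claims to have synced.
    pub fn highest(&self) -> Option<u64> {
        self.heights.values().copied().max()
    }
}
```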

When a new checkpoint is discovered and we've determined that it is higher than our highest_trusted_checkpoint, StateSync will kick off a task to synchronize and verify all checkpoint headers between our highest_trusted_checkpoint and the newly discovered checkpoint. This is done by querying one of our peers for the checkpoints we're missing (using the [PeerHeights] struct to intelligently select which peers have the data available for us to query), after which we locally verify the signatures on each checkpoint header with the appropriate committee (based on the epoch). As checkpoints are verified, the highest_trusted_checkpoint watermark will be ratcheted up.
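The header-sync step above can be sketched as a loop that fetches the next missing header, checks it against the committee for its epoch, and only then ratchets the trusted watermark. Everything here is a simplified stand-in: `CheckpointHeader`, `Committee`, and the "more than two thirds of members signed" check are hypothetical (real verification checks aggregated signatures weighted by stake), and `fetch` stands in for a request to a peer chosen via `PeerHeights`.

```rust
use std::collections::{HashMap, HashSet};

/// Hypothetical, heavily simplified checkpoint header.
#[derive(Debug, Clone)]
pub struct CheckpointHeader {
    pub sequence_number: u64,
    pub epoch: u64,
    /// Stand-in for an aggregated committee signature.
    pub signers: Vec<String>,
}

/// Hypothetical committee, keyed by epoch in `sync_headers` below.
pub struct Committee {
    pub members: Vec<String>,
}

impl Committee {
    /// Toy quorum check: more than 2/3 of distinct members signed.
    fn verify(&self, header: &CheckpointHeader) -> bool {
        let signed: HashSet<&String> = header
            .signers
            .iter()
            .filter(|s| self.members.contains(*s))
            .collect();
        3 * signed.len() > 2 * self.members.len()
    }
}

/// Fetch and verify headers in (trusted, target], advancing `trusted` after each one.
/// Stops at the first failure, leaving `trusted` at the last verified header.
pub fn sync_headers<F>(
    trusted: &mut u64,
    target: u64,
    committees: &HashMap<u64, Committee>,
    mut fetch: F,
) -> Result<(), String>
where
    F: FnMut(u64) -> Option<CheckpointHeader>,
{
    while *trusted < target {
        let next = *trusted + 1;
        let header = fetch(next).ok_or_else(|| format!("no peer served checkpoint {next}"))?;
        if header.sequence_number != next {
            return Err(format!("asked for {next}, got {}", header.sequence_number));
        }
        let committee = committees
            .get(&header.epoch)
            .ok_or_else(|| format!("no committee known for epoch {}", header.epoch))?;
        if !committee.verify(&header) {
            return Err(format!("invalid signatures on checkpoint {next}"));
        }
        // A real implementation would persist the verified header here.
        *trusted = next;
    }
    Ok(())
}
```

Verifying strictly in sequence order keeps the invariant from the overview: every header at or below the trusted watermark is stored and verified.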

Once we've ratcheted up our highest_trusted_checkpoint, and if it is higher than highest_synced_checkpoint, StateSync will then kick off a task to synchronize the contents of all of the checkpoints in highest_synced_checkpoint..=highest_trusted_checkpoint. After the contents of each checkpoint are fully downloaded, StateSync will update our highest_synced_checkpoint watermark and send a notification on a broadcast channel indicating that a new checkpoint has been fully downloaded. Notifications on this broadcast channel are always made in order. StateSync will also notify its peers of the newly synchronized checkpoint so that they can use it to help their own synchronization.
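One way to keep notifications in order even if contents finish downloading out of order (for example, with several downloads in flight) is to buffer completed checkpoints above a gap and advance the watermark only across contiguous runs. This is a sketch under that assumption: `ContentSync` and `on_downloaded` are hypothetical, and a std `mpsc` channel stands in for the broadcast channel.

```rust
use std::collections::BTreeSet;
use std::sync::mpsc::Sender;

/// Hypothetical tracker for checkpoint contents that have finished downloading.
pub struct ContentSync {
    highest_synced: u64,
    /// Checkpoints whose contents are complete but sit above a gap.
    ready: BTreeSet<u64>,
    /// Stand-in for the broadcast channel; receives sequence numbers in order.
    notify: Sender<u64>,
}

impl ContentSync {
    pub fn new(highest_synced: u64, notify: Sender<u64>) -> Self {
        Self { highest_synced, ready: BTreeSet::new(), notify }
    }

    /// Called when the contents of checkpoint `seq` are fully downloaded.
    /// Advances the watermark over any contiguous run and notifies in order.
    pub fn on_downloaded(&mut self, seq: u64) {
        if seq <= self.highest_synced {
            return; // already covered by the watermark
        }
        self.ready.insert(seq);
        // `remove` returns true while the next expected checkpoint is ready.
        while self.ready.remove(&(self.highest_synced + 1)) {
            self.highest_synced += 1;
            // Ignore send errors: they only mean nobody is listening anymore.
            let _ = self.notify.send(self.highest_synced);
        }
    }

    pub fn highest_synced(&self) -> u64 {
        self.highest_synced
    }
}
```

If checkpoint 12 finishes before 11, nothing is announced until 11 arrives, at which point 11 and 12 are announced back to back; subscribers never see a gap.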

bmwill, Oct 09 '22 23:10

Can you write up some comments describing how this is intended to integrate with the fullnode and/or validator? Which specific traits/methods define that interface? Perhaps you can put the public parts of the API (especially traits and types) into a separate file so that it is a bit clearer. I started writing a bunch of comments but had to delete them all because it eventually became clear that I didn't understand how this will integrate.

Yeah, this PR isn't completely ready for a thorough review (since it's not ready to be landed), so I still need to make some things a bit cleaner.

a. One of the biggest questions here is how/if the datastores are shared. It seems that StateSync may want to store objects before they are fully processed by the local node. But once they have been processed, will it be possible for StateSync to prune its store and request things from the main CheckpointStore? We don't need to support this optimization from day 1, but it would be worth laying out your vision here.

My expectation would be to directly use the same datastores as the rest of the node. I think the datastores probably need to be cleaned up a little before that's possible, though; after that, ideally anything that isn't put directly there can just live in RAM (and be pruned or eliminated as needed). Essentially, once we've validated a checkpoint header we should be able to put it in the main store. We'll probably just want a couple of watermarks, e.g. "highest validated checkpoint", "highest processed checkpoint", etc.

I see that you have a lot of logic here for tracking the checkpoint height of peers. I want to raise the question of whether this is the right place for the logic.

Yes, tracking this information is key to knowing who to request data from. Without it, you have no way of knowing which peers actually have the data you need.

a. First, it will end up duplicating some of the checkpointing logic.

Which logic are you referring to?

b. The bigger concern is that it seems to limit the efficiency of the network. If I'm reading your code right, a peer won't help propagate a checkpoint that it isn't synced to. This means there will be a period of time after a node joins the network where it is consuming resources but not giving anything back. This will be especially bad if a large number of nodes join the network at the same time, as it will reduce the fraction of nodes on the network that are able to service requests.

But, conceptually, there is no reason why a node can't help disseminate checkpoint 100 even if it is only synced to checkpoint 50. Similarly there is no reason why a CheckpointContents must be synced before the corresponding CheckpointSummary can be pushed to peers. If you remove the assumptions about checkpointing from this code, this should become much more natural.

I acknowledge there is a difficulty here since you are currently relying on checkpoints to find out the current committee, without which you can't verify current checkpoints. However, that can be solved by adding a method that allows you to get all committees going back to genesis / a waypoint from a peer. At that point you'll be fully bootstrapped and can verify and disseminate all objects from previous epochs even if you receive them out of order.

If a node has only synced to checkpoint 50, then it cannot help propagate checkpoint 100 or any of the checkpoints between 51 and 100, because it does not have the data for those checkpoints and will be unable to serve any requests for that data. What it can do is help propagate everything up to checkpoint 50 until it has itself synced further. The current design signals data availability: it lets you know exactly how "up-to-date" your peers are and who you can ask for data. This makes things more efficient because you don't have to randomly ask your peers "Do you have data for checkpoint X?" and hope they have it.
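From the serving side, the data-availability design described here could be sketched as a request handler that only answers for checkpoints at or below the node's own `highest_synced_checkpoint`, which is exactly the height it advertises. The `Node` and `GetContentsResponse` types are hypothetical, and returning the server's watermark in the "not yet synced" reply is an assumption of this sketch rather than something the PR describes.

```rust
use std::collections::BTreeMap;

/// Hypothetical reply to a "get checkpoint contents" request.
#[derive(Debug, PartialEq)]
pub enum GetContentsResponse {
    Found(String),
    /// We have not synced that far; include our watermark so the requester
    /// can correct its view of us (an assumption of this sketch).
    NotYetSynced { highest_synced: u64 },
}

/// Hypothetical node state: its watermark plus locally stored contents.
pub struct Node {
    pub highest_synced: u64,
    /// Stand-in for the persistent store of checkpoint contents.
    pub contents: BTreeMap<u64, String>,
}

impl Node {
    /// Serve only what we advertise: anything above the watermark is refused,
    /// even if some of it happens to be stored already.
    pub fn handle_get_contents(&self, seq: u64) -> GetContentsResponse {
        match self.contents.get(&seq) {
            Some(c) if seq <= self.highest_synced => GetContentsResponse::Found(c.clone()),
            _ => GetContentsResponse::NotYetSynced { highest_synced: self.highest_synced },
        }
    }
}
```

A node synced to 50 can still serve checkpoints 1 through 50 to peers that are further behind, while requests for 51 and above are refused.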

Lastly, in preparation for the possibility of very short checkpoint intervals, I think any "get" endpoint should be a multi-get. It will be much more efficient at every level to say "give me all CheckpointSummaries in [N, M]" than to issue thousands of individual requests. Of course, the number can be limited.

I think this is an optimization that we can make at a later point. For right now I think it's a bit easier to reason about as is. Also, there's not much overhead in just sending N single requests to the same node, which effectively gives you the multi-get API.
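For comparison, the multi-get proposed in the review could be sketched as a range request that the server clamps both to a per-request cap and to what it actually has, with the client looping until the range is covered. `get_summaries`, `fetch_all`, and `MAX_SUMMARIES_PER_REQUEST` (100 here) are hypothetical; plain sequence numbers stand in for `CheckpointSummary` values.

```rust
use std::ops::RangeInclusive;

/// Hypothetical server-side cap on summaries returned by one request.
pub const MAX_SUMMARIES_PER_REQUEST: u64 = 100;

/// Hypothetical multi-get handler: returns the requested range clamped to the
/// cap and to `highest_available`. An empty result means nothing is available.
pub fn get_summaries(range: RangeInclusive<u64>, highest_available: u64) -> Vec<u64> {
    let start = *range.start();
    let end = (*range.end())
        .min(highest_available)
        .min(start.saturating_add(MAX_SUMMARIES_PER_REQUEST - 1));
    (start..=end).collect()
}

/// Client side: cover [from, to] with as few round trips as the cap allows.
/// Returns the summaries received and the number of round trips made.
pub fn fetch_all(from: u64, to: u64, highest_available: u64) -> (Vec<u64>, usize) {
    let mut received = Vec::new();
    let mut next = from;
    let mut round_trips = 0;
    while next <= to {
        let batch = get_summaries(next..=to, highest_available);
        round_trips += 1;
        match batch.last() {
            Some(&last) => {
                received.extend_from_slice(&batch);
                next = last + 1;
            }
            None => break, // the server has nothing more in this range
        }
    }
    (received, round_trips)
}
```

With a cap of 100, fetching checkpoints 1 through 250 takes 3 round trips instead of 250 single requests. Whether that saving matters in practice depends on per-request overhead, which is the trade-off being debated here.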

bmwill, Oct 31 '22 16:10

🚀 🚀 🚀

lxfind, Nov 15 '22 20:11