Checkpoint process can stall if all fragments sent to consensus are lost
Currently, when we generate a new fragment, we send it to consensus and have no persistent retry mechanism. After we generate a fragment for each authority, we simply assume that those fragments are persisted in consensus and never try to generate new fragments again. We have an in-memory retry in CheckpointConsensusAdapter, but it is not persistent.
Sending to consensus is itself neither persistent nor reliable: fragments in flight to consensus are buffered in memory in multiple places (namely the consensus sending channel on the Sui side and batch buffering in Narwhal) and are lost when the node restarts.
To be more specific, the problematic flow is this: we generate a fragment for each authority, persist them in the local_fragments table, and send them to consensus. If the node fails before submitting the consensus batch, local_fragments will contain fragments for all validators, and the checkpoint process will be permanently stalled for that validator. No new fragment can be generated (since there are pending fragments persisted in the DB), and the pending fragments in local_fragments will never make it into consensus (since they were sent to the channel but lost when the node restarted).
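One possible direction is to treat local_fragments as the source of truth and re-submit anything that has not yet been observed in consensus output, both on startup and periodically. The following is a minimal sketch of that idea, not the actual Sui code: `LocalFragments`, `ConsensusChannel`, `FragmentId`, and `resubmit_unsequenced` are hypothetical names, the table is modeled as an in-memory map standing in for the DB, and it assumes the consensus handler tolerates duplicate fragments.

```rust
use std::collections::{BTreeMap, HashSet};

/// Hypothetical fragment key: (checkpoint sequence number, peer authority name).
type FragmentId = (u64, String);

/// Stand-in for the persisted `local_fragments` table.
/// A real node would back this with its database so it survives restarts.
#[derive(Default)]
struct LocalFragments {
    pending: BTreeMap<FragmentId, Vec<u8>>,
}

/// Stand-in for the in-memory path to consensus.
/// Anything queued here is lost if the node restarts.
#[derive(Default)]
struct ConsensusChannel {
    in_flight: Vec<(FragmentId, Vec<u8>)>,
}

impl LocalFragments {
    /// Record a generated fragment before it is handed to consensus.
    fn persist(&mut self, id: FragmentId, payload: Vec<u8>) {
        self.pending.insert(id, payload);
    }

    /// Re-submit every persisted fragment that has not been seen in
    /// consensus output. Returns the number of fragments re-sent.
    /// Assumption: duplicates are harmless because the receiver deduplicates.
    fn resubmit_unsequenced(
        &self,
        sequenced: &HashSet<FragmentId>,
        channel: &mut ConsensusChannel,
    ) -> usize {
        let mut resent = 0;
        for (id, payload) in &self.pending {
            if !sequenced.contains(id) {
                channel.in_flight.push((id.clone(), payload.clone()));
                resent += 1;
            }
        }
        resent
    }
}
```

With something like this, a validator that restarts with a full local_fragments table would call `resubmit_unsequenced` with the set of fragments it has already seen sequenced, instead of assuming the earlier send succeeded. The cost is possible duplicate submissions, which is why the sketch assumes idempotent handling on the consensus side.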
This issue is stale because it has been open 60 days with no activity. Remove stale label or comment or this will be closed in 7 days.
@lxfind @mystenmark @arun-koshy - is this still relevant? If not, let's close it. Thanks. #spring-cleanup