sui icon indicating copy to clipboard operation
sui copied to clipboard

[refactor] restore consensus committed sub dags from last index + 1

Open akichidis opened this issue 2 years ago • 4 comments

Description

Following this work https://github.com/MystenLabs/sui/pull/13819 we can now safely restore the committed sub dags from the last index + 1 since the consumer is atomically processing all the transactions and we don't need to re-send the last index subdag.

Test Plan


If your changes are not user-facing and not a breaking change, you can skip the following section. Otherwise, please indicate what changed, and then add to the Release Notes section as highlighted during the release process.

Type of Change (Check all that apply)

  • [ ] protocol change
  • [ ] user-visible impact
  • [ ] breaking change for a client SDKs
  • [ ] breaking change for FNs (FN binary must upgrade)
  • [ ] breaking change for validators or node operators (must upgrade binaries)
  • [ ] breaking change for on-chain data layout
  • [ ] necessitate either a data wipe or data migration

Release notes

akichidis avatar Sep 21 '23 14:09 akichidis

The latest updates on your projects. Learn more about Vercel for Git ↗︎

Name Status Preview Comments Updated (UTC)
sui-typescript-docs ✅ Ready (Inspect) Visit Preview 💬 Add feedback Oct 11, 2023 0:17am
4 Ignored Deployments
Name Status Preview Comments Updated (UTC)
explorer ⬜️ Ignored (Inspect) Visit Preview Oct 11, 2023 0:17am
multisig-toolkit ⬜️ Ignored (Inspect) Visit Preview Oct 11, 2023 0:17am
mysten-ui ⬜️ Ignored (Inspect) Visit Preview Oct 11, 2023 0:17am
sui-kiosk ⬜️ Ignored (Inspect) Visit Preview Oct 11, 2023 0:17am

vercel[bot] avatar Sep 21 '23 14:09 vercel[bot]

I'm unfamiliar with consensus handler and I'm actually not sure about this change. last_executed_sub_dag_index() reads from the last_consensus_index table, which has an ExecutionIndices field that gets updated for every transaction processed in the consensus handler.

What if a consensus output has 10 transactions, and the node crashed after processing 7 transactions? Will the remaining 3 transactions be skipped after this change? We should run some restart tests with high consensus handler load, e.g. in private testnet. I'm open to be convinced if this is not an issue.

Ok, so updates to last_consensus_index are in a batch per consensus commit in process_consensus_transactions_and_commit_boundary(). This change should be fine as long as the Sui per-commit logic is indeed atomic.

mwtian avatar Oct 03 '23 18:10 mwtian

I think this should work now that ConsensusHandler is atomic - it either will process all transactions in commit or none.

But lets add a big warning to the ConsensusHandler saying that code there should stay atomic, since we now rely on it in narwhal.

Let's also confirm we don't have a problem when we pass last_executed_sub_dag_index=0 to narwhal in the beginning of the epoch (e.g. we still replay from very beginning in that case)

@andll I can confirm that the recovery will work as expected with last_executed_sub_dag_index = 0 because we always publish the first committed sub dag with index = 1.

Regarding the ConsensusHandler being atomic there is already a comment , but as a best effort attempt I've also added a simtest test_consensus_handler_atomic to ensure this - so at least if someone accidentally breaks the atomicity assumption we do have something that might catch it (under assumptions) apart from the other sim tests. New test here https://github.com/MystenLabs/sui/pull/13901/commits/2ebaa10a33d8d66794fb198a9f41dfaebd221685 CC @mystenmark @mwtian

akichidis avatar Oct 11 '23 13:10 akichidis

This PR is stale because it has been open 60 days with no activity. Remove stale label or comment or this will be closed in 7 days.

github-actions[bot] avatar Dec 11 '23 01:12 github-actions[bot]