[refactor] restore consensus committed sub dags from last index + 1
Description
Following this work https://github.com/MystenLabs/sui/pull/13819 we can now safely restore the committed sub dags from the last index + 1 since the consumer is atomically processing all the transactions and we don't need to re-send the last index subdag.
Test Plan
If your changes are not user-facing and not a breaking change, you can skip the following section. Otherwise, please indicate what changed, and then add to the Release Notes section as highlighted during the release process.
Type of Change (Check all that apply)
- [ ] protocol change
- [ ] user-visible impact
- [ ] breaking change for a client SDKs
- [ ] breaking change for FNs (FN binary must upgrade)
- [ ] breaking change for validators or node operators (must upgrade binaries)
- [ ] breaking change for on-chain data layout
- [ ] necessitate either a data wipe or data migration
Release notes
The latest updates on your projects. Learn more about Vercel for Git ↗︎
| Name | Status | Preview | Comments | Updated (UTC) |
|---|---|---|---|---|
| sui-typescript-docs | ✅ Ready (Inspect) | Visit Preview | 💬 Add feedback | Oct 11, 2023 0:17am |
4 Ignored Deployments
| Name | Status | Preview | Comments | Updated (UTC) |
|---|---|---|---|---|
| explorer | ⬜️ Ignored (Inspect) | Visit Preview | Oct 11, 2023 0:17am | |
| multisig-toolkit | ⬜️ Ignored (Inspect) | Visit Preview | Oct 11, 2023 0:17am | |
| mysten-ui | ⬜️ Ignored (Inspect) | Visit Preview | Oct 11, 2023 0:17am | |
| sui-kiosk | ⬜️ Ignored (Inspect) | Visit Preview | Oct 11, 2023 0:17am |
I'm unfamiliar with consensus handler and I'm actually not sure about this change.
last_executed_sub_dag_index()reads from thelast_consensus_indextable, which has anExecutionIndicesfield that gets updated for every transaction processed in the consensus handler.What if a consensus output has 10 transactions, and the node crashed after processing 7 transactions? Will the remaining 3 transactions be skipped after this change? We should run some restart tests with high consensus handler load, e.g. in private testnet. I'm open to be convinced if this is not an issue.
Ok, so updates to last_consensus_index are in a batch per consensus commit in process_consensus_transactions_and_commit_boundary(). This change should be fine as long as the Sui per-commit logic is indeed atomic.
I think this should work now that ConsensusHandler is atomic - it either will process all transactions in commit or none.
But lets add a big warning to the ConsensusHandler saying that code there should stay atomic, since we now rely on it in narwhal.
Let's also confirm we don't have a problem when we pass
last_executed_sub_dag_index=0to narwhal in the beginning of the epoch (e.g. we still replay from very beginning in that case)
@andll I can confirm that the recovery will work as expected with last_executed_sub_dag_index = 0 because we always publish the first committed sub dag with index = 1.
Regarding the ConsensusHandler being atomic there is already a comment , but as a best effort attempt I've also added a simtest test_consensus_handler_atomic to ensure this - so at least if someone accidentally breaks the atomicity assumption we do have something that might catch it (under assumptions) apart from the other sim tests. New test here https://github.com/MystenLabs/sui/pull/13901/commits/2ebaa10a33d8d66794fb198a9f41dfaebd221685
CC @mystenmark @mwtian
This PR is stale because it has been open 60 days with no activity. Remove stale label or comment or this will be closed in 7 days.