sui
sui copied to clipboard
Refactor Header proposal/voting in narwhal
- Core Header proposal moved to a linear async function outside the Core loop (see
Core::propose_header) - Responding to vote requests moved to primary RPC handler outside the Core loop (see
PrimaryReceiverHandler::request_vote)
Notes:
- Header-handling tests in
core_tests.rslargely replaced by equivalent tests inprimary_tests.rs - Deletes changes from PR #5793, which are now driven by retries of vote requests from remote primaries
The latest updates on your projects. Learn more about Vercel for Git ↗︎
| Name | Status | Preview | Updated |
|---|---|---|---|
| explorer | ✅ Ready (Inspect) | Visit Preview | Nov 10, 2022 at 1:15AM (UTC) |
FYI: Left a few questions for PR reviewers in comments tagged with "DO NOT MERGE".
Please don't remove the recovery path request of new headers. If we have periodic retries for all the invariants of liveness, then in theory it is safe to remove this. But with only periodic retries from one component, it may not be enough to guarantee safe removal. Also, it doesn't hurt us to have that recovery flow, and leaves us restarting as an emergency tool for now. I am leaving this as a first comment, more to come.
It must be removed, because we no longer even have the Header network message after this change or a process_header function at all in the Core. Vote requests (which are the new way of sending proposed headers) will be retried by remote primaries that still need votes from us, so we haven't really lost anything by removing this. You can still restart and expect that anything we haven't already voted on (and therefore persisted all the dependencies of locally) will be resent if it's something our local node can still help with.
If you still want some kind of proactive startup recovery thing, it would need to request Certificates and not headers. But I do not think we should do that. It's unclear to me what the benefit would be in exchange for that extra complexity.
If you still want some kind of proactive startup recovery thing, it would need to request Certificates and not headers. But I do not think we should do that. It's unclear to me what the benefit would be in exchange for that extra complexity.
You mentioned in a different location about the fact that we stop trying the header creation loop if we have > f irrecoverable errors. If this is indeed the case, then a proactive recovery mechanism is needed.
Thanks for the changes @aschran . I concur with others that to ensure liveness we need to either (1) keep sending messages that are needed for round advancement liveness (Header, Vote, Cert) until we advance and act on them or (2) have part of the validator (proposer loop sending Header, gathering votes, and sending certs) try and re-try, and re-try on crash recover to ensure progress. The details are down to (your) personal taste of course, but somehow this needs to happen I think.