reth
Tracking: Staged sync
Stage abstraction
This abstraction should be mostly done, pending changes related to how the database abstractions evolve - e.g. instead of taking a raw MDBX transaction, we will likely receive another type in the future.
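As a rough illustration of why the transaction type matters here, the sketch below makes a stage generic over a database transaction trait, so the raw MDBX transaction could later be swapped for another type without touching stage logic. Everything in it (`DbTx`, `MemTx`, `ExecOutput`, `DummyStage`, and the exact method signatures) is hypothetical and is not the actual reth API.

```rust
use std::collections::BTreeMap;

/// Hypothetical stand-in for a database transaction. The real type is a raw
/// MDBX transaction today and is expected to change.
pub trait DbTx {
    fn put(&mut self, key: u64, value: Vec<u8>);
    fn delete(&mut self, key: u64);
}

/// In-memory transaction, used only to make this sketch runnable.
#[derive(Default)]
pub struct MemTx {
    pub table: BTreeMap<u64, Vec<u8>>,
}

impl DbTx for MemTx {
    fn put(&mut self, key: u64, value: Vec<u8>) {
        self.table.insert(key, value);
    }
    fn delete(&mut self, key: u64) {
        self.table.remove(&key);
    }
}

/// Progress reported by a stage (assumed shape).
#[derive(Debug, PartialEq, Eq)]
pub struct ExecOutput {
    pub stage_progress: u64,
    pub done: bool,
}

/// A stage is generic over the transaction type it writes through.
pub trait Stage<Tx: DbTx> {
    fn id(&self) -> &'static str;
    /// Process blocks in `(from, to]` and report how far the stage got.
    fn execute(&mut self, tx: &mut Tx, from: u64, to: u64) -> Result<ExecOutput, String>;
    /// Roll back everything in `(unwind_to, from]` and return the new progress.
    fn unwind(&mut self, tx: &mut Tx, from: u64, unwind_to: u64) -> Result<u64, String>;
}

/// Toy stage that writes one row per block.
pub struct DummyStage;

impl<Tx: DbTx> Stage<Tx> for DummyStage {
    fn id(&self) -> &'static str {
        "Dummy"
    }
    fn execute(&mut self, tx: &mut Tx, from: u64, to: u64) -> Result<ExecOutput, String> {
        for block in (from + 1)..=to {
            tx.put(block, block.to_be_bytes().to_vec());
        }
        Ok(ExecOutput { stage_progress: to, done: true })
    }
    fn unwind(&mut self, tx: &mut Tx, from: u64, unwind_to: u64) -> Result<u64, String> {
        for block in (unwind_to + 1)..=from {
            tx.delete(block);
        }
        Ok(unwind_to)
    }
}
```

Executing `DummyStage` over blocks `(0, 5]` against a `MemTx` writes five rows, and unwinding to block 2 leaves two.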
Pipeline
- [x] Better unwind priorities (@onbjerg): The current unwind priority system is based on Akula's method, but it can and should be simplified to prevent footgunning
- [x] Error and skip events (@onbjerg): The pipeline emits events that are currently only used for testing, but may be useful later on for metrics or other things. In some cases `Ran` and `Unwound` events are emitted with "special" values that denote that a stage either failed or was skipped. We should just add events for these cases
- [ ] Commit intervals (@onbjerg): Currently data is committed to the database every time a stage returns from `Stage::execute`, but realistically this behavior should be tuneable to only commit meaningful progress
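The last two items can be sketched together: explicit `Skipped` and `Error` events instead of "special" values on `Ran`/`Unwound`, and a commit policy that only commits once enough progress has accumulated. This is a minimal sketch under assumptions. `PipelineEvent`, `CommitPolicy`, `min_blocks` and `run_stage` are hypothetical names, and the `step` closure merely stands in for `Stage::execute`.

```rust
/// Events a pipeline might emit. `Skipped` and `Error` are explicit variants
/// rather than special values of `Ran`/`Unwound`. Names are hypothetical.
#[derive(Debug, Clone, PartialEq, Eq)]
pub enum PipelineEvent {
    Ran { stage: &'static str, progress: u64 },
    Unwound { stage: &'static str, progress: u64 },
    Skipped { stage: &'static str },
    Error { stage: &'static str, reason: String },
}

/// Decides when enough progress has accumulated to be worth a commit.
pub struct CommitPolicy {
    /// Minimum number of blocks between commits (hypothetical knob).
    pub min_blocks: u64,
    last_committed: u64,
}

impl CommitPolicy {
    pub fn new(min_blocks: u64, start: u64) -> Self {
        Self { min_blocks, last_committed: start }
    }

    /// Returns true if the caller should commit now. Always commits when the
    /// stage is done, so no progress is left uncommitted.
    pub fn should_commit(&mut self, progress: u64, done: bool) -> bool {
        let advanced = progress.saturating_sub(self.last_committed);
        if done || advanced >= self.min_blocks {
            self.last_committed = progress;
            true
        } else {
            false
        }
    }
}

/// Drives one stage in batches and returns the emitted events plus the number
/// of commits. `step` stands in for `Stage::execute`: given the current
/// progress it returns `Ok(Some(new_progress))`, `Ok(None)` to skip, or `Err`.
pub fn run_stage<F>(
    stage: &'static str,
    target: u64,
    policy: &mut CommitPolicy,
    mut step: F,
) -> (Vec<PipelineEvent>, u32)
where
    F: FnMut(u64) -> Result<Option<u64>, String>,
{
    let mut events = Vec::new();
    let mut commits = 0;
    let mut progress = policy.last_committed;
    while progress < target {
        match step(progress) {
            Ok(Some(next)) if next > progress => {
                progress = next;
                events.push(PipelineEvent::Ran { stage, progress });
                if policy.should_commit(progress, progress >= target) {
                    commits += 1;
                }
            }
            // No progress is treated as a skip, which also avoids looping forever.
            Ok(_) => {
                events.push(PipelineEvent::Skipped { stage });
                break;
            }
            Err(reason) => {
                events.push(PipelineEvent::Error { stage, reason });
                break;
            }
        }
    }
    (events, commits)
}
```

With `min_blocks = 4` and a stage that advances two blocks per call up to block 10, this emits five `Ran` events but commits only three times (at blocks 4, 8 and 10).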
Tooling
- [ ] Benchmarking helpers: We want to benchmark stages, so we will probably end up needing some utilities to make that easier
- [ ] Profiling: We want insight into what the stages are doing to find paths to optimize. Currently we use `tracing` to mark out spans and emit events - we might be able to leverage this info in conjunction with e.g. `tracing_tracy` to be able to use Tracy. However, there may be tools that are better suited for profiling in our case.
- [ ] Metrics: While not only a thing for staged sync (we need them in general), tools to expose metrics should be provided as well.
Stages
Initially we will use the good learnings from Akula, which is based on good learnings from Silkworm and Erigon, and essentially delineate the stages around the same boundaries as they have. As we progress, we might need more stages than listed here (or fewer).
For the more complex stages I propose we create separate tracking issues that link back to this one.
- [x] HeaderDownload (@rkrasiuk): Downloads headers over P2P
- [ ] TotalGasIndex[^1]: Builds an index of `BlockNumber -> TotalGas`. Seems to mostly be used for reporting.
- [ ] BlockHashes[^1]: Builds an index of `BlockHash -> BlockNumber` from the `BlockNumber -> BlockHash` table built in the `HeaderDownload` stage
- [x] BodyDownload: Downloads block bodies and saves a minimal structure containing ommers, the first transaction ID in the block and the number of transactions. Also builds a table of `TxId -> Tx`.
- [x] TotalTxIndex[^1]: Builds an index of `BlockNumber -> TotalTx`. Seems to only be used for reporting in the next stage.
- [x] SenderRecovery: Recovers sender addresses in each transaction
- [ ] Execution: Executes blocks
- [ ] HashState: Hashes accounts and account storage
- [ ] Interhashes: Builds trie hashes
- [ ] AccountHistoryIndex[^1]: Builds indexes related to account histories/changesets
- [ ] StorageHistoryIndex[^1]: Builds indexes related to storage histories/changesets
- [ ] TxLookup[^1]: Builds an index of `TxHash -> BlockNumber`, used in the RPC to look up transactions by hash.
- [ ] CallTraceIndex[^1]: Builds indexes that specify where an account has been the origin or destination of a message
- [ ] Finish: Sets the chain tip (used in the RPC to figure out what our latest synced block is)
[^1]: These stages are generally what I would categorize as indexes, which we may be able to generalize somewhat.
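To make the "index" category concrete, here is a minimal sketch of two of the index stages above, using in-memory `BTreeMap`s in place of database tables. The function names, the `(from, to]` range convention and the use of plain maps are assumptions for illustration, not how reth stores or builds these tables.

```rust
use std::collections::BTreeMap;

pub type BlockNumber = u64;
pub type BlockHash = [u8; 32];

/// BlockHashes: builds `BlockHash -> BlockNumber` by inverting the
/// `BlockNumber -> BlockHash` table, restricted to blocks in `(from, to]`
/// so the stage can run incrementally.
pub fn build_block_hashes(
    canonical: &BTreeMap<BlockNumber, BlockHash>,
    index: &mut BTreeMap<BlockHash, BlockNumber>,
    from: BlockNumber,
    to: BlockNumber,
) {
    if from >= to {
        return; // nothing to do; also avoids an invalid range below
    }
    for (number, hash) in canonical.range((from + 1)..=to) {
        index.insert(*hash, *number);
    }
}

/// TotalTxIndex: builds `BlockNumber -> TotalTx` as a running sum of the
/// per-block transaction counts, continuing from the total stored at `from`.
pub fn build_total_tx(
    tx_counts: &BTreeMap<BlockNumber, u64>,
    index: &mut BTreeMap<BlockNumber, u64>,
    from: BlockNumber,
    to: BlockNumber,
) {
    if from >= to {
        return;
    }
    let mut total = index.get(&from).copied().unwrap_or(0);
    for (number, count) in tx_counts.range((from + 1)..=to) {
        total += *count;
        index.insert(*number, total);
    }
}

/// Unwinding an index keyed by block number drops every entry above `unwind_to`.
pub fn unwind_by_number<V>(index: &mut BTreeMap<BlockNumber, V>, unwind_to: BlockNumber) {
    index.retain(|number, _| *number <= unwind_to);
}
```

Both builders share the same shape (read a source table over a block range, write derived rows), which is the kind of commonality a generalized index stage could exploit.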
Great breakdown, agree on all. @rkrasiuk can you pls open an issue for Headers stage? And let's open them one by one as we pursue them. @rakita opened https://github.com/foundry-rs/reth/issues/39 which is relevant to general eth testing of stages, Dragan do you want to open another specific one for how you're going to be approaching the Executor + Execution Stage?
@onbjerg Noticing there's a bunch of stages in erigon not present here, WDYT? e.g. https://github.com/ledgerwatch/erigon/tree/devel/eth/stagedsync#stage-15-transaction-pool-stage
It seems we're only missing 2? Transpilation stage, which we can't have because we don't have anything like TEVM, and the txpool stage, but we talked about having block building be a separate part since it's a bit more involved (might be custom, flashbots etc etc) so I don't think having that stage makes sense for us. Instead the block building part will just push down the block elsewhere through the pipeline
I added an additional tracking issue and reworded the existing one:
- #39 I would need mocks of the databases and p2p to pass through all stages. I am assuming there will be some minor modifications to stages (e.g. to the header stage, to simplify it), but I am not sure of their extent atm. But I like the idea of using chain tests to cover all stages.
- #72 It is good to have execution and validation in one place. There are also additional functionalities that this module can provide (e.g. building blocks and executing transactions). Utilities/functionalities would be aligned with the needs of the stages.
I think in terms of #72 that would be in the consensus engine mostly, no? Or at least part of it @rakita
@onbjerg there are things that are common to all consensus types, so those can live in reth-executor. I am not sure atm whether consensus is going to call execution for additional verification or execution is going to call consensus; we can figure this out later.
@rakita My point is - should these commonalities not be in a consensus crate (or a consensus-traits crate) instead of the execution crate? From what I've seen from e.g. Akula and Erigon, the stage calls consensus and the VM itself does not
I am not sure, to be honest. Consensus should contain only the different consensus engines. For common things (by which I mean block building, roots, execution, etc.) I would like to separate them into a standalone crate to have them in one place.
I see your point. On the stage side, to not complicate things, maybe it is best to just use one trait, Consensus, and put any function that stages would need there; Consensus can then use whatever it needs internally.
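A minimal sketch of that single-trait idea, assuming a pared-down `Header` and a single `validate_header` method. The trait surface, `BasicConsensus` and `validate_headers` are hypothetical; which functions actually belong on `Consensus` is exactly what is still open in this thread.

```rust
/// Pared-down header for the sketch; real headers carry many more fields.
#[derive(Debug, Clone)]
pub struct Header {
    pub number: u64,
    pub gas_limit: u64,
    pub gas_used: u64,
}

/// The one trait stages depend on. Its method set here is an assumption.
pub trait Consensus {
    fn validate_header(&self, header: &Header, parent: &Header) -> Result<(), String>;
}

/// Toy engine checking two basic header rules.
pub struct BasicConsensus;

impl Consensus for BasicConsensus {
    fn validate_header(&self, header: &Header, parent: &Header) -> Result<(), String> {
        if header.number != parent.number + 1 {
            return Err(format!(
                "block {} does not follow parent {}",
                header.number, parent.number
            ));
        }
        if header.gas_used > header.gas_limit {
            return Err(format!("block {} uses more gas than its limit", header.number));
        }
        Ok(())
    }
}

/// Stage-side helper: the stage calls into `Consensus` and never needs to
/// know what the engine uses internally. Returns the last validated number.
pub fn validate_headers<C: Consensus>(
    consensus: &C,
    parent: &Header,
    headers: &[Header],
) -> Result<u64, String> {
    let mut prev = parent;
    for header in headers {
        consensus.validate_header(header, prev)?;
        prev = header;
    }
    Ok(prev.number)
}
```

The stage only sees the `Consensus` bound, so shared logic (block building, roots, execution) can live in whichever crate the engine implementation pulls in.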
Closing this as it is out of date - the stages have been shuffled around/merged/split etc. We need a few more stages, but those are handled in separate issues.