feat: migration ("re-indexing"), backfilling, and diagnostics tooling for the `ChainIndexer`
ChainIndexer Migration and Diagnostics Tooling
This PR implements the "migration" (really re-indexing/backfilling) and diagnostics tooling for the `ChainIndexer` implemented in PR #12450, and is part of the work for #12453. This tooling takes the form of both RPC APIs on the daemon and `lotus-shed` CLI commands.
Re-indexing Process
The re-indexing tool enables clients to index their entire existing ChainState in the `ChainIndexer`. This process is necessary due to the removal of the existing `MsgIndex`, `EthTxIndex`, and `EventIndex` from Lotus.
Why Re-index Instead of Migrate?
We've chosen to re-index rather than migrate data from existing indices for two primary reasons:
- Known issues: The existing indices have multiple known problems, and migrating could perpetuate incorrect index entries.
- Lack of garbage collection: Existing indices contain many entries for which the corresponding tipset messages/events no longer exist in the ChainStore due to splitstore GC.
Instead, we're re-indexing the `Chainstore`/`Chainstate` on the node into the `ChainIndexer`. This ensures that all re-indexed entries have gone through the indexing logic of the new `ChainIndexer` and that, after re-indexing, the Index is in sync with and reflects the actual contents of the `Chainstore`/`Chainstate`.
Diagnostics Tooling
This PR introduces diagnostic tools for detecting corrupt Index entries at specific epochs or epoch ranges.
While this PR implements functionality for optionally backfilling missing Index entries, it does not yet include the capability to "repair" corrupted indexed entries. The repair functionality will be introduced in a subsequent PR. This approach allows us to first gather and analyze user reports, helping us understand the types and causes of corrupted indexed entries (and whether they exist at all in the new `ChainIndexer`) before implementing repair mechanisms.
Core API
The fundamental building block for this tooling is the following RPC API:
type IndexValidation struct {
    TipsetKey      string
    Height         uint64
    TotalMessages  uint64
    TotalEvents    uint64
    EventsReverted bool
    Backfilled     bool
}

func (si *SqliteIndexer) ChainValidateIndex(ctx context.Context, epoch abi.ChainEpoch, backfill bool) (*types.IndexValidation, error)
This API has the following features:
- Optionally backfills the Index with a tipset on the canonical chain for the given epoch if it is absent in the Index
- Returns some aggregated stats for an indexed entry for diagnostics/inspection
- Reports errors/corrupted indexed entries at the given epoch. Forms of Index corruption that can be diagnosed include:
  - Presence of multiple non-reverted tipsets at the given epoch
  - Complete absence of a non-reverted tipset at the given epoch that does contain reverted tipsets
  - Mismatch between the `Chainstore` state and the indexed entries (tipset messages/events)
  - Incorrect indexing of null rounds at the given epoch
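To make the first two corruption forms concrete, here is a minimal sketch of how the tipset rows found in the Index for one epoch could be classified. It is not the code in this PR: `indexedTipset`, `epochStatus`, and `classifyEpoch` are hypothetical names, and the sketch deliberately skips the `Chainstore` comparison and the null-round checks, which need access to the chain itself.

```go
package main

import "fmt"

// indexedTipset is a hypothetical, simplified view of one tipset row
// stored in the Index for a given epoch.
type indexedTipset struct {
	Key      string
	Reverted bool
}

// epochStatus is a hypothetical classification of the Index state at one epoch.
type epochStatus int

const (
	statusOK                  epochStatus = iota // exactly one non-reverted tipset
	statusMissing                                // no rows at all: candidate for backfilling (or a null round)
	statusOnlyReverted                           // rows exist, but every one is reverted
	statusMultipleNonReverted                    // more than one non-reverted tipset
)

// classifyEpoch counts the non-reverted rows at an epoch and maps the
// count to one of the statuses above.
func classifyEpoch(rows []indexedTipset) epochStatus {
	if len(rows) == 0 {
		return statusMissing
	}
	nonReverted := 0
	for _, r := range rows {
		if !r.Reverted {
			nonReverted++
		}
	}
	switch {
	case nonReverted == 0:
		return statusOnlyReverted
	case nonReverted > 1:
		return statusMultipleNonReverted
	default:
		return statusOK
	}
}

func main() {
	rows := []indexedTipset{
		{Key: "a", Reverted: true},
		{Key: "b", Reverted: false},
	}
	fmt.Println(classifyEpoch(rows) == statusOK)
}
```

In this sketch only `statusMissing` would be addressed by the backfill option; the other two non-OK statuses correspond to corruption that is reported but not repaired.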
`lotus-shed` CLI tooling
The `lotus-shed` CLI tooling for both re-indexing/backfilling and diagnostics can then invoke this RPC API over epoch ranges. The corresponding `lotus-shed backfill index [from, to]` and `lotus-shed inspect index [from, to]` commands can then backfill/inspect/diagnose the Index for the given epoch ranges.
TODO
- [ ] automated tests