lotus Enable efficient indexing of historical chain data

Open fridrik01 opened this issue 2 years ago • 4 comments

See https://github.com/filecoin-project/fvm-pm/issues/299

Lotus currently uses the following SQLite databases:

- sqlite/events.db: Stores events emitted by actors in the FVM
- sqlite/txhash.db: Stores mappings from Eth tx hash to Filecoin message CID
- sqlite/msgindex.db: Stores block message CIDs and their tipset CIDs for faster lookup

We should try to unify the different databases into one, which should make maintaining correctness and doing recovery/backfilling simpler.
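
To make the idea concrete, here is a minimal sketch of what a single unified database could look like. Every table and column name below is hypothetical and not taken from the existing Lotus schemas, and Python's `sqlite3` is used only to keep the example short and runnable (Lotus itself is written in Go). The point it illustrates is that one file lets a message, its Eth tx hash, and its events be written in a single transaction.

```python
import sqlite3

# Hypothetical unified schema: all table and column names are assumptions
# made for illustration, not the schema Lotus actually uses.
SCHEMA = """
CREATE TABLE IF NOT EXISTS tipset_message (
    id          INTEGER PRIMARY KEY,
    tipset_cid  TEXT    NOT NULL,
    height      INTEGER NOT NULL,
    message_cid TEXT    NOT NULL,
    UNIQUE (tipset_cid, message_cid)
);
CREATE TABLE IF NOT EXISTS eth_tx_hash (
    tx_hash     TEXT PRIMARY KEY,
    message_cid TEXT NOT NULL
);
CREATE TABLE IF NOT EXISTS event (
    id          INTEGER PRIMARY KEY,
    message_id  INTEGER NOT NULL REFERENCES tipset_message(id) ON DELETE CASCADE,
    event_index INTEGER NOT NULL,
    emitter     TEXT    NOT NULL
);
CREATE INDEX IF NOT EXISTS idx_message_cid ON tipset_message (message_cid);
"""


def open_index(path=":memory:"):
    """Open (or create) the single index database and ensure the schema exists."""
    db = sqlite3.connect(path)
    db.execute("PRAGMA foreign_keys = ON")
    db.executescript(SCHEMA)
    return db


def index_message(db, tipset_cid, height, message_cid, tx_hash=None, emitters=()):
    """Record one message, its optional Eth tx hash, and its events atomically."""
    with db:  # one transaction: either everything is written or nothing is
        cur = db.execute(
            "INSERT INTO tipset_message (tipset_cid, height, message_cid) "
            "VALUES (?, ?, ?)",
            (tipset_cid, height, message_cid),
        )
        message_id = cur.lastrowid
        if tx_hash is not None:
            db.execute(
                "INSERT OR REPLACE INTO eth_tx_hash (tx_hash, message_cid) "
                "VALUES (?, ?)",
                (tx_hash, message_cid),
            )
        db.executemany(
            "INSERT INTO event (message_id, event_index, emitter) VALUES (?, ?, ?)",
            [(message_id, i, emitter) for i, emitter in enumerate(emitters)],
        )
    return message_id
```

With three separate files, the same write would span three independent transactions, which is where partial or inconsistent state can creep in after a crash.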

We should also make sure that we:

- Handle all edge cases (forks, reverts, config changes that require pruning, etc.)
- Allow enabling/disabling what to index
- ~~Be able to configure lookback so these indices don't grow endlessly~~ Not needed, as the indexes are tiny compared to chain data (less than 0.25%)
#11007
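
As a rough illustration of the first two points, the sketch below handles reverts by flagging rows instead of deleting them, and gates each index behind a switch. This is one possible approach under stated assumptions, not a description of how Lotus does it: `IndexConfig`, `ChainIndex`, and all table and column names are hypothetical, and Python's `sqlite3` is used only for brevity.

```python
import sqlite3
from dataclasses import dataclass


@dataclass
class IndexConfig:
    # Hypothetical switches, not actual Lotus configuration options.
    index_messages: bool = True
    index_events: bool = True


class ChainIndex:
    def __init__(self, cfg, path=":memory:"):
        self.cfg = cfg
        self.db = sqlite3.connect(path)
        self.db.executescript("""
            CREATE TABLE IF NOT EXISTS message (
                tipset_cid TEXT, message_cid TEXT,
                reverted INTEGER NOT NULL DEFAULT 0,
                PRIMARY KEY (tipset_cid, message_cid));
            CREATE TABLE IF NOT EXISTS event (
                tipset_cid TEXT, message_cid TEXT, event_index INTEGER,
                reverted INTEGER NOT NULL DEFAULT 0,
                PRIMARY KEY (tipset_cid, message_cid, event_index));
        """)

    def apply(self, tipset_cid, messages):
        """Index a tipset; `messages` maps message CID -> number of events."""
        with self.db:
            for message_cid, n_events in messages.items():
                if self.cfg.index_messages:
                    # Re-applying a previously reverted tipset clears the flag.
                    self.db.execute(
                        "INSERT OR REPLACE INTO message VALUES (?, ?, 0)",
                        (tipset_cid, message_cid))
                if self.cfg.index_events:
                    for i in range(n_events):
                        self.db.execute(
                            "INSERT OR REPLACE INTO event VALUES (?, ?, ?, 0)",
                            (tipset_cid, message_cid, i))

    def revert(self, tipset_cid):
        """Flag everything from a tipset as reverted, in one transaction."""
        with self.db:
            for table in ("message", "event"):
                self.db.execute(
                    f"UPDATE {table} SET reverted = 1 WHERE tipset_cid = ?",
                    (tipset_cid,))

    def lookup(self, message_cid):
        """Return a non-reverted tipset CID containing the message, or None."""
        row = self.db.execute(
            "SELECT tipset_cid FROM message "
            "WHERE message_cid = ? AND reverted = 0",
            (message_cid,)).fetchone()
        return row[0] if row else None
```

Because both tables live in one database, a revert updates them together, so a fork cannot leave messages and events disagreeing about which tipsets are canonical.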

May 03 '23 12:05 fridrik01

We should also populate the index on snapshot import, if we aren't already doing so.

May 03 '23 16:05 raulk

Newbie question: what was the reason for having 3 separate databases to begin with?

Aug 06 '24 19:08 BigLep

I believe the idea was that it made it easy to tell if one was getting large and remove/disable it. I'd kind of like to introduce (even in shed) some form of GC command before we unify them, but I do think unifying them is the way to go (that and re-organizing our tables to massively reduce the amount of duplicate data).

Aug 07 '24 20:08 Stebalien

In terms of: why multiple observers? We wanted to keep these subsystems separate. But I wouldn't be opposed to a new architecture here (that, e.g., lets us cleanly keep track of what we've indexed and what we haven't) as long as we can make it somewhat pluggable. HOWEVER, if we want to be able to enable/disable these indices independently... we'll need to track what has been indexed and what has not been indexed independently as well.
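
A minimal sketch of that kind of independent tracking, assuming a hypothetical `indexed_epoch` table keyed by index name and height (none of these names come from Lotus, and Python's `sqlite3` is used only for brevity). It stores one row per indexed epoch, which is simple but wasteful; a real implementation would more likely store ranges.

```python
import sqlite3


def open_progress(path=":memory:"):
    """Open a database with a per-index record of which epochs are indexed."""
    db = sqlite3.connect(path)
    db.execute(
        "CREATE TABLE IF NOT EXISTS indexed_epoch ("
        " index_name TEXT NOT NULL,"
        " height INTEGER NOT NULL,"
        " PRIMARY KEY (index_name, height))"
    )
    return db


def mark_indexed(db, index_name, height):
    """Record that `index_name` has processed the epoch at `height`."""
    with db:
        db.execute(
            "INSERT OR IGNORE INTO indexed_epoch VALUES (?, ?)",
            (index_name, height),
        )


def missing_epochs(db, index_name, start, end):
    """Return the heights in [start, end] that `index_name` has not indexed."""
    rows = db.execute(
        "SELECT height FROM indexed_epoch "
        "WHERE index_name = ? AND height BETWEEN ? AND ?",
        (index_name, start, end),
    )
    done = {height for (height,) in rows}
    return [h for h in range(start, end + 1) if h not in done]
```

If, say, the events index was enabled from epoch 100 but the tx hash index only from epoch 102, `missing_epochs(db, "txhash", 100, 102)` reports the gap to backfill without touching the events index.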

Aug 07 '24 20:08 Stebalien
