fuel-indexer feature: internal block store for backfilling newly-deployed indexers

From a message I sent in a discussion on Slack about #932:

Here’s what I would consider the “gold standard”:

0. Have an internal table/DS that holds BlockData somehow, so we don’t have to hit the client again.
1. Re-enable/fix forc index revert.
2. Upon re-deploying an indexer with saved data, inform the user that the tables from the previous version have been renamed for backup purposes and that, if needed, the old indexer and its tables can be restored with forc index revert.
3. Begin backfilling from the internal block table.
4. Once the last persisted block is added, set the indexer’s executor to request blocks starting with last_persisted_block + 1.
5. In the case that there is another re-deployment, we then do the following:
   - Rename the already-renamed tables to a temp table name.
   - Rename the currently used tables to the backup name.
   - Delete the tables that are now two versions old.
   - Go to step 3.
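
To make the table rotation on re-deployment concrete, here is a minimal sketch of the order of operations. The `_backup` and `_temp` suffixes and the `rotation_statements` helper are hypothetical names chosen for illustration, not anything that exists in fuel-indexer today, and the SQL assumes a Postgres-style `ALTER TABLE ... RENAME TO`.

```rust
/// Builds the SQL statements, in order, for one re-deployment of an indexer.
///
/// Assumed (hypothetical) naming scheme: `<table>` is the live table and
/// `<table>_backup` holds the previous version, if one exists.
pub fn rotation_statements(table: &str, backup_exists: bool) -> Vec<String> {
    let backup = format!("{table}_backup");
    let temp = format!("{table}_temp");
    let mut stmts = Vec::new();

    if backup_exists {
        // Rename the already-renamed (backup) table to a temp name.
        stmts.push(format!("ALTER TABLE {backup} RENAME TO {temp}"));
    }

    // Rename the currently used table to the backup name.
    stmts.push(format!("ALTER TABLE {table} RENAME TO {backup}"));

    if backup_exists {
        // Delete the table that is now two versions old.
        stmts.push(format!("DROP TABLE {temp}"));
    }

    stmts
}
```

Running all of the returned statements inside a single transaction would presumably be needed so that a failure halfway through does not leave the tables in a mixed state; backfilling (step 3) would start only after the rotation commits.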

In short, we should consider having a local store for blocks. I feel the main benefits of this feature are three-fold:

- Upon deploying an indexer (or re-deploying an existing one), we could instantly begin to process blocks from the local store. This should decrease the time needed for an indexer to fully index the chain (or at least index from its desired start_block), as we would avoid the latency of network requests.
- It would decrease the number of requests to a Fuel node when an indexer is (re-)deployed. Currently, an indexer will begin to request blocks starting at the value of start_block in its manifest. With several indexers running at once, the total traffic for these requests starts to add up, and as we've seen in #979, a Fuel node deployment may have strategies in place to rate-limit traffic.
- It would also allow for clean migration from one version of an indexer to another, which is a feature that was brought up some time ago in #382. A user would just upload a new version of an indexer and the data would be re-indexed according to their schema and handler code; it would also allow for the workflow described in the message above.
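
As a rough illustration of the first two points, the sketch below replays blocks from a local store and then reports the height at which the executor should start asking the Fuel node. `LocalBlockStore`, `backfill`, and the pared-down `BlockData` are hypothetical stand-ins, not the actual fuel-indexer types, and stopping at the first gap is an assumption about how missing blocks would be handled.

```rust
use std::collections::BTreeMap;

/// Stand-in for the indexer's `BlockData`; only the height matters here.
#[derive(Clone, Debug, PartialEq)]
pub struct BlockData {
    pub height: u64,
}

/// Hypothetical local block store, keyed by block height.
pub struct LocalBlockStore {
    blocks: BTreeMap<u64, BlockData>,
}

impl LocalBlockStore {
    /// Builds a store holding one block per given height.
    pub fn new(heights: impl IntoIterator<Item = u64>) -> Self {
        let blocks = heights
            .into_iter()
            .map(|height| (height, BlockData { height }))
            .collect();
        Self { blocks }
    }
}

/// Replays locally stored blocks through `handler`, starting at `start_block`.
///
/// Returns the height the executor should request from the node next, i.e.
/// `last_persisted_block + 1` when the stored range is contiguous.
pub fn backfill(
    store: &LocalBlockStore,
    start_block: u64,
    mut handler: impl FnMut(&BlockData),
) -> u64 {
    let mut next = start_block;
    for (height, block) in store.blocks.range(start_block..) {
        // Stop at the first gap; the node has to supply anything missing.
        if *height != next {
            break;
        }
        handler(block);
        next = height + 1;
    }
    next
}
```

With blocks 5, 6, 7, and 9 stored and a start_block of 5, the handler sees 5 through 7 and the executor would resume from the node at 8. No network request is made for the replayed range.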

Concerns for this approach:

- How exactly do we store the data? This would essentially duplicate what the client does, which uses RocksDB as far as I know, and that's not exactly known as the lightest dependency.
- We currently make client requests per indexer executor; we would need to ensure that adding to this local block store is free from contention and data races.
- The blockchain will grow without bound, and so will the space needed to save blocks.
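
One possible shape for the last two concerns, sketched in memory only: a store shared between executors behind a lock, with idempotent inserts and a cap on how many blocks are kept. `SharedBlockStore`, `max_blocks`, and the pared-down `BlockData` are hypothetical, and this deliberately sidesteps the question of the on-disk format.

```rust
use std::collections::BTreeMap;
use std::sync::{Arc, RwLock};

/// Stand-in for the indexer's `BlockData`; only the height matters here.
#[derive(Clone, Debug, PartialEq)]
pub struct BlockData {
    pub height: u64,
}

/// Hypothetical block store shared by all executors in one process.
#[derive(Clone)]
pub struct SharedBlockStore {
    inner: Arc<RwLock<BTreeMap<u64, BlockData>>>,
    /// Retention cap: the oldest blocks are evicted beyond this count.
    max_blocks: usize,
}

impl SharedBlockStore {
    pub fn new(max_blocks: usize) -> Self {
        Self {
            inner: Arc::new(RwLock::new(BTreeMap::new())),
            max_blocks,
        }
    }

    /// Inserts a block unless that height is already stored.
    /// Returns `true` if this call added it.
    pub fn insert(&self, block: BlockData) -> bool {
        let mut map = self.inner.write().unwrap();
        if map.contains_key(&block.height) {
            return false;
        }
        map.insert(block.height, block);
        // Evict the lowest heights while over the cap.
        while map.len() > self.max_blocks {
            map.pop_first();
        }
        true
    }

    /// Returns a copy of the block at `height`, if it is still retained.
    pub fn get(&self, height: u64) -> Option<BlockData> {
        self.inner.read().unwrap().get(&height).cloned()
    }

    /// Number of blocks currently retained.
    pub fn len(&self) -> usize {
        self.inner.read().unwrap().len()
    }
}
```

A single write lock keeps concurrent inserts race-free at the cost of serializing writers, which may or may not be acceptable under load. Note the trade-off in the retention cap: evicting old blocks bounds disk usage, but an indexer whose start_block falls below the retained range would still have to go back to the node.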

Jun 07 '23 14:06 deekerno

Will add more info a bit later, but I think this could be really important, especially with regard to migrating data
- E.g., I deploy the same indexer with a different schema, and I want it not only to use the new index going forward, but also (in parallel?) to backfill, potentially to the genesis block 🩴

Jun 07 '23 22:06 ra0x3

@lostman Will this issue be handled by #1150?

Sep 20 '23 16:09 ra0x3

@ra0x3, no, that's only for missing blocks. Initially both were handled by a single PR: https://github.com/FuelLabs/fuel-indexer/pull/1297 but I split the missing blocks out. Missing blocks will be merged first and I'm bringing https://github.com/FuelLabs/fuel-indexer/pull/1297 up to date to reflect this.

Sep 21 '23 08:09 lostman
