Add support for ingesting via precomputed TxMeta in Horizon
Horizon currently has two modes of ingestion:
- The deprecated DatabaseBackend which extracts ledgers from Stellar Core's postgres DB
- Captive Core
We need to add a third mode: ingesting from the precomputed TxMeta backend. This will require adding a new LedgerBackend implementation and configuration flags for Horizon that allow operators to select the new ledger backend. We will also need to make the following changes to the Horizon reingest command to support the new mode of ingestion:
- [ ] When using precomputed TxMeta we may need to adjust the minimum batch size and we may not need to round the batch sizes to multiples of 64.
- [ ] The default value for the `parallel-job-size` parameter needs to be reduced (experimentally, I found 100 to be the most efficient on production hardware).
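To make the batch-size item above concrete, here is a minimal sketch of splitting a reingest range into parallel jobs, with and without rounding the batch size to a multiple of 64. The names `LedgerRange`, `splitRange`, and `roundTo64` are hypothetical and are not taken from Horizon's code; the rounding rule (round down, with a floor of 64) is also an assumption made for illustration.

```go
package main

import "fmt"

// roundTo64 is the batch-size multiple mentioned in the issue text.
const roundTo64 = 64

// LedgerRange is a hypothetical inclusive range of ledger sequences.
type LedgerRange struct {
	From, To uint32
}

// splitRange divides r into consecutive batches of at most batchSize ledgers.
// When round is true, batchSize is first rounded down to a multiple of 64
// (with a floor of 64); when false, batchSize is used as given.
func splitRange(r LedgerRange, batchSize uint32, round bool) []LedgerRange {
	if batchSize == 0 {
		batchSize = 1
	}
	if round {
		batchSize = (batchSize / roundTo64) * roundTo64
		if batchSize == 0 {
			batchSize = roundTo64
		}
	}
	var batches []LedgerRange
	// uint64 arithmetic avoids overflow near the top of the uint32 range.
	for from := uint64(r.From); from <= uint64(r.To); from += uint64(batchSize) {
		to := from + uint64(batchSize) - 1
		if to > uint64(r.To) {
			to = uint64(r.To)
		}
		batches = append(batches, LedgerRange{From: uint32(from), To: uint32(to)})
	}
	return batches
}

func main() {
	r := LedgerRange{From: 1, To: 250}
	// Rounded: a requested size of 100 becomes 64.
	fmt.Println(splitRange(r, 100, true))
	// Unrounded: the requested size of 100 is used directly.
	fmt.Println(splitRange(r, 100, false))
}
```

With a requested size of 100, the rounded variant produces four batches of up to 64 ledgers, while the unrounded variant produces three batches of up to 100.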
Mode 1 (DatabaseBackend) will be removed on completion of https://github.com/stellar/go/issues/4855
We have added a ledger backend implementation which will read precomputed Tx Meta from a data lake:
https://github.com/stellar/go/blob/master/ingest/ledgerbackend/buffered_storage_backend.go
The work remaining to complete this issue is:
- add command line / env flags to allow horizon to toggle between ingestion via captive core and ingestion via precomputed Tx Meta
- add command line / env flags for horizon to configure the BufferedStorageBackend
- update reingestion integration tests so we exercise reingestion via both captive core and the BufferedStorageBackend
We should make sure that we are streamlining the flags passed that specify captive-core vs data lake source system ingestion so that operators cannot configure invalid variants. The flags should be simple and easy for the operator to understand.
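One way to keep operators from configuring invalid variants is a single backend selector plus a validation step that rejects settings belonging to the other backend. The sketch below is only an illustration of that idea: `BackendType`, `IngestConfig`, and every field name are hypothetical and do not correspond to existing Horizon flags.

```go
package main

import (
	"errors"
	"fmt"
)

// BackendType is a hypothetical selector for the ledger backend.
type BackendType string

const (
	BackendCaptiveCore BackendType = "captive-core"
	BackendDatastore   BackendType = "datastore"
)

// IngestConfig holds hypothetical, simplified ingestion settings.
type IngestConfig struct {
	Backend               BackendType
	CaptiveCoreBinaryPath string // only meaningful for captive-core
	DatastoreConfigPath   string // only meaningful for datastore
}

// Validate rejects combinations that mix settings from both backends
// or omit the setting required by the selected backend.
func (c IngestConfig) Validate() error {
	switch c.Backend {
	case BackendCaptiveCore:
		if c.DatastoreConfigPath != "" {
			return errors.New("datastore settings are not allowed with the captive-core backend")
		}
		if c.CaptiveCoreBinaryPath == "" {
			return errors.New("captive-core backend requires a binary path")
		}
	case BackendDatastore:
		if c.CaptiveCoreBinaryPath != "" {
			return errors.New("captive-core settings are not allowed with the datastore backend")
		}
		if c.DatastoreConfigPath == "" {
			return errors.New("datastore backend requires a config path")
		}
	default:
		return fmt.Errorf("unknown ledger backend %q", c.Backend)
	}
	return nil
}

func main() {
	mixed := IngestConfig{
		Backend:               BackendDatastore,
		CaptiveCoreBinaryPath: "/usr/bin/stellar-core",
		DatastoreConfigPath:   "datastore.toml",
	}
	fmt.Println(mixed.Validate())
}
```

Failing fast at startup with a message that names the conflicting setting keeps the flag surface small and easy to reason about.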
We should also add a flag warning if an operator decides to run real time ingestion using the datastore ledgerbackend. Running off a datastore will introduce a lag and might be too slow to work properly with synchronous transaction submission.
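A minimal sketch of such a warning, assuming a hypothetical `ingestionWarning` helper and the backend names `captive-core` and `datastore` (neither is an existing Horizon identifier), might look like this:

```go
package main

import (
	"log"
	"os"
)

// ingestionWarning returns a non-empty message when live (real-time)
// ingestion is combined with the datastore backend, and "" otherwise.
// The backend names here are hypothetical.
func ingestionWarning(backend string, live bool) string {
	if live && backend == "datastore" {
		return "live ingestion from a datastore introduces lag and may be too slow for synchronous transaction submission"
	}
	return ""
}

func main() {
	logger := log.New(os.Stderr, "horizon: ", 0)
	if w := ingestionWarning("datastore", true); w != "" {
		logger.Println("WARNING: " + w)
	}
}
```

The check warns rather than fails, so an operator can still opt into this combination knowingly.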
Out of scope: productionizing the code so that Horizon can fetch files from the GCS bucket. This will require coordination with ops.
> We should also add a flag warning if an operator decides to run real time ingestion using the datastore ledgerbackend. Running off a datastore will introduce a lag and might be too slow to work properly with synchronous transaction submission.
How about adding an option only to `horizon db reingest|fill-gaps` that will allow switching between captive-core and precomputed TxMeta, thereby limiting the use of precomputed TxMeta to reingestion only?
@urvisavla, I think your suggestion could be woven in as a toggle for live and reingest. I've updated the acceptance sub-tasks in the ticket description to capture it; please rewrite them if they are not on target. I can work on one of the tasks in parallel, let me know. Thanks.