
Add support for ingesting via precomputed TxMeta in Horizon

Open tamirms opened this issue 1 year ago • 5 comments

Horizon currently has two modes of ingestion:

  1. The deprecated DatabaseBackend which extracts ledgers from Stellar Core's postgres DB
  2. Captive Core

We need to add a third mode that ingests from the precomputed TxMeta backend. This will require adding a new LedgerBackend implementation and configuration flags that allow Horizon operators to select the new ledger backend. We will also need to make the following changes to the Horizon reingest command to support the new mode of ingestion:

  • [ ] When using precomputed TxMeta, we may need to adjust the minimum batch size, and we may not need to round the batch sizes to multiples of 64.
  • [ ] The default value for the parallel-job-size parameter needs to be reduced (experimentally, I found 100 to be the most efficient on production hardware).
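
To make the batch-size point concrete, here is a minimal sketch of splitting a reingest range into batches with the rounding to multiples of 64 made optional. The `LedgerRange` type and both function names are hypothetical and are not taken from Horizon's code; how Horizon actually aligns batch boundaries may differ.

```go
package main

// checkpointMultiple is the multiple of 64 mentioned in this issue.
const checkpointMultiple uint32 = 64

// LedgerRange is an inclusive range of ledger sequences.
// Hypothetical type, used only for this sketch.
type LedgerRange struct {
	From, To uint32
}

// roundUpToMultiple rounds n up to the nearest multiple of m (m must be > 0).
func roundUpToMultiple(n, m uint32) uint32 {
	if rem := n % m; rem != 0 {
		return n + (m - rem)
	}
	return n
}

// splitIntoBatches divides r into consecutive batches of at most size ledgers.
// When roundTo64 is true the batch size is first rounded up to a multiple of 64;
// a precomputed TxMeta backend could pass false to skip the rounding.
func splitIntoBatches(r LedgerRange, size uint32, roundTo64 bool) []LedgerRange {
	if size == 0 {
		size = 1
	}
	if roundTo64 {
		size = roundUpToMultiple(size, checkpointMultiple)
	}
	var batches []LedgerRange
	for from := r.From; from <= r.To; {
		to := from + size - 1
		// Clamp to the end of the range; "to < from" guards against uint32 overflow.
		if to > r.To || to < from {
			to = r.To
		}
		batches = append(batches, LedgerRange{From: from, To: to})
		if to == r.To {
			break
		}
		from = to + 1
	}
	return batches
}
```

With a requested size of 100 over ledgers 1 to 250, the unrounded variant yields three batches (1-100, 101-200, 201-250), while the rounded variant uses a size of 128 and yields two (1-128, 129-250).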

tamirms avatar Jun 15 '23 09:06 tamirms

Mode 1 (DatabaseBackend) will be removed on completion of https://github.com/stellar/go/issues/4855

mollykarcher avatar Nov 02 '23 15:11 mollykarcher

We have added a ledger backend implementation that reads precomputed TxMeta from a data lake:

https://github.com/stellar/go/blob/master/ingest/ledgerbackend/buffered_storage_backend.go

The work remaining to complete this issue is:

  • add command line / env flags to allow Horizon to toggle between ingestion via captive core and ingestion via precomputed TxMeta
  • add command line / env flags for Horizon to configure the BufferedStorageBackend
  • update the reingestion integration tests so that we exercise reingestion via both captive core and the BufferedStorageBackend
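
As a rough illustration of the first two items, the sketch below parses a backend selector and a backend-specific setting, then rejects combinations that make no sense. The flag names `ledger-backend` and `datastore-config`, the values `captive-core` and `datastore`, and the `IngestConfig` type are all assumptions made for this example; they are not existing Horizon flags. It uses the standard library `flag` package only to stay self-contained, not because Horizon wires its flags this way.

```go
package main

import (
	"errors"
	"flag"
	"fmt"
)

// IngestConfig holds the hypothetical settings for this sketch.
type IngestConfig struct {
	Backend             string // "captive-core" or "datastore" (assumed values)
	DatastoreConfigPath string // only meaningful for the datastore backend
}

// parseIngestFlags parses args and validates that the selected backend and
// its settings are consistent with each other.
func parseIngestFlags(args []string) (IngestConfig, error) {
	var cfg IngestConfig
	fs := flag.NewFlagSet("ingest", flag.ContinueOnError)
	fs.StringVar(&cfg.Backend, "ledger-backend", "captive-core",
		"ledger backend to ingest from: captive-core or datastore")
	fs.StringVar(&cfg.DatastoreConfigPath, "datastore-config", "",
		"path to the datastore backend configuration file")
	if err := fs.Parse(args); err != nil {
		return cfg, err
	}

	switch cfg.Backend {
	case "captive-core":
		// A datastore setting with captive core selected is most likely a mistake.
		if cfg.DatastoreConfigPath != "" {
			return cfg, errors.New("datastore-config is only valid with ledger-backend=datastore")
		}
	case "datastore":
		if cfg.DatastoreConfigPath == "" {
			return cfg, errors.New("ledger-backend=datastore requires datastore-config")
		}
	default:
		return cfg, fmt.Errorf("unknown ledger backend %q", cfg.Backend)
	}
	return cfg, nil
}
```

Calling `parseIngestFlags([]string{"--ledger-backend=datastore"})` returns an error because the datastore configuration is missing, while adding `--datastore-config=datastore.toml` (a made-up file name) succeeds. Keeping a single selector with one dependent setting is one way to limit the number of combinations an operator can get wrong.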

tamirms avatar May 31 '24 19:05 tamirms

We should streamline the flags that select between captive-core and data lake ingestion so that operators cannot configure invalid combinations. The flags should be simple and easy for the operator to understand.

We should also add a flag warning if an operator decides to run real time ingestion using the datastore ledgerbackend. Running off a datastore will introduce a lag and might be too slow to work properly with synchronous transaction submission.
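
A minimal sketch of such a warning check follows. The `Mode` and `Backend` types, their values, and the message text are hypothetical and exist only for this example.

```go
package main

import "fmt"

// Backend and Mode are hypothetical enumerations for this sketch.
type Backend string
type Mode string

const (
	CaptiveCore Backend = "captive-core"
	Datastore   Backend = "datastore"

	Live     Mode = "live"
	Reingest Mode = "reingest"
)

// backendWarning returns a non-empty message when the chosen backend is a
// questionable fit for the ingestion mode, and an empty string otherwise.
func backendWarning(mode Mode, backend Backend) string {
	if mode == Live && backend == Datastore {
		return fmt.Sprintf(
			"running %s ingestion from the %s backend introduces lag and may be too slow for synchronous transaction submission",
			mode, backend)
	}
	return ""
}
```

The caller would log the returned message at startup when it is non-empty. Returning a string instead of an error keeps the configuration allowed while still making the trade-off visible to the operator.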

sydneynotthecity avatar Jun 04 '24 16:06 sydneynotthecity

Out of scope: productionizing the code so that Horizon can fetch files from the GCS bucket. This will require coordination with ops.

sydneynotthecity avatar Jun 04 '24 16:06 sydneynotthecity

> We should also add a flag warning if an operator decides to run real time ingestion using the datastore ledgerbackend. Running off a datastore will introduce a lag and might be too slow to work properly with synchronous transaction submission.

How about adding an option only to horizon db reingest|fill-gaps that allows switching between captive-core and precomputed TxMeta, thereby limiting the use of precomputed TxMeta to reingestion only?

urvisavla avatar Jul 02 '24 17:07 urvisavla

@urvisavla, I think your suggestion could be woven in as a toggle for both live ingestion and reingestion. I've updated the acceptance sub-tasks in the ticket description to capture this; please rewrite them if they are not on target. I can work on one of the tasks in parallel, let me know. Thanks.

sreuland avatar Jul 03 '24 16:07 sreuland