
services/horizon: Ingestion-Lite Prototype

Open paulbellamy opened this issue 2 years ago • 2 comments

PR Checklist

PR Structure

  • [ ] This PR has reasonably narrow scope (if not, break it down into smaller PRs).
  • [ ] This PR avoids mixing refactoring changes with feature changes (split into two PRs otherwise).
  • [ ] This PR's title starts with name of package that is most changed in the PR, ex. services/friendbot, or all or doc if the changes are broad or impact many packages.

Thoroughness

  • [ ] This PR adds tests for the most critical parts of the new functionality or fixes.
  • [ ] I've updated any docs (developer docs, .md files, etc... affected by this change). Take a look in the docs folder for a given service, like this one.

Release planning

  • [ ] I've updated the relevant CHANGELOG (here for Horizon) if needed with deprecations, added features, breaking changes, and DB schema changes.
  • [ ] I've decided if this PR requires a new major/minor version according to semver, or if it's mainly a patch change. The PR is targeted at the next release branch if it's not a patch change.

What

The working branch for the prototype Horizon, serving requests via "just-in-time" ingestion of ~history archives~ txmeta files.

4 services:

  • /exp/lighthorizon. Serves web requests
  • /exp/lighthorizon/index/batch. map/reduce style index rebuilder
  • /exp/lighthorizon/index/single. non-distributed index rebuilder
  • /exp/services/ledgerexporter. Runs captive-core, and outputs txmeta files.

Why

So operators don't need to maintain databases.

Fixes #4317

Known limitations

See #4317 for the to-do list.

paulbellamy • Apr 25 '22 16:04

I am done with the first pass of the review. I left quite a few comments but didn't dive very deep, since the PR doesn't seem to be finished.

A few remarks:

  1. I think the TODOs should be addressed before the next pass (there are quite a few of them).
  2. I think it would be a good idea to replace the data structure code with off-the-shelf (optimized and proven) code where possible, namely the bitmaps, tries and LRU cache. The bitmap and trie parts are the more important ones, since they are going to end up stored in indices.
  3. I am not sure that the new archive code really belongs in the historyarchive package, since that package has been used for checkpoint storage and we are now storing txmeta. It made reading the code a bit confusing until I adjusted.
  4. The index formats need some documentation (even if minimal) and probably some more eyes, since they may end up stored publicly if we productize this. How about creating a small doc and sharing it with the org (particularly the Core team)?
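
To illustrate remark 2 for the LRU cache only: a minimal sketch, assuming a string-keyed cache of byte slices, of how the eviction list can lean on the standard library's `container/list` instead of a hand-written linked list. The names `LRU`, `NewLRU`, `Get` and `Put` are hypothetical and are not taken from the PR; this is not the PR's implementation, and it is not safe for concurrent use.

```go
package main

import (
	"container/list"
	"fmt"
)

// entry is the payload stored in each list element (hypothetical type).
type entry struct {
	key   string
	value []byte
}

// LRU is a fixed-capacity least-recently-used cache (hypothetical name).
// It is not safe for concurrent use.
type LRU struct {
	capacity int
	order    *list.List               // front = most recently used
	items    map[string]*list.Element // key -> element in order
}

// NewLRU returns an empty cache holding at most capacity entries.
func NewLRU(capacity int) *LRU {
	return &LRU{
		capacity: capacity,
		order:    list.New(),
		items:    make(map[string]*list.Element),
	}
}

// Get returns the cached value and marks the key as most recently used.
func (c *LRU) Get(key string) ([]byte, bool) {
	el, ok := c.items[key]
	if !ok {
		return nil, false
	}
	c.order.MoveToFront(el)
	return el.Value.(*entry).value, true
}

// Put inserts or updates a key, evicting the least recently used
// entry when the cache grows past its capacity.
func (c *LRU) Put(key string, value []byte) {
	if el, ok := c.items[key]; ok {
		el.Value.(*entry).value = value
		c.order.MoveToFront(el)
		return
	}
	c.items[key] = c.order.PushFront(&entry{key: key, value: value})
	if c.order.Len() > c.capacity {
		oldest := c.order.Back()
		c.order.Remove(oldest)
		delete(c.items, oldest.Value.(*entry).key)
	}
}

func main() {
	cache := NewLRU(2)
	cache.Put("a", []byte("1"))
	cache.Put("b", []byte("2"))
	cache.Get("a")              // "a" becomes most recently used
	cache.Put("c", []byte("3")) // evicts "b", the least recently used
	_, ok := cache.Get("b")
	fmt.Println(ok) // prints "false"
}
```

The same reasoning applies more strongly to the bitmaps and tries, where a proven library would also pin down the serialized format; those are left out here because the sketch sticks to the standard library.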

2opremio • May 04 '22 18:05

Also, as a side note, I think the indexing and archiving of Horizon Light could be reused as-is to implement account backfilling for ingestion filtering (https://github.com/stellar/go/issues/4267).

For asset backfilling we would need to implement an additional indexer, but that would be it.

(CC @sreuland @jcx120)

2opremio • May 04 '22 18:05