go icon indicating copy to clipboard operation
go copied to clipboard

horizon: `history_claimable_balances` is not cleared out by the reaper.

Open 2opremio opened this issue 2 years ago • 6 comments

We should:

  1. Make sure the reaper takes care of history_claimable_balances
  2. Make sure all the other history tables are included.
  3. Add a test that ensure all the history_ tables in the DB are cleared out (in order prevent this problem from happening to tables added in the future).

2opremio avatar May 23 '22 18:05 2opremio

We have three tables which aren't cleaned up when deleting a range. Those are:

  • history_claimable_balances
  • history_liquidity_pools
  • history_accounts
  • history_assets

They all pair an entity (claimable balance, liquidity pool, account) with an internal id:

# \d history_accounts
 id      | bigint                |           | not null | nextval('history_accounts_id_seq'::regclass)
 address | character varying(64) |           |          | 

# \d history_liquidity_pools
 id                | bigint |           | not null | nextval('history_liquidity_pools_id_seq'::regclass)
 liquidity_pool_id | text   |           | not null | 

# \d history_assets
 id           | integer               |           | not null | nextval('history_assets_id_seq'::regclass)
 asset_type   | character varying(64) |           | not null | 
 asset_code   | character varying(12) |           | not null | 
 asset_issuer | character varying(56) |           | not null | 

# \d history_claimable_balances
 id                   | bigint |           | not null | nextval('history_claimable_balances_id_seq'::regclass)
 claimable_balance_id | text   |           | not null | 

In theory, every time we clear out a range of data (either for reingesting or for reaping) we should also remove any orphan entries in the tables above (an orphan entry is one whose internal id isn't used anymore, e.g. for history_claimable_balances, that would be all entries which cannot be joined with history_operation_claimable_balances or history_transaction_claimable_balances).

However, the queries required to do so may be too expensive. As I see it we could either:

  1. clear out any orphan entries (by figuring out which ids are not matched in any other tables)
  2. before clearing out the other tables, figure out which ids are only used the in the ledger range to clear out (and would end up being orphans after the fact)

I think both (1) and (2) may be too expensive for large enough history_* tables

2opremio avatar Jul 12 '22 18:07 2opremio

maybe @sydneynotthecity may have a better querying suggestion?

2opremio avatar Jul 12 '22 18:07 2opremio

Do you know how expensive of a query cost is too expensive?

Another option is that we could create a trigger that tracked any record deletions in the history_* tables and logged them to an events logging table. We could set it up in a way where the deleted id was logged to the events table. Then, we could delete the orphaned entries based on a join to this events table, which should be significantly cheaper than joining on the original tables.

Once the records were cleared out, we could either update an indicator in the event tables for is_deleted = TRUE or just wipe the records from the events table so that the table only retains orphaned records to be deleted.

sydneynotthecity avatar Jul 14 '22 15:07 sydneynotthecity

@2opremio the other point is the history_transaction_claimable_balances and history_operation_claimable_balances were missing indices for claimable_balance_id which is what was causing joins/searches on claimable_balance_id to be super slow. @sreuland added the indices as a test in #4455 and it looks like it has greatly improved query performance.

It appears that the equivalent liquidity_pools tables are also missing indices on liquidity_pool_id only. My guess is that if we added those as well, we should see enough query performance improvement to be able to identify orphaned records by query, like you originally proposed.

sydneynotthecity avatar Jul 14 '22 15:07 sydneynotthecity

let's wait for the indices to be added and retake this after that.

2opremio avatar Jul 26 '22 13:07 2opremio

https://github.com/stellar/go/pull/4518 clears orphans from history_claimable_balances and history_liquidity_pools. The last two remaining tables (history_accounts and history_assets) require indexes in other tables like horizon_operation_participants. @ire-and-curses agreed that new indexes should be add in one of the future releases after further tests. EDIT: I was able to actually use existing indexes to reap history_accounts in https://github.com/stellar/go/pull/4525. The remaining one is history_assets which require an index (even multicolumn index would work) in tables where it's used.

bartekn avatar Aug 09 '22 09:08 bartekn