CCIP-1496 Moving pruning to a separate loop and respecting paging there

Open mateusz-sekara opened this issue 1 year ago • 3 comments

Motivation

The current implementation of pruning expired logs has two major drawbacks:

  • It removes all expired logs in a single database call, which can put high pressure on the database (especially I/O) when a large number of logs (> 100k) are removed at once.
  • Because there is only one LogPoller routine per chain, the regular PollAndSave cannot run while a long prune/delete is in progress. That can impact the product's availability.

Solution

The solution is to remove data in batches instead of all at once. Paging not only limits locking but also gives us finer control over the operation, allowing the database to handle other transactions more efficiently between batches. To further isolate pruning from LogPoller's main routine, we've also moved it to a separate routine. In addition, deletes now report the number of records removed (this is also tracked with a Prometheus metric). This should improve our visibility into how fast logs are created and deleted; based on this data, we should be able to adjust the page size and prune threshold if necessary.

To keep the change backward compatible, prune paging is enabled only when LogPrunePageSize is defined. Once merged, it should therefore not impact any existing products until LogPrunePageSize is set.

mateusz-sekara, Feb 16 '24 15:02

I see that you haven't updated any README files. Would it make sense to do so?

github-actions[bot], Feb 16 '24 15:02

Since this adds a new TOML config parameter, we should also mention it in docs/CHANGELOG.md.

reductionista, Feb 20 '24 23:02