chainlink icon indicating copy to clipboard operation
chainlink copied to clipboard

CCIP-1496 Delete expired logs by the block_timestamp

Open mateusz-sekara opened this issue 1 year ago • 2 comments

Motivation

Delete expired logs should use block_timestamp instead of created_at. created_at is generated when inserting a block into the database. These values can be completely different for different nodes because of the

  • different clocks
  • inserting logs to DB at different times due to outage, replay etc In this case, block_timestamp is the real value on "when" the block was produced and should be the same for all nodes.

This is a very similar case to what was fixed some time ago for the read layer https://github.com/smartcontractkit/chainlink/pull/10743

Additionally, we can get rid of the created_at index, because after this change filtering by created_at is not used at all. One index less should save some time during insert/delete operations on evm.logs table. The only scenario that might be a problem here is running replay from block older than the defined retention. Prune will start removing logs that are just replayed, this won't be a problem when we keep it filtering by created_at

No big difference between deleting by block_timestamp and created_at. Removing around ~90k logs

-- created_at
 Delete on logs l  (cost=0.56..201494.41 rows=92352 width=12) (actual time=6069.856..6069.857 rows=0 loops=1)
   ->  Nested Loop  (cost=0.56..201494.41 rows=92352 width=12) (actual time=0.067..2268.151 rows=1309224 loops=1)
         ->  Seq Scan on log_poller_filters  (cost=0.00..54.99 rows=277 width=60) (actual time=0.013..0.620 rows=271 loops=1)
               Filter: (evm_chain_id = '11155111'::numeric)
               Rows Removed by Filter: 1021
         ->  Index Scan using idx_evm_logs_ordered_by_block_and_created_at on logs l  (cost=0.56..725.26 rows=196 width=60) (actual time=1.103..7.819 rows=4831 loops=271)
               Index Cond: ((evm_chain_id = '11155111'::numeric) AND (address = log_poller_filters.address) AND (event_sig = log_poller_filters.event) AND (created_at <= (statement_timestamp() - '6415:33:20'::interval)))
 Planning Time: 0.422 ms
 Execution Time: 6069.896 ms
(9 rows)
-- block_timestamp

 Delete on logs l  (cost=0.56..193585.58 rows=92282 width=12) (actual time=5568.214..5568.215 rows=0 loops=1)
   ->  Nested Loop  (cost=0.56..193585.58 rows=92282 width=12) (actual time=0.048..1760.326 rows=1309284 loops=1)
         ->  Seq Scan on log_poller_filters  (cost=0.00..54.99 rows=277 width=60) (actual time=0.013..0.480 rows=271 loops=1)
               Filter: (evm_chain_id = '11155111'::numeric)
               Rows Removed by Filter: 1021
         ->  Index Scan using evm_logs_by_timestamp on logs l  (cost=0.56..696.71 rows=196 width=60) (actual time=0.011..5.941 rows=4831 loops=271)
               Index Cond: ((evm_chain_id = '11155111'::numeric) AND (address = log_poller_filters.address) AND (event_sig = log_poller_filters.event) AND (block_timestamp <= (statement_timestamp() - '6415:33:20'::interval)))
 Planning Time: 0.416 ms
 Execution Time: 5568.253 ms
(9 rows)

mateusz-sekara avatar Feb 15 '24 10:02 mateusz-sekara

I see that you haven't updated any README files. Would it make sense to do so?

github-actions[bot] avatar Feb 15 '24 10:02 github-actions[bot]