
feat(pruning): fair pruning

Open emhane opened this issue 9 months ago • 7 comments

Closes https://github.com/paradigmxyz/reth/issues/7343. Related to pruner interruption, ref https://github.com/paradigmxyz/reth/issues/6770.

  • Models the exhaustive list of prunable tables as a ring.
  • Implements a segment iterator that generates a cycle of segments, starting from a given table.
  • Saves the last pruned table between prune jobs. This ensures fair pruning, since the next job picks up where the previous one left off.
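To make the ring idea concrete, here is a minimal Rust sketch. It assumes a hypothetical `PrunableTable` enum whose variants only stand in for the real prunable tables; the iterator yields every table exactly once, starting from the saved table and wrapping around the end of the list.

```rust
/// Hypothetical stand-in for the exhaustive list of prunable tables; the
/// variant names are illustrative and may not match reth's actual segments.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
enum PrunableTable {
    Receipts,
    TransactionLookup,
    SenderRecovery,
    AccountHistory,
    StorageHistory,
}

impl PrunableTable {
    /// All prunable tables, in ring order.
    const ALL: [PrunableTable; 5] = [
        PrunableTable::Receipts,
        PrunableTable::TransactionLookup,
        PrunableTable::SenderRecovery,
        PrunableTable::AccountHistory,
        PrunableTable::StorageHistory,
    ];

    /// Yields every table exactly once, starting at `start` and wrapping
    /// around the end of the ring.
    fn cycle_from(start: PrunableTable) -> impl Iterator<Item = PrunableTable> {
        let offset = Self::ALL.iter().position(|t| *t == start).unwrap_or(0);
        Self::ALL.into_iter().cycle().skip(offset).take(Self::ALL.len())
    }
}

fn main() {
    // Suppose the previous prune job saved AccountHistory as the table to
    // resume from.
    let order: Vec<PrunableTable> =
        PrunableTable::cycle_from(PrunableTable::AccountHistory).collect();
    println!("{order:?}");
}
```

Resuming from `AccountHistory` yields `AccountHistory`, `StorageHistory`, `Receipts`, `TransactionLookup`, `SenderRecovery`, so the tables an interrupted job never reached are pruned first on the next run.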

emhane, May 09 '24 15:05

Still have to write some tests.

emhane, May 09 '24 16:05

It feels a bit overcomplicated. Can we just have a VecDeque of segments that we pop from and push to?

How I see it:

  1. VecDeque of segments initialized when the pruner is initialized
  2. When we prune, we pop segments from the VecDeque one by one until there's none left
  3. When a limit (timeout or number of deleted items) is hit, push the segments that we ran to the back of the VecDeque
  4. On the next run, pop will return segments that we didn't run first

WDYT?
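The VecDeque proposal can be sketched roughly as follows. This assumes a hypothetical one-method `Segment` trait and a toy `Named` segment, and models the timeout / deleted-items limit as a simple `budget` of segments per run. Each segment is pushed back right after it runs, which has the same effect as pushing all the segments that ran to the end once the limit is hit.

```rust
use std::collections::VecDeque;

/// Hypothetical minimal segment interface; reth's real `Segment` trait has a
/// different, richer signature.
trait Segment {
    fn name(&self) -> &'static str;
}

/// Toy segment identified only by its name.
struct Named(&'static str);

impl Segment for Named {
    fn name(&self) -> &'static str {
        self.0
    }
}

/// Pops segments off the front and runs them until `budget` segments have
/// run (a stand-in for the timeout / deleted-items limit) or each segment has
/// run once. Every segment that ran is pushed to the back, so the next call
/// starts with the segments that did not get a turn.
fn prune(queue: &mut VecDeque<Box<dyn Segment>>, budget: usize) -> Vec<&'static str> {
    let mut ran = Vec::new();
    for _ in 0..queue.len().min(budget) {
        let segment = queue.pop_front().expect("loop bound never exceeds queue length");
        ran.push(segment.name()); // "running" the segment is just recording it here
        queue.push_back(segment);
    }
    ran
}

fn main() {
    let mut queue: VecDeque<Box<dyn Segment>> = ["receipts", "tx_lookup", "senders", "history"]
        .into_iter()
        .map(|name| Box::new(Named(name)) as Box<dyn Segment>)
        .collect();
    let first = prune(&mut queue, 2);
    let second = prune(&mut queue, 2);
    println!("first run: {first:?}, second run: {second:?}");
}
```

With four segments and a budget of 2, the first call runs `receipts` and `tx_lookup`, and the second call starts with `senders` and `history`.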

shekhirin, May 10 '24 11:05

> Saves last pruned table between prune jobs

I am not sure we need it. In what case would we want to continue pruning some table inside a segment, rather than pruning the whole segment from the beginning?

shekhirin, May 10 '24 12:05

> It feels a bit overcomplicated, can we just have a VecDeque of segments that we pop/push from/to?

There is no need to reallocate memory; the easiest approach is to save the index of the segment we would have pruned next in the Vec<Box<dyn Segment>>.
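For comparison, a minimal sketch of the index-based alternative. It assumes a hypothetical one-method `Segment` trait, a toy `Named` segment, and a `budget` of segments per run standing in for the real limits; the segments stay in a fixed `Vec<Box<dyn Segment>>` and only a `next` index is carried between runs.

```rust
/// Hypothetical minimal segment interface for illustration only.
trait Segment {
    fn name(&self) -> &'static str;
}

/// Toy segment identified only by its name.
struct Named(&'static str);

impl Segment for Named {
    fn name(&self) -> &'static str {
        self.0
    }
}

/// Hypothetical pruner that keeps its segments in a fixed `Vec` and only
/// remembers the index of the segment it would have pruned next.
struct Pruner {
    segments: Vec<Box<dyn Segment>>,
    next: usize,
}

impl Pruner {
    /// Runs up to `budget` segments (a stand-in for the real limits),
    /// starting at `self.next` and wrapping around; no elements are moved.
    fn prune(&mut self, budget: usize) -> Vec<&'static str> {
        let len = self.segments.len();
        let mut ran = Vec::new();
        for _ in 0..len.min(budget) {
            ran.push(self.segments[self.next].name());
            self.next = (self.next + 1) % len;
        }
        ran
    }
}

fn main() {
    let segments: Vec<Box<dyn Segment>> = ["receipts", "tx_lookup", "senders", "history"]
        .into_iter()
        .map(|name| Box::new(Named(name)) as Box<dyn Segment>)
        .collect();
    let mut pruner = Pruner { segments, next: 0 };
    let first = pruner.prune(2);
    let second = pruner.prune(2);
    println!("first run: {first:?}, second run: {second:?}");
}
```

Nothing is moved or reallocated here; the trade-off is that the wrap-around arithmetic lives in the pruner rather than in the collection.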

True, there is no need to regenerate the segments on each call to prune_segments, except for the static file segments. On second look, I saw that PruneMode::Before is not used in the statically allocated Vec<Box<dyn Segment>>, which is built from PruneModes.

emhane, May 10 '24 17:05

> Saves last pruned table between prune jobs
>
> I am not sure if we need it, what is the case when we want to continue pruning some table inside a segment, and not the whole segment from the beginning?

Checkpoints are saved when pruning stops.
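A rough sketch of what a saved checkpoint buys inside a segment, assuming a hypothetical `Checkpoint` that records the highest pruned block and a `limit` on blocks deleted per run (neither is reth's actual type or API): an interrupted run resumes right after the last pruned block instead of starting the segment over.

```rust
/// Hypothetical per-segment checkpoint recording the highest block whose
/// rows have already been pruned; not reth's actual checkpoint type.
#[derive(Clone, Copy, Debug, Default, PartialEq, Eq)]
struct Checkpoint {
    pruned_up_to: Option<u64>,
}

/// Prunes blocks in ascending order up to and including `target`, deleting
/// at most `limit` blocks, and returns the checkpoint to save when it stops.
fn prune_segment(checkpoint: Checkpoint, target: u64, limit: u64) -> Checkpoint {
    // Resume right after the last pruned block instead of from block 0.
    let start = checkpoint.pruned_up_to.map_or(0, |block| block + 1);
    if start > target || limit == 0 {
        return checkpoint; // nothing left to prune, or no budget
    }
    let end = target.min(start + (limit - 1));
    // Rows for blocks `start..=end` would be deleted here.
    Checkpoint { pruned_up_to: Some(end) }
}

fn main() {
    let mut checkpoint = Checkpoint::default();
    // Three interrupted runs with a limit of 4 blocks each, pruning up to block 9.
    for _ in 0..3 {
        checkpoint = prune_segment(checkpoint, 9, 4);
        println!("{checkpoint:?}");
    }
}
```

Pruning up to block 9 with a limit of 4 blocks per run saves checkpoints at blocks 3, 7 and 9; a fourth run leaves the checkpoint unchanged.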

emhane, May 13 '24 18:05

> No need to reallocate memory, easiest is to just save the index we would have pruned next in the Vec<Box<dyn Segment>>.

Up to you, but I'd prefer a VecDeque for a more intuitive API. Since all segments are boxed, moving them around is not a big overhead. Also, VecDeque doesn't require its elements to be contiguous in memory.

"Since VecDeque is a ring buffer, its elements are not necessarily contiguous in memory." – from https://doc.rust-lang.org/std/collections/struct.VecDeque.html
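Because `VecDeque` is a ring buffer, moving the segments that ran to the back can also be done in place with `VecDeque::rotate_left`, which the standard library documents as using no extra space. A small sketch, with a hypothetical `requeue_ran` helper and string names standing in for boxed segments:

```rust
use std::collections::VecDeque;

/// Hypothetical helper: moves the first `ran` segments to the back of the
/// ring in place, so the segments that did not run come first next time.
/// `rotate_left` panics if its argument exceeds the length, so it is clamped.
fn requeue_ran<T>(segments: &mut VecDeque<T>, ran: usize) {
    let ran = ran.min(segments.len());
    segments.rotate_left(ran);
}

fn main() {
    // Names stand in for Box<dyn Segment>; with boxes only pointers would move.
    let mut segments = VecDeque::from(["receipts", "tx_lookup", "senders", "history"]);
    let capacity = segments.capacity();
    requeue_ran(&mut segments, 2);
    assert_eq!(segments.capacity(), capacity); // rotating does not reallocate
    println!("{segments:?}");
}
```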

shekhirin, May 15 '24 08:05

I don't think it makes a big difference now that this is implemented and tested.

emhane, May 16 '24 21:05

Blocked by the DB background task design, cc @Rjected.

emhane, Jul 03 '24 12:07