quickwit

add jitter to retention schedule

Open • trinity-1686a opened this issue • 4 comments

Description

Add optional jitter to the retention policy schedule, so that numerous indexes sharing the same schedule don't hammer the metastore by all executing their retention policy at the same moment. Fix #5353.

How was this PR tested?

unit test

trinity-1686a, Aug 28 '24 16:08
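A minimal sketch of the idea, in Rust since Quickwit is written in Rust: each index's next retention evaluation is pushed back by an offset in [0, max_jitter), so indexes sharing a schedule no longer reach the metastore at the same instant. The names jitter_for_index and jittered_next_evaluation are hypothetical, and the offset is derived from a hash of the index ID only to keep the example std-only; how the PR actually draws its jitter is not shown in this thread.

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};
use std::time::Duration;

/// Hypothetical helper: derive an offset in [0, max_jitter) from the index ID.
/// DefaultHasher::new() is the same for every call within a build, so the
/// offset for a given index is stable within that build.
fn jitter_for_index(index_id: &str, max_jitter: Duration) -> Duration {
    let max_ms = max_jitter.as_millis() as u64;
    if max_ms == 0 {
        // No jitter configured: keep the original schedule.
        return Duration::ZERO;
    }
    let mut hasher = DefaultHasher::new();
    index_id.hash(&mut hasher);
    Duration::from_millis(hasher.finish() % max_ms)
}

/// Hypothetical: push the next scheduled evaluation (time since the Unix epoch)
/// back by the index's jitter.
fn jittered_next_evaluation(next_evaluation: Duration, index_id: &str, max_jitter: Duration) -> Duration {
    next_evaluation + jitter_for_index(index_id, max_jitter)
}

fn main() {
    // A midnight UTC timestamp shared by every index's retention schedule.
    let next_evaluation = Duration::from_secs(1_724_889_600);
    let max_jitter = Duration::from_secs(600);
    for index_id in ["logs-a", "logs-b", "logs-c"] {
        let at = jittered_next_evaluation(next_evaluation, index_id, max_jitter);
        println!("{index_id}: runs {}s after the scheduled time", (at - next_evaluation).as_secs());
    }
}
```

Each index keeps its own offset, so the load that used to arrive in a single burst at midnight is spread over the jitter window.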

On SSD:

Average search latency is 1.01x that of the reference (lower is better).
Ref run id: 3063, ref commit: 60368df89088297496ed351469dd3baa6f7cbd66

On GCS:

Average search latency is 1.08x that of the reference (lower is better).
Ref run id: 3064, ref commit: 60368df89088297496ed351469dd3baa6f7cbd66

github-actions[bot], Aug 28 '24 18:08

I wonder if we could avoid exposing users to one more knob by jittering by default. We would pick a time between the next evaluation and the one after it, maybe biased towards the beginning of that interval. WDYT? Would that be confusing for users?

guilload, Sep 11 '24 21:09
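One way to read "biased towards the beginning of the interval" is to square a uniform sample before scaling it to the interval, which makes small offsets more likely. This is a hypothetical sketch: biased_offset is an illustrative name, squaring is just one possible bias, and the uniform sample u is assumed to come from whatever random source the scheduler uses.

```rust
use std::time::Duration;

/// Hypothetical: map a uniform sample `u` in [0, 1) to an offset in [0, interval),
/// biased toward the start of the interval. Since offset = u^2 * interval,
/// P(offset < interval / 4) = P(u < 1/2) = 1/2.
fn biased_offset(interval: Duration, u: f64) -> Duration {
    assert!((0.0..1.0).contains(&u), "u must be in [0, 1)");
    interval.mul_f64(u * u)
}

fn main() {
    let daily = Duration::from_secs(24 * 3600);
    for u in [0.1, 0.5, 0.9] {
        println!("u = {u}: offset {}s", biased_offset(daily, u).as_secs());
    }
}
```

With a daily schedule, u = 0.5 gives an offset of 21600 s (6 h), so half of all runs land in the first quarter of the day. Offsets near the end of the interval are still possible, which is what the concern about long schedules in the following comment is about.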

I think it would be fine for policies that run often enough (once a day or more often), but it could get confusing for policies with longer intervals between runs, especially when entered in cron format. For instance, if I put 0 0 * * 0 (every Sunday at midnight), I wouldn't be too surprised if the task started a few minutes or maybe an hour late, but I would be surprised if it started doing something on Tuesday (or any other day).

We could also run into issues with a daily schedule (0:00 UTC assumed) if, for some reason, company policy is to restart servers at 1am: only 1/24 of the indexes would actually be cleaned each day. The other 23/24 are scheduled after 1am; then we restart, look at when the next run should be, find it's midnight plus some delay, and sleep for the rest of the day.

We could do something where jitter_secs is min(1h, time_between_2_schedules), which sounds generally less surprising, and maybe (or maybe not) allow manually configuring it to something higher than 1h?

trinity-1686a, Sep 12 '24 14:09
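The daily-restart scenario can be checked with a small hour-granularity simulation. This is a hypothetical model, not Quickwit's scheduler: the schedule fires at 00:00, each index has a whole-hour delay in [0, 24), and a restart at restart_hour recomputes the next run as the following midnight plus the delay. The function name runs_with_daily_restart is illustrative.

```rust
/// Hypothetical model: count how many times an index with a fixed delay (in hours)
/// runs over `days` days when the server restarts every day at `restart_hour`.
fn runs_with_daily_restart(delay_hours: u64, restart_hour: u64, days: u64) -> u64 {
    let mut runs = 0;
    // At startup (hour 0), the next run is today's midnight plus the delay.
    let mut next_run = delay_hours;
    for hour in 0..days * 24 {
        if hour % 24 == restart_hour {
            // After a restart, the scheduler only sees the next cron occurrence
            // (tomorrow's midnight), so a run planned for later today is lost.
            // The restart is handled first, so a run due exactly at the restart hour is lost too.
            next_run = (hour / 24 + 1) * 24 + delay_hours;
        }
        if hour == next_run {
            runs += 1;
            next_run = (hour / 24 + 1) * 24 + delay_hours;
        }
    }
    runs
}

fn main() {
    // 24 indexes, one per hourly delay, restarting every day at 01:00, over a week.
    let cleaned = (0..24)
        .filter(|&delay| runs_with_daily_restart(delay, 1, 7) > 0)
        .count();
    println!("{cleaned} of 24 indexes ever ran their retention policy");
}
```

Under these assumptions only the index with a zero delay ever runs, which matches the 1/24 figure in the comment above.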

Yes, min(1h, time_between_2_schedules) is perfect. I meant something along those lines with "biased".

guilload, Sep 13 '24 18:09
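The agreed cap, min(1h, time_between_2_schedules), could look like the following sketch. DEFAULT_MAX_JITTER, max_jitter, and the optional user_override (the "maybe allow manual configuration above 1h" idea) are hypothetical names. The sketch also assumes an override is still capped at the schedule interval, which the thread does not settle.

```rust
use std::time::Duration;

/// Default upper bound on jitter, per the discussion: one hour.
const DEFAULT_MAX_JITTER: Duration = Duration::from_secs(3600);

/// Hypothetical: maximum jitter for a schedule whose consecutive evaluations are
/// `time_between_2_schedules` apart. Drawing jitter from [0, cap) with the cap at
/// most that interval keeps a delayed run ahead of the next scheduled evaluation.
fn max_jitter(time_between_2_schedules: Duration, user_override: Option<Duration>) -> Duration {
    let cap = user_override.unwrap_or(DEFAULT_MAX_JITTER);
    cap.min(time_between_2_schedules)
}

fn main() {
    let minute = Duration::from_secs(60);
    let hour = 60 * minute;
    let day = 24 * hour;
    // Every 15 minutes: the interval is the binding cap.
    println!("{:?}", max_jitter(15 * minute, None));
    // Daily: the 1h default applies.
    println!("{:?}", max_jitter(day, None));
    // Weekly (e.g. 0 0 * * 0) with a 3h override.
    println!("{:?}", max_jitter(7 * day, Some(3 * hour)));
}
```

For the weekly example from the thread, the default keeps a Sunday-midnight run within the first hour of Sunday rather than letting it drift into the following days.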