add jitter to retention schedule
Description
Add optional jitter to the retention policy schedule, so that numerous indexes with the same schedule don't hammer the metastore when they all execute their retention policy at the same time.

Fixes #5353
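As context, a minimal sketch of what adding jitter to the scheduled delay could look like; this is not the code in this PR, and `jittered_sleep_duration` / `jitter_secs` are hypothetical names for the knob discussed here:

```rust
use std::time::Duration;

use rand::Rng;

// Illustrative only: spread the next retention evaluation of each index by a
// random offset so that indexes sharing the same schedule don't all hit the
// metastore at the same instant. `jitter_secs` stands in for the optional knob.
fn jittered_sleep_duration(until_next_evaluation: Duration, jitter_secs: u64) -> Duration {
    let jitter = Duration::from_secs(rand::thread_rng().gen_range(0..=jitter_secs));
    until_next_evaluation + jitter
}
```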
How was this PR tested?
Unit tests.
On SSD:
Average search latency is 1.01x that of the reference (lower is better).
Ref run id: 3063, ref commit: 60368df89088297496ed351469dd3baa6f7cbd66
Link
On GCS:
Average search latency is 1.08x that of the reference (lower is better).
Ref run id: 3064, ref commit: 60368df89088297496ed351469dd3baa6f7cbd66
Link
I wonder if we could avoid exposing users to one more knob by jittering by default. We would have to pick a value between [next evaluation..next evaluation+1], maybe biased towards the beginning of the interval. WDYT? Would that be confusing for users?
I think it would be fine for policies that run often enough (once a day or more), but it could get confusing for less frequent policies, where the jitter adds a longer delay, especially when they are entered in cron format. For instance, with `0 0 * * 0` (every Sunday at midnight), I wouldn't be too surprised if the task started a few minutes or maybe an hour late, but I would be surprised if it started doing something on Tuesday (or any other day). We could also run into issues where, if retention runs daily (at 00:00 UTC, say) and for some reason it's company policy to restart servers at 1am, only 1/24 of the indexes would actually be cleaned each day: the other 23/24 are scheduled after 1am, then we restart, compute the next run time, which is midnight plus some delay, and sleep for the rest of the day.
We could do something where `jitter_secs` is `min(1h, time_between_2_schedules)`, which sounds generally less surprising, and maybe (or maybe not) allow manual configuration to something higher than 1h?
Yes, min(1h, time_between_2_schedules) is perfect. I meant something along those lines with "biased".
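A sketch of how that cap could be computed, purely illustrative and not the PR's actual implementation (`pick_jitter` and `MAX_JITTER` are made-up names):

```rust
use std::time::Duration;

use rand::Rng;

/// Upper bound on the jitter, per the discussion above.
const MAX_JITTER: Duration = Duration::from_secs(3600);

// The jitter never exceeds one hour and never exceeds the interval between
// two scheduled runs, so a weekly policy can't slip by a day and a daily one
// stays roughly daily.
fn pick_jitter(time_between_two_schedules: Duration) -> Duration {
    let cap = MAX_JITTER.min(time_between_two_schedules);
    // Avoid an empty range when the schedule interval is extremely short.
    let cap_secs = cap.as_secs().max(1);
    Duration::from_secs(rand::thread_rng().gen_range(0..cap_secs))
}
```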