
Make sure the janitor does not spam the metastore too much

Open fulmicoton opened this issue 1 year ago • 3 comments

In https://github.com/quickwit-oss/quickwit/pull/5346 we have spotted that our implementation of delete index was too aggressive.

At airmail, an internal job that deleted a large number of indexes ended up hammering the metastore, which disrupted indexing.

We want to make sure the janitor does not exhibit a similar pattern, in particular when running the retention policy.

fulmicoton avatar Aug 27 '24 14:08 fulmicoton

(@trinity-1686a maybe there is no problem... If so, please just comment here and close the ticket)

fulmicoton avatar Aug 27 '24 14:08 fulmicoton

There is definitely a problem here. Last I checked, the retention policy is executed on a strict cron-like schedule. If many indexes share the same schedule frequency, they all run at once (technically, one after the other in quick succession, as fast as possible). Right now, based on airmail logs, it seems we run roughly 20k retention policies all at once.
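
One possible way to spread the load, sketched under assumptions: give each index a stable offset inside a spread window, derived from a hash of its ID, and run its retention policy that long after the scheduled time. The function name `retention_jitter` and the `spread_window` parameter are hypothetical and not part of the Quickwit codebase.

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};
use std::time::Duration;

/// Hypothetical helper (not a Quickwit API): derives a delay in
/// `[0, spread_window)` from the index ID, with one-second granularity.
/// The same index ID always maps to the same delay within a given build,
/// so each index keeps a regular cadence while different indexes are
/// spread across the window instead of firing together.
fn retention_jitter(index_id: &str, spread_window: Duration) -> Duration {
    let window_secs = spread_window.as_secs();
    if window_secs == 0 {
        // No window configured: keep the original schedule untouched.
        return Duration::ZERO;
    }
    let mut hasher = DefaultHasher::new();
    index_id.hash(&mut hasher);
    Duration::from_secs(hasher.finish() % window_secs)
}
```

With a 10-minute window, `retention_jitter("my-index", Duration::from_secs(600))` returns a delay below 600 seconds. A hash-based offset was picked over a random one here so that the delay does not change from one run to the next.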

trinity-1686a avatar Aug 27 '24 16:08 trinity-1686a

We also seem to execute all GC calls at once, but scoped by index, which causes many consecutive calls, and much more often (every 10 minutes or so). That is something that can also be improved.
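
A rough sketch of one direction, assuming the metastore could serve a GC request covering several indexes at once (this is an assumption, not a statement about the current API): group index IDs into batches and issue one call per batch rather than one per index. `gc_batches` and `batch_size` are hypothetical names.

```rust
/// Hypothetical helper (not a Quickwit API): splits the index IDs into
/// batches of at most `batch_size` entries, preserving order. Each batch
/// would then be handled by a single metastore request instead of one
/// request per index. A `batch_size` of 0 is treated as 1.
fn gc_batches(index_ids: &[String], batch_size: usize) -> Vec<Vec<String>> {
    index_ids
        .chunks(batch_size.max(1))
        .map(|chunk| chunk.to_vec())
        .collect()
}
```

For example, 20,000 index IDs with a `batch_size` of 1,000 would produce 20 batches, so 20 requests per GC pass. A pause between batches could be added on top to pace the calls further.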

trinity-1686a avatar Sep 03 '24 13:09 trinity-1686a