quickwit search nodes unresponsive with expensive aggregation

Open PSeitz opened this issue 1 year ago • 1 comment

As reported:

If I run an aggregation over a large time span, the searchers end up being killed by Kubernetes for missing health checks. I assume this is because CPU usage is too high and they become mostly unresponsive.

Aug 08 '24 09:08 PSeitz

It seems there are two issues:

1. Judging from the logs, the aggregation request is sent 50 times.
2. The search thread pool takes all CPUs. This may not leave enough resources to answer health checks.

With https://github.com/quickwit-oss/quickwit/pull/5304, the search thread pool takes all threads except one.
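To make the sizing rule concrete, here is a minimal sketch of "all threads except one" using only the Rust standard library. It is not the code from the PR: `search_pool_size` and `default_search_pool_size` are hypothetical names chosen for illustration, and the floor of one thread on single-core machines is an assumption.

```rust
use std::num::NonZeroUsize;
use std::thread;

/// Hypothetical helper (not Quickwit's API): number of threads to give the
/// search pool when `available` threads exist. One thread is left free for
/// other work such as answering health checks. The result is never zero,
/// so a single-core machine still gets one search thread (an assumption).
fn search_pool_size(available: usize) -> usize {
    available.saturating_sub(1).max(1)
}

/// Hypothetical helper: applies the rule to the parallelism reported by the
/// standard library, falling back to 1 if it cannot be determined.
fn default_search_pool_size() -> usize {
    let available = thread::available_parallelism()
        .map(NonZeroUsize::get)
        .unwrap_or(1);
    search_pool_size(available)
}
```

With this rule, an 8-thread node would run 7 search threads. Note that it only addresses the second issue above; a request that is sent 50 times still multiplies the work.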

Aug 14 '24 10:08 PSeitz
