
search nodes unresponsive with expensive aggregation

Open • PSeitz opened this issue 1 year ago • 1 comment

As reported:

If I run an aggregation over a large time span, the searchers end up being killed by Kubernetes for missing health checks. I assume this is because CPU usage is too high and they are mostly unresponsive.

PSeitz, Aug 08 '24 09:08

It seems there are two issues:

  1. Judging from the logs, the aggregation request is sent 50 times.
  2. The search thread pool takes all CPUs. This may not leave enough resources to answer health checks.

https://github.com/quickwit-oss/quickwit/pull/5304 addresses the second issue: the search thread pool now takes all threads except one.
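The "all threads except one" sizing rule can be sketched as below. This is only an illustration of the idea, not the code from the PR: `search_pool_size` and `detected_search_pool_size` are hypothetical names, and the clamp to a minimum of one thread on single-core machines is an assumption.

```rust
use std::num::NonZeroUsize;
use std::thread;

/// Hypothetical helper (not Quickwit's API): given the number of threads
/// available to the process, return how many the search pool should use.
/// One thread is left free for non-search work such as health checks.
/// Assumption: the pool never shrinks below one thread.
fn search_pool_size(available_threads: usize) -> usize {
    available_threads.saturating_sub(1).max(1)
}

/// Hypothetical helper: apply the rule to the parallelism reported by the
/// standard library, falling back to 1 if it cannot be determined.
fn detected_search_pool_size() -> usize {
    let available = thread::available_parallelism()
        .map(NonZeroUsize::get)
        .unwrap_or(1);
    search_pool_size(available)
}
```

On an 8-thread node this gives a search pool of 7 threads. Whether one spare thread is enough to keep health checks responsive under heavy aggregation load is a separate question that this sketch does not answer.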

PSeitz, Aug 14 '24 10:08