quickwit Dedicated Root Search Nodes

Is your feature request related to a problem? Please describe

We have a fairly large Quickwit deployment with a high search volume, both in the amount of data and in the number and frequency of search requests issued. As a result, our search pool has a lot of disk cache (8TB/node), which can be much more expensive to scale than CPU or RAM. The root search node, the node that receives the initial request and dispatches it, is also a search node itself. This mixed responsibility makes it difficult to allocate resources and scale appropriately. We've seen high CPU usage on nodes that we believe are not doing much in terms of executing search queries.

Having dedicated root nodes could also set the stage for searching across multiple independent clusters.

Describe the solution you'd like

A way to designate a search node as root-only. These nodes would be responsible only for receiving, parsing, and delegating the initial search query request, and for aggregating the results from leaf nodes. They should not need a disk-based split cache, as the work they do wouldn't involve pulling results from splits.
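To make the proposed split of responsibilities concrete, here is a minimal sketch of a root-only node that fans a query out to leaves and merges their partial results. It is an illustration only, not Quickwit's implementation: `root_search`, `make_leaf`, the `Hit` tuple shape, and the sample query are all hypothetical names and data invented for this example.

```python
import heapq
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, List, Tuple

Hit = Tuple[float, str]  # (score, doc_id); hypothetical hit shape
LeafSearch = Callable[[str, int], List[Hit]]  # hypothetical leaf interface


def root_search(query: str, leaves: List[LeafSearch], k: int) -> List[Hit]:
    """Fan the query out to every leaf, then merge the partial top-k lists."""
    with ThreadPoolExecutor(max_workers=max(1, len(leaves))) as pool:
        partials = list(pool.map(lambda leaf: leaf(query, k), leaves))
    # The root only merges already-scored hits; it never opens a split,
    # which is why it would not need a disk-based split cache.
    return heapq.nlargest(k, (hit for part in partials for hit in part))


def make_leaf(hits: List[Hit]) -> LeafSearch:
    """Stand-in for a leaf searcher that owns some splits."""
    def search(query: str, k: int) -> List[Hit]:
        return heapq.nlargest(k, hits)
    return search


leaf_a = make_leaf([(0.9, "a1"), (0.2, "a2"), (0.5, "a3")])
leaf_b = make_leaf([(0.7, "b1"), (0.8, "b2")])
top = root_search("level:error", [leaf_a, leaf_b], k=3)
print(top)
```

In this sketch the root's cost is driven by the number of leaves and the size of `k`, while the leaves carry all split access.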

Describe alternatives you've considered

- Simply adding more search nodes.
- Making a second deployment in the same cluster, without a disk cache, with a separate Kubernetes service in front of it. All search requests are sent to this secondary service exclusively. This gets us close, but search requests are still delegated to all nodes, including the ones in the secondary deployment. Search requests are generally slower there because there is no disk cache.

Sep 24 '24 16:09 esatterwhite

+1 This seems to be a scaling bottleneck for Quickwit.

Out of curiosity, how are you currently sizing your Searcher nodes?

Total size in TB of your biggest QW index:
Memory cache params:
- split_footer_cache_capacity
- fast_field_cache_capacity
- partial_request_cache_capacity

Jun 10 '25 19:06 robert-radiant

> This seems to be a scaling bottleneck for Quickwit.

How is that a scaling bottleneck?

Jun 11 '25 08:06 fulmicoton

The search will take as long as the slowest node.

So, you either:

- Size all the Searchers so they can also perform the Root Searcher role optimally, which is wasteful.
- Do not give the Searchers enough resources to also perform the Root Searcher role, which hurts performance.
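As a rough illustration of the "slowest node" point, here is a toy model, not a measurement from any Quickwit cluster. The function name `fanout_latency_ms` and all latency figures are made up for the example.

```python
from typing import Sequence


def fanout_latency_ms(leaf_latencies_ms: Sequence[float]) -> float:
    """A fan-out query completes only when its slowest leaf has answered."""
    return max(leaf_latencies_ms)


# Hypothetical per-leaf latencies in milliseconds.
balanced = fanout_latency_ms([40.0, 45.0, 50.0, 48.0])
# Same cluster, but one searcher is also busy with root work and answers late.
one_overloaded = fanout_latency_ms([40.0, 45.0, 50.0, 120.0])
print(balanced, one_overloaded)
```

Under this simplified model, a single searcher slowed down by root duties sets the latency of the whole query.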

Jun 11 '25 16:06 robert-radiant

I see the point of being able to grow the cluster with a couple of nodes without disk, but there are a couple of drawbacks as well:

- Will the root search node's resources (CPU/RAM) be used efficiently? In other words, does merging results consume enough resources, consistently enough, to justify dedicated standalone nodes? This is hard to assess currently, as resource consumption cannot be estimated separately for the leaf and root activities.
- We should not leave network resources out of the equation. In the current setup, network usage is reasonably well distributed across all nodes: searcher<->searcher and searcher<->S3. With separate root nodes, the network usage would become root<->leaf and leaf<->S3. I'm not sure it's a problem in itself, but it definitely adds complexity to the sizing decisions.

Jun 12 '25 13:06 rdettai