
Percolate query optimization: Fetch fields mentioned in queries instead of entire doc and batch percolate query by heap-based threshold

Open eirsep opened this issue 2 years ago • 3 comments

Issue #, if available: #1353 #1367

Description of changes:

This PR improves the scalability of the percolate queries performed by document-level monitors.

Status quo behaviour

The current behaviour is to query and collect data from all shards of each index and then perform a percolate query on the collected docs. Hence the minimum number of percolate queries performed in each execution equals the number of concrete indices being queried. This did not scale: the data node executing the doc-level monitor would exceed its heap memory limits when the docs held in memory from all shards of an index grew too large.

New behaviour introduced in the PR

We introduce a setting, plugins.alerting.monitor.percolate_query_docs_size_memory_percentage_limit, that caps the amount of source data held in memory before a percolate query must be performed. With this change the minimum number of percolate queries is 1, irrespective of the number of concrete indices being queried, and the number of percolate queries performed in a given execution becomes more deterministic: it is a function of the data node's heap size rather than one query per concrete index.
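
A minimal sketch of this heap-percentage-based batching, assuming hypothetical names (`DocData`, `PercolateBatcher`, `runPercolateQuery`) rather than the plugin's actual classes:

```kotlin
// Hypothetical holder for a doc's source bytes; not the plugin's real type.
class DocData(val indexName: String, val docId: String, val sourceBytes: ByteArray)

class PercolateBatcher(
    private val memoryPercentageLimit: Int,                  // e.g. 10, i.e. 10% of max heap
    private val runPercolateQuery: (List<DocData>) -> Unit   // stand-in for the actual percolate search
) {
    // Threshold derived from the configured percentage of the node's max heap.
    private val thresholdBytes: Long =
        Runtime.getRuntime().maxMemory() * memoryPercentageLimit / 100

    private val buffer = mutableListOf<DocData>()
    private var bufferedBytes = 0L

    // Accumulate docs from any number of concrete indices and flush once the
    // total buffered source size crosses the heap-based threshold.
    fun add(doc: DocData) {
        buffer.add(doc)
        bufferedBytes += doc.sourceBytes.size
        if (bufferedBytes >= thresholdBytes) flush()
    }

    // Run one percolate query over whatever has accumulated so far.
    fun flush() {
        if (buffer.isEmpty()) return
        runPercolateQuery(buffer.toList())
        buffer.clear()
        bufferedBytes = 0L
    }
}
```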

CheckList:

  • [x] Commits are signed per the DCO using --signoff

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license. For more information on following Developer Certificate of Origin and signing off your commits, please check here.

eirsep avatar Dec 20 '23 08:12 eirsep

Will explore how I can factor in the available free memory at that point in time. We can decide on an acceptable fraction of the heap size, and if the accumulated data crosses that threshold we can submit the percolate query with that many docs. This way we can also cover multiple indices in a single query as long as the accumulated data stays below the threshold, and the decision to execute another percolate query becomes more deterministic.
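
A rough sketch of that idea, checking free heap at decision time; the fraction value and helper names here are illustrative, not actual plugin settings:

```kotlin
// Approximate free heap: max heap minus the portion currently in use.
fun currentFreeHeapBytes(): Long {
    val rt = Runtime.getRuntime()
    val usedBytes = rt.totalMemory() - rt.freeMemory()
    return rt.maxMemory() - usedBytes
}

// Submit the batch once the accumulated source bytes exceed an acceptable
// fraction of the currently free heap.
fun shouldSubmitPercolateQuery(accumulatedSourceBytes: Long, acceptableHeapFraction: Double = 0.1): Boolean =
    accumulatedSourceBytes >= currentFreeHeapBytes() * acceptableHeapFraction
```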

eirsep avatar Dec 22 '23 06:12 eirsep

@sbcd90 Should we also add another condition to batch the percolate query based on the number of docs, limiting each search to at most 100k docs, alongside the memory limit?

That way both conditions are evaluated to decide whether to perform a percolate query on the current doc set.
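
A sketch of evaluating both conditions together; the 100k cap comes from the comment above, and the function and parameter names are placeholders:

```kotlin
// Hypothetical cap on docs per percolate search, as proposed in the comment.
const val MAX_DOCS_PER_PERCOLATE_QUERY = 100_000

// Flush the batch if either the doc-count cap or the memory threshold is hit.
fun shouldFlushBatch(bufferedDocCount: Int, bufferedBytes: Long, memoryThresholdBytes: Long): Boolean =
    bufferedDocCount >= MAX_DOCS_PER_PERCOLATE_QUERY || bufferedBytes >= memoryThresholdBytes
```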

eirsep avatar Jan 10 '24 20:01 eirsep

Closing in favor of smaller PRs

eirsep avatar Feb 19 '24 10:02 eirsep