Percolate query optimization: fetch only the fields mentioned in queries instead of the entire doc, and batch percolate queries by a heap-based threshold
Issue #, if available: #1353 #1367
Description of changes:
This PR improves the scalability of the percolate queries performed by doc level monitors.
Status quo behaviour
The current behaviour is to query and collect data from all shards of each index and then perform a percolate query on the collected docs. Hence the minimum number of percolate queries performed in each execution equals the number of concrete indices being queried. This did not scale: the data node executing the doc level monitor would exceed heap memory limits when the docs held in memory from all shards of an index exceeded the available memory.
New behaviour introduced in the PR
We introduce a setting plugins.alerting.monitor.percolate_query_docs_size_memory_percentage_limit that sets the maximum size of source data that may be held in memory before a percolate query must be performed.
That way the minimum number of percolate queries is 1, irrespective of the number of concrete indices being queried. The number of percolate queries performed in a given execution also becomes more deterministic, because it is a function of the heap size of the data node rather than of the number of concrete indices.
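The batching idea can be illustrated with a minimal sketch. This is not the plugin's actual code: `PercolateBatcher`, its constructor parameters, and the `percolate` callback are hypothetical names, and it assumes the setting is interpreted as a percentage of the JVM's maximum heap and that doc size is approximated by the UTF-8 length of the source.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

/** Hypothetical sketch: buffers doc sources across indices and flushes on a heap-based byte budget. */
class PercolateBatcher {
    private final long thresholdBytes;
    // Stand-in for "submit one percolate query with these docs" (assumption, not a real API).
    private final Consumer<List<String>> percolate;
    private final List<String> buffer = new ArrayList<>();
    private long bufferedBytes = 0;

    PercolateBatcher(long maxHeapBytes, int percentLimit, Consumer<List<String>> percolate) {
        this.thresholdBytes = maxHeapBytes * percentLimit / 100;
        this.percolate = percolate;
    }

    /** Convenience factory that derives the budget from the current JVM's max heap. */
    static PercolateBatcher forCurrentJvm(int percentLimit, Consumer<List<String>> percolate) {
        return new PercolateBatcher(Runtime.getRuntime().maxMemory(), percentLimit, percolate);
    }

    /** Adds one doc source, regardless of which index it came from; flushes once the budget is reached. */
    void add(String docSource) {
        buffer.add(docSource);
        bufferedBytes += docSource.getBytes(StandardCharsets.UTF_8).length;
        if (bufferedBytes >= thresholdBytes) {
            flush();
        }
    }

    /** Submits the buffered docs as one batch; call once more at the end of the execution. */
    void flush() {
        if (buffer.isEmpty()) {
            return;
        }
        percolate.accept(new ArrayList<>(buffer));
        buffer.clear();
        bufferedBytes = 0;
    }
}
```

With this shape, a monitor execution that never reaches the budget issues a single percolate query in the final `flush()`, no matter how many concrete indices contributed docs.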
CheckList:
- [x] Commits are signed per the DCO using --signoff
By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license. For more information on following Developer Certificate of Origin and signing off your commits, please check here.
I will explore how to factor in the free memory available at that point in time so that we can decide on an acceptable fraction of the heap size. If the accumulated data crosses that threshold, we can submit the percolate query with that many docs. This way we can also batch multiple indices at once, as long as the data stays below the threshold, and decide more deterministically when to execute another percolate query.
@sbcd90 Should we also add another condition that batches the percolate query based on the number of docs, limited to at most 100k per search, along with the memory limit?
That way both conditions are evaluated to decide whether to perform a percolate query on the current doc set.
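A rough sketch of how the two conditions could be combined is shown below. `PercolateFlushPolicy` and its fields are hypothetical names, and it assumes that reaching either limit is enough to trigger a percolate query.

```java
/** Hypothetical sketch: decides whether the buffered docs should be percolated now. */
class PercolateFlushPolicy {
    private final long maxBufferedBytes; // memory limit derived from the heap percentage setting
    private final int maxBufferedDocs;   // doc count cap, e.g. 100_000 as suggested above

    PercolateFlushPolicy(long maxBufferedBytes, int maxBufferedDocs) {
        this.maxBufferedBytes = maxBufferedBytes;
        this.maxBufferedDocs = maxBufferedDocs;
    }

    /** Both conditions are evaluated; reaching either one triggers a percolate query. */
    boolean shouldFlush(long bufferedBytes, int bufferedDocs) {
        return bufferedBytes >= maxBufferedBytes || bufferedDocs >= maxBufferedDocs;
    }
}
```

The doc count cap bounds the size of a single search even when individual docs are small enough that the memory limit would not be reached for a long time.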
Closing in favor of smaller PRs