[BUG] search.max_buckets is not evaluated correctly for terms agg
Describe the bug
The search.max_buckets setting (ref) is used to control the maximum number of aggregation buckets allowed in a single search response.
For terms aggregations, the bucket count is computed by counting sub-aggregation buckets first and then, if their parent bucket is later pruned from the candidate list, subtracting the sub-aggregation bucket count again. As a result the running count can exceed the number of buckets that are actually returned, so the limit is not evaluated against the true bucket count; see the reproduction below for an example.
More broadly speaking, I'm not sure if this search.max_buckets setting is actually useful. I think the setting can have 2 uses:
- Limit the response size of a given search request -- This isn't quite working correctly as shown by this issue
- Stop bad aggregations from taking up too many resources -- Most aggregation types do not enforce this max_buckets setting at the shard level; it's only evaluated during reduce on the coordinator level, which is after a lot of the resource-intensive portions of the search request are already completed.
Somewhat related:
- #12916
Related component
Search:Resiliency
To Reproduce
Expected behavior
The following was done with the noaa opensearch-benchmarks workload but it's not specific to that data.
Set cluster setting:
{
  "persistent": {
    "search.max_buckets": 2
  }
}
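For anyone reproducing this from a script, the body above is sent to the cluster settings endpoint (PUT _cluster/settings). Below is a minimal sketch using only the Python standard library. The base URL (an unsecured local test cluster on port 9200) and the helper names are assumptions for illustration and are not part of the original report.

```python
import json
import urllib.request

# Assumption: an unsecured local test cluster. Change the URL and add
# authentication/TLS handling for a real deployment.
BASE_URL = "http://localhost:9200"


def build_max_buckets_request(limit, base_url=BASE_URL):
    """Build (but do not send) the PUT _cluster/settings request."""
    body = {"persistent": {"search.max_buckets": limit}}
    return urllib.request.Request(
        url=f"{base_url}/_cluster/settings",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="PUT",
    )


def apply_max_buckets(limit, base_url=BASE_URL):
    """Send the request and return the parsed JSON response."""
    with urllib.request.urlopen(build_max_buckets_request(limit, base_url)) as resp:
        return json.load(resp)
```

Calling apply_max_buckets(2) against a test cluster should apply the same persistent setting as the JSON body above.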
This search request does not hit the max buckets limit:
{
  "size": 0,
  "aggs": {
    "station": {
      "terms": {
        "field": "station.id",
        "size": 1,
        "shard_size": 1
      },
      "aggs": {
        "date": {
          "terms": {
            "field": "date",
            "size": 1,
            "shard_size": 1
          }
        }
      }
    }
  }
}
Neither does this one:
{
  "size": 0,
  "aggs": {
    "station": {
      "terms": {
        "field": "station.id",
        "size": 1,
        "shard_size": 1
      },
      "aggs": {
        "date": {
          "terms": {
            "field": "date",
            "size": 1,
            "shard_size": 2
          }
        }
      }
    }
  }
}
However, this one does:
{
  "size": 0,
  "aggs": {
    "station": {
      "terms": {
        "field": "station.id",
        "size": 1,
        "shard_size": 2
      },
      "aggs": {
        "date": {
          "terms": {
            "field": "date",
            "size": 1,
            "shard_size": 1
          }
        }
      }
    }
  }
}
In all 3 of these cases the response size on the coordinator is only 2 buckets.
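To make the accounting easier to follow, here is a small Python model of the behaviour described in this issue. It is a simplified sketch, not the actual OpenSearch code: the class and function names, the station/date keys and the doc counts are all hypothetical, and it assumes a single shard so that shard_size equals the number of candidate buckets reaching the coordinator. The model counts sub-aggregation buckets before it knows whether their parent survives, and only subtracts them once the parent is pruned.

```python
from dataclasses import dataclass, field


class TooManyBuckets(Exception):
    """Raised when the running bucket count exceeds the limit."""


class BucketCounter:
    """Running counter that is checked on every update (hypothetical model)."""

    def __init__(self, limit):
        self.limit = limit
        self.count = 0
        self.peak = 0

    def add(self, n):
        self.count += n
        self.peak = max(self.peak, self.count)
        if self.count > self.limit:
            raise TooManyBuckets(f"{self.count} > {self.limit}")


@dataclass
class Bucket:
    key: str
    doc_count: int
    children: list = field(default_factory=list)  # sub-aggregation candidates


def count_inner(bucket):
    """Number of sub-aggregation buckets nested under `bucket`."""
    return sum(1 + count_inner(child) for child in bucket.children)


def reduce_terms(candidates, sizes, counter):
    """Keep the top sizes[0] candidates; sizes[1:] apply to nested levels."""
    size, child_sizes = sizes[0], sizes[1:]
    kept = []
    for cand in candidates:
        # Sub-aggregation buckets are counted here, before we know
        # whether the parent bucket will survive pruning.
        children = reduce_terms(cand.children, child_sizes, counter) if child_sizes else []
        kept.append(Bucket(cand.key, cand.doc_count, children))
        kept.sort(key=lambda b: b.doc_count, reverse=True)
        if len(kept) > size:
            removed = kept.pop()
            counter.add(-count_inner(removed))  # pruned: subtract its inner buckets
        else:
            counter.add(1)  # parent kept: count it
    return kept


def simulate(candidates, sizes, limit):
    """Return (limit_tripped, peak_running_count)."""
    counter = BucketCounter(limit)
    try:
        reduce_terms(candidates, sizes, counter)
    except TooManyBuckets:
        return True, counter.peak
    return False, counter.peak


def station(key, docs, dates):
    return Bucket(key, docs, [Bucket(k, d) for k, d in dates])


# Hypothetical candidates mirroring the shard_size values of the three requests.
request_1 = [station("A", 10, [("d1", 5)])]
request_2 = [station("A", 10, [("d1", 5), ("d2", 3)])]
request_3 = [station("A", 10, [("d1", 5)]), station("B", 8, [("d2", 4)])]

for name, cands in [("request 1", request_1), ("request 2", request_2), ("request 3", request_3)]:
    print(name, simulate(cands, sizes=[1, 1], limit=2))
```

Under these assumptions the first two requests peak at a running count of 2 and pass, while the third reaches 3 when the date bucket of the second station candidate is counted, before that station is pruned, so the limit trips even though only 2 buckets would be returned.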
[Triage] @jed326 Thanks for creating this issue; we look forward to a pull request that addresses this topic.
Note: it might be worthwhile to create an RFC to remove the setting entirely in v3.0.
search.max_buckets could be treated more as a circuit-breaker construct that stops any bad aggregation query from taking up too many resources, especially memory. I have seen it work in favor of JVM heap utilization on clusters, preventing a rogue query from taking the whole node down.
Coverage and accuracy are definitely issues, as pointed out by @jed326, especially in the case of pruning, and should be addressed first.
Renaming this issue to focus on the bug specific to terms aggregations. Beyond that, I think there is still work we can do to make the search.max_buckets setting more consistent across aggregation types, but that can be a follow-up.