
[BUG] Sorting rules not applied to bucket aggregations

Open · Swiddis opened this issue 3 months ago • 4 comments

What is the bug? While messing with aggregation group-by on a dataset with 50,000 records, I noticed that it's not processing all of the buckets for queries like the following:

> source = nginx_logs
   | stats count() by method, bytes
   | stats count();

fetched rows / total rows = 1/1
+---------+
| count() |
|---------|
| 1000    |  <--- Should be at least 10804, according to below
+---------+

> source = nginx_logs
   | dedup method, bytes
   | stats count();

fetched rows / total rows = 1/1
+---------+
| count() |
|---------|
| 10804   |
+---------+

The bucket count limitation is understandable (they're memory-intensive and shipping many thousands of buckets would exhaust bandwidth), but it does cause issues when trying to work with sorted aggregations. In this dataset, the top buckets should all be GET, not DELETE.

> source = nginx_logs
   | stats count() by method, bytes
   | sort -`count()`
   | head;

fetched rows / total rows = 10/10
+---------+--------+-------+
| count() | method | bytes |
|---------+--------+-------|
| 13      | DELETE | 81    |
| 12      | DELETE | 74    |
| 10      | DELETE | 71    |
| 10      | DELETE | 95    |
| 10      | DELETE | 51    |
| 10      | DELETE | 88    |
| 9       | DELETE | 78    |
| 9       | DELETE | 53    |
| 9       | DELETE | 49    |
| 9       | DELETE | 55    |
+---------+--------+-------+

This likewise leads to incorrect top output:

> source = nginx_logs
   | top bytes by method
   | head;
   
fetched rows / total rows = 10/10
+--------+-------+-------+
| method | bytes | count |
|--------+-------+-------|
| DELETE | 81    | 13    |
| DELETE | 74    | 12    |
| DELETE | 51    | 10    |
| DELETE | 71    | 10    |
| DELETE | 88    | 10    |
| DELETE | 95    | 10    |
| DELETE | 44    | 9     |
| DELETE | 49    | 9     |
| DELETE | 52    | 9     |
| DELETE | 53    | 9     |
+--------+-------+-------+

How can one reproduce the bug?

  1. Create a dataset that produces at least 1000 distinct buckets under some grouping (a sketch for generating such a dataset is below)
  2. Observe incorrect results when trying to collect/sort those buckets
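
For step 1, here is a rough sketch of one way to generate such a dataset: bulk-index synthetic nginx-style documents so the method/bytes grouping produces well over 1000 distinct buckets. The index and field names mirror this report; the local endpoint, the method skew, and the byte range are assumptions for illustration.

import json
import random
import requests

# Assumed local, unsecured cluster; adjust the endpoint/auth for your environment.
ENDPOINT = "http://localhost:9200"
METHODS = ["GET", "POST", "PUT", "DELETE"]
WEIGHTS = [70, 15, 10, 5]  # skew toward GET so the true top buckets are GET

def flush(lines):
    # Send one _bulk request with an NDJSON payload.
    resp = requests.post(
        f"{ENDPOINT}/_bulk",
        data="\n".join(lines) + "\n",
        headers={"Content-Type": "application/x-ndjson"},
        timeout=60,
    )
    resp.raise_for_status()

lines = []
for _ in range(50_000):
    lines.append(json.dumps({"index": {"_index": "nginx_logs"}}))
    lines.append(json.dumps({
        "method": random.choices(METHODS, weights=WEIGHTS)[0],
        "bytes": random.randint(1, 2000),
    }))
    if len(lines) >= 10_000:  # 5,000 documents per batch
        flush(lines)
        lines = []
if lines:
    flush(lines)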

What is the expected behavior? It seems like the buckets come out lexicographically sorted today, which implies that internally all the buckets are already being computed and sorted (i.e. returning incorrect results isn't a major cost optimization). I wonder if it'd be possible to support sorting in the bucket aggregation, so the 1000 records returned are actually the correct ones.
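
For reference, the underlying OpenSearch terms aggregation can already return buckets ordered by document count, which is essentially the behavior being asked for here. A minimal sketch against the raw search API (single grouping key for simplicity; the method.keyword mapping and the local endpoint are assumptions):

import requests

body = {
    "size": 0,
    "aggs": {
        "by_method": {
            "terms": {
                "field": "method.keyword",
                "size": 1000,
                "order": {"_count": "desc"},  # top buckets by count, not by key
            }
        }
    },
}
resp = requests.post("http://localhost:9200/nginx_logs/_search", json=body, timeout=30)
resp.raise_for_status()
for bucket in resp.json()["aggregations"]["by_method"]["buckets"]:
    print(bucket["key"], bucket["doc_count"])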

What is your host/environment? Mainline

Do you have any screenshots? N/A

Do you have any additional context? N/A

Swiddis · Sep 11 '25 16:09

IMO, the maximum number of output buckets for the stats command is controlled by the search.max_buckets setting.
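
For anyone who needs a workaround in the meantime, a minimal sketch of raising that limit through the cluster settings API (the endpoint, auth, and the example value 20000 are assumptions; verify that this setting actually governs the stats output in your version):

import requests

# Bump search.max_buckets cluster-wide.
resp = requests.put(
    "http://localhost:9200/_cluster/settings",
    json={"persistent": {"search.max_buckets": 20000}},
    timeout=10,
)
resp.raise_for_status()
print(resp.json())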

penghuo · Sep 11 '25 22:09

This would be resolved by #4304.

LantaoJin · Sep 22 '25 07:09

> source = nginx_logs
   | stats count() by method, bytes
   | sort -`count()`
   | head 10;

fetched rows / total rows = 10/10
+---------+--------+-------+
| count() | method | bytes |
|---------+--------+-------|
| 13      | DELETE | 81    |
| 12      | DELETE | 74    |
| 10      | DELETE | 71    |
| 10      | DELETE | 95    |
| 10      | DELETE | 51    |
| 10      | DELETE | 88    |
| 9       | DELETE | 78    |
| 9       | DELETE | 53    |
| 9       | DELETE | 49    |
| 9       | DELETE | 55    |
+---------+--------+-------+

This issue is resolved by #4603.

This likewise leads to incorrect top output:

> source = nginx_logs
   | top bytes by method
   | head 10;
   
fetched rows / total rows = 10/10
+--------+-------+-------+
| method | bytes | count |
|--------+-------+-------|
| DELETE | 81    | 13    |
| DELETE | 74    | 12    |
| DELETE | 51    | 10    |
| DELETE | 71    | 10    |
| DELETE | 88    | 10    |
| DELETE | 95    | 10    |
| DELETE | 44    | 9     |
| DELETE | 49    | 9     |
| DELETE | 52    | 9     |
| DELETE | 53    | 9     |
+--------+-------+-------+

But | stats count() by method, bytes | sort - count() | head 10; is not equivalent to | top 10 bytes by method;. The sort - count() | head 10 is a global top 10, while top 10 bytes by method returns the top 10 bytes per method. For example, if there are 4 different methods, up to 40 results can be returned.

The results only look the same here because there is a head 10 after top, and the current implementation of top leverages a Window operator over (partitioned by method). Rows with the same method are grouped together, and the DELETE records just happen to sit at the head of map.entrySet().iterator().

LantaoJin · Oct 27 '25 08:10

Created https://github.com/opensearch-project/sql/issues/4671 to fix the top/rare incorrect-result issue.

LantaoJin · Oct 27 '25 09:10