Support multi ranges traversal when doing date histogram rewrite optimization
Description
When intersecting and visiting a BKD tree, the value of a point or the min/max of an inner node is ever-increasing. The ranges rewritten from the date histogram are also ever-increasing and connected to each other. So we can do a single traversal that populates the results for multiple ranges, which guarantees that any point or inner node is visited only once. We can also do this in a clever way: only when we meet a new value that is not inside the current range do we iterate to the next range, so we only need to collect into one range at any point in time. Such a new value may be met when visiting a new node or while visiting inside a leaf node.
Note it's hard to cover the traversal logic in tests, as it requires ingesting thousands of documents or more (every leaf node contains 512 docs).
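To make the traversal order concrete, below is a minimal, self-contained Java sketch of single-pass counting over multiple ranges. It assumes the ranges are sorted and connected and that values arrive in non-decreasing order, as they do when intersecting the BKD tree; the class and field names are illustrative only and are not the actual FastFilterRewriteHelper code.

```java
// Illustrative only: a single pass over non-decreasing values with sorted, connected ranges.
// We collect into exactly one "active" range at any time and advance it only when a value
// falls past the current range's upper bound, so every value is inspected once.
final class MultiRangeCounter {
    private final long[] mins;   // inclusive lower bounds, sorted ascending (hypothetical layout)
    private final long[] maxs;   // matching inclusive upper bounds
    private final long[] counts; // per-range document counts
    private int active = 0;      // index of the range currently being collected into

    MultiRangeCounter(long[] mins, long[] maxs) {
        this.mins = mins;
        this.maxs = maxs;
        this.counts = new long[mins.length];
    }

    /** Feed one value from the in-order traversal; returns false once all ranges are exhausted. */
    boolean collect(long value) {
        while (active < maxs.length && value > maxs[active]) {
            active++; // the new value is past the current range, move to the next one
        }
        if (active == maxs.length) {
            return false; // no more ranges, the traversal can terminate early
        }
        if (value >= mins[active]) {
            counts[active]++;
        }
        return true;
    }

    long count(int range) {
        return counts[range];
    }
}
```

An inner node whose min/max both fall inside the active range can be counted in bulk by adding its point count to the active range instead of descending to its points, which appears to be what the countNode snippet quoted later in the review does; that is where most of the savings come from.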
Benchmark Results
The threshold for the filter rewrite is set to 70k, so every operation uses the optimization. JVM heap is 1 GB, on an r6g.xlarge instance (4 vCPU, 32 GB memory), single-node cluster.
For each workload, the unoptimized results come first, followed by the optimized results.
pmc
Unoptimized:
| Metric | Task | Value | Unit |
| --- | --- | --- | --- |
| Min Throughput | articles_monthly_agg_uncached | 19.95 | ops/s |
| Mean Throughput | articles_monthly_agg_uncached | 19.97 | ops/s |
| Median Throughput | articles_monthly_agg_uncached | 19.98 | ops/s |
| Max Throughput | articles_monthly_agg_uncached | 19.98 | ops/s |
| 50th percentile latency | articles_monthly_agg_uncached | 53.4486 | ms |
| 90th percentile latency | articles_monthly_agg_uncached | 75.4133 | ms |
| 99th percentile latency | articles_monthly_agg_uncached | 90.6223 | ms |
| 100th percentile latency | articles_monthly_agg_uncached | 101.146 | ms |
| 50th percentile service time | articles_monthly_agg_uncached | 46.0403 | ms |
| 90th percentile service time | articles_monthly_agg_uncached | 56.9184 | ms |
| 99th percentile service time | articles_monthly_agg_uncached | 65.0497 | ms |
| 100th percentile service time | articles_monthly_agg_uncached | 65.5977 | ms |
| error rate | articles_monthly_agg_uncached | 0 | % |
| Min Throughput | articles_monthly_agg_cached | 19.97 | ops/s |
| Mean Throughput | articles_monthly_agg_cached | 19.97 | ops/s |
| Median Throughput | articles_monthly_agg_cached | 19.97 | ops/s |
| Max Throughput | articles_monthly_agg_cached | 19.98 | ops/s |
| 50th percentile latency | articles_monthly_agg_cached | 5.03077 | ms |
| 90th percentile latency | articles_monthly_agg_cached | 5.44521 | ms |
| 99th percentile latency | articles_monthly_agg_cached | 5.62429 | ms |
| 100th percentile latency | articles_monthly_agg_cached | 5.69051 | ms |
| 50th percentile service time | articles_monthly_agg_cached | 4.36067 | ms |
| 90th percentile service time | articles_monthly_agg_cached | 4.52949 | ms |
| 99th percentile service time | articles_monthly_agg_cached | 4.73908 | ms |
| 100th percentile service time | articles_monthly_agg_cached | 4.81191 | ms |
| error rate | articles_monthly_agg_cached | 0 | % |
Optimized:
| Metric | Task | Value | Unit |
| --- | --- | --- | --- |
| Min Throughput | articles_monthly_agg_uncached | 20.01 | ops/s |
| Mean Throughput | articles_monthly_agg_uncached | 20.02 | ops/s |
| Median Throughput | articles_monthly_agg_uncached | 20.02 | ops/s |
| Max Throughput | articles_monthly_agg_uncached | 20.02 | ops/s |
| 50th percentile latency | articles_monthly_agg_uncached | 6.93175 | ms |
| 90th percentile latency | articles_monthly_agg_uncached | 7.54458 | ms |
| 99th percentile latency | articles_monthly_agg_uncached | 8.14719 | ms |
| 100th percentile latency | articles_monthly_agg_uncached | 8.63804 | ms |
| 50th percentile service time | articles_monthly_agg_uncached | 6.24524 | ms |
| 90th percentile service time | articles_monthly_agg_uncached | 6.65848 | ms |
| 99th percentile service time | articles_monthly_agg_uncached | 7.61742 | ms |
| 100th percentile service time | articles_monthly_agg_uncached | 7.90734 | ms |
| error rate | articles_monthly_agg_uncached | 0 | % |
| Min Throughput | articles_monthly_agg_cached | 20.01 | ops/s |
| Mean Throughput | articles_monthly_agg_cached | 20.01 | ops/s |
| Median Throughput | articles_monthly_agg_cached | 20.01 | ops/s |
| Max Throughput | articles_monthly_agg_cached | 20.02 | ops/s |
| 50th percentile latency | articles_monthly_agg_cached | 4.51127 | ms |
| 90th percentile latency | articles_monthly_agg_cached | 4.98108 | ms |
| 99th percentile latency | articles_monthly_agg_cached | 5.5638 | ms |
| 100th percentile latency | articles_monthly_agg_cached | 7.23687 | ms |
| 50th percentile service time | articles_monthly_agg_cached | 3.78007 | ms |
| 90th percentile service time | articles_monthly_agg_cached | 4.07006 | ms |
| 99th percentile service time | articles_monthly_agg_cached | 4.55179 | ms |
| 100th percentile service time | articles_monthly_agg_cached | 6.08684 | ms |
| error rate | articles_monthly_agg_cached | 0 | % |
big5
Unoptimized:
| Metric | Task | Value | Unit |
| --- | --- | --- | --- |
| Min Throughput | composite-date_histogram-daily | 2.01 | ops/s |
| Mean Throughput | composite-date_histogram-daily | 2.01 | ops/s |
| Median Throughput | composite-date_histogram-daily | 2.01 | ops/s |
| Max Throughput | composite-date_histogram-daily | 2.01 | ops/s |
| 50th percentile latency | composite-date_histogram-daily | 11.7901 | ms |
| 90th percentile latency | composite-date_histogram-daily | 12.2404 | ms |
| 99th percentile latency | composite-date_histogram-daily | 12.9297 | ms |
| 100th percentile latency | composite-date_histogram-daily | 13.0813 | ms |
| 50th percentile service time | composite-date_histogram-daily | 10.5722 | ms |
| 90th percentile service time | composite-date_histogram-daily | 10.7189 | ms |
| 99th percentile service time | composite-date_histogram-daily | 11.1253 | ms |
| 100th percentile service time | composite-date_histogram-daily | 11.6602 | ms |
| error rate | composite-date_histogram-daily | 0 | % |
| Min Throughput | date_histogram_hourly_agg | 2 | ops/s |
| Mean Throughput | date_histogram_hourly_agg | 2.01 | ops/s |
| Median Throughput | date_histogram_hourly_agg | 2.01 | ops/s |
| Max Throughput | date_histogram_hourly_agg | 2.01 | ops/s |
| 50th percentile latency | date_histogram_hourly_agg | 157.577 | ms |
| 90th percentile latency | date_histogram_hourly_agg | 159.375 | ms |
| 99th percentile latency | date_histogram_hourly_agg | 163.655 | ms |
| 100th percentile latency | date_histogram_hourly_agg | 178.386 | ms |
| 50th percentile service time | date_histogram_hourly_agg | 156.441 | ms |
| 90th percentile service time | date_histogram_hourly_agg | 158.257 | ms |
| 99th percentile service time | date_histogram_hourly_agg | 162.534 | ms |
| 100th percentile service time | date_histogram_hourly_agg | 177.27 | ms |
| error rate | date_histogram_hourly_agg | 0 | % |
| Min Throughput | date_histogram_minute_agg | 1.27 | ops/s |
| Mean Throughput | date_histogram_minute_agg | 1.27 | ops/s |
| Median Throughput | date_histogram_minute_agg | 1.27 | ops/s |
| Max Throughput | date_histogram_minute_agg | 1.27 | ops/s |
| 50th percentile latency | date_histogram_minute_agg | 72008 | ms |
| 90th percentile latency | date_histogram_minute_agg | 83222.9 | ms |
| 99th percentile latency | date_histogram_minute_agg | 85749.4 | ms |
| 100th percentile latency | date_histogram_minute_agg | 86022.3 | ms |
| 50th percentile service time | date_histogram_minute_agg | 783.015 | ms |
| 90th percentile service time | date_histogram_minute_agg | 786.134 | ms |
| 99th percentile service time | date_histogram_minute_agg | 790.022 | ms |
| 100th percentile service time | date_histogram_minute_agg | 790.282 | ms |
| error rate | date_histogram_minute_agg | 0 | % |
Optimized:
| Metric | Task | Value | Unit |
| --- | --- | --- | --- |
| Min Throughput | composite-date_histogram-daily | 2 | ops/s |
| Mean Throughput | composite-date_histogram-daily | 2.01 | ops/s |
| Median Throughput | composite-date_histogram-daily | 2.01 | ops/s |
| Max Throughput | composite-date_histogram-daily | 2.01 | ops/s |
| 50th percentile latency | composite-date_histogram-daily | 10.2503 | ms |
| 90th percentile latency | composite-date_histogram-daily | 10.7774 | ms |
| 99th percentile latency | composite-date_histogram-daily | 11.0552 | ms |
| 100th percentile latency | composite-date_histogram-daily | 11.1781 | ms |
| 50th percentile service time | composite-date_histogram-daily | 8.9753 | ms |
| 90th percentile service time | composite-date_histogram-daily | 9.26641 | ms |
| 99th percentile service time | composite-date_histogram-daily | 10.086 | ms |
| 100th percentile service time | composite-date_histogram-daily | 10.3218 | ms |
| error rate | composite-date_histogram-daily | 0 | % |
| Min Throughput | date_histogram_hourly_agg | 2.01 | ops/s |
| Mean Throughput | date_histogram_hourly_agg | 2.01 | ops/s |
| Median Throughput | date_histogram_hourly_agg | 2.01 | ops/s |
| Max Throughput | date_histogram_hourly_agg | 2.01 | ops/s |
| 50th percentile latency | date_histogram_hourly_agg | 19.289 | ms |
| 90th percentile latency | date_histogram_hourly_agg | 19.9434 | ms |
| 99th percentile latency | date_histogram_hourly_agg | 21.8219 | ms |
| 100th percentile latency | date_histogram_hourly_agg | 29.2806 | ms |
| 50th percentile service time | date_histogram_hourly_agg | 18.2352 | ms |
| 90th percentile service time | date_histogram_hourly_agg | 18.5031 | ms |
| 99th percentile service time | date_histogram_hourly_agg | 20.5678 | ms |
| 100th percentile service time | date_histogram_hourly_agg | 28.1168 | ms |
| error rate | date_histogram_hourly_agg | 0 | % |
| Min Throughput | date_histogram_minute_agg | 2.01 | ops/s |
| Mean Throughput | date_histogram_minute_agg | 2.01 | ops/s |
| Median Throughput | date_histogram_minute_agg | 2.01 | ops/s |
| Max Throughput | date_histogram_minute_agg | 2.01 | ops/s |
| 50th percentile latency | date_histogram_minute_agg | 71.7014 | ms |
| 90th percentile latency | date_histogram_minute_agg | 72.2578 | ms |
| 99th percentile latency | date_histogram_minute_agg | 72.9293 | ms |
| 100th percentile latency | date_histogram_minute_agg | 94.9371 | ms |
| 50th percentile service time | date_histogram_minute_agg | 70.4787 | ms |
| 90th percentile service time | date_histogram_minute_agg | 70.908 | ms |
| 99th percentile service time | date_histogram_minute_agg | 71.8165 | ms |
| 100th percentile service time | date_histogram_minute_agg | 94.2959 | ms |
| error rate | date_histogram_minute_agg | 0 | % |
nyc_taxis
Unoptimized:
| Metric | Task | Value | Unit |
| --- | --- | --- | --- |
| Min Throughput | autohisto_agg | 1.51 | ops/s |
| Mean Throughput | autohisto_agg | 1.51 | ops/s |
| Median Throughput | autohisto_agg | 1.51 | ops/s |
| Max Throughput | autohisto_agg | 1.52 | ops/s |
| 50th percentile latency | autohisto_agg | 25.4275 | ms |
| 90th percentile latency | autohisto_agg | 26.1543 | ms |
| 99th percentile latency | autohisto_agg | 26.9703 | ms |
| 100th percentile latency | autohisto_agg | 27.9551 | ms |
| 50th percentile service time | autohisto_agg | 24.0695 | ms |
| 90th percentile service time | autohisto_agg | 24.6367 | ms |
| 99th percentile service time | autohisto_agg | 25.8091 | ms |
| 100th percentile service time | autohisto_agg | 26.3474 | ms |
| error rate | autohisto_agg | 0 | % |
| Min Throughput | date_histogram_agg | 1.51 | ops/s |
| Mean Throughput | date_histogram_agg | 1.52 | ops/s |
| Median Throughput | date_histogram_agg | 1.51 | ops/s |
| Max Throughput | date_histogram_agg | 1.53 | ops/s |
| 50th percentile latency | date_histogram_agg | 24.7382 | ms |
| 90th percentile latency | date_histogram_agg | 25.1863 | ms |
| 99th percentile latency | date_histogram_agg | 26.5017 | ms |
| 100th percentile latency | date_histogram_agg | 28.9566 | ms |
| 50th percentile service time | date_histogram_agg | 23.325 | ms |
| 90th percentile service time | date_histogram_agg | 23.514 | ms |
| 99th percentile service time | date_histogram_agg | 25.2249 | ms |
| 100th percentile service time | date_histogram_agg | 27.3667 | ms |
| error rate | date_histogram_agg | 0 | % |
Optimized:
| Metric | Task | Value | Unit |
| --- | --- | --- | --- |
| Min Throughput | autohisto_agg | 1.51 | ops/s |
| Mean Throughput | autohisto_agg | 1.52 | ops/s |
| Median Throughput | autohisto_agg | 1.51 | ops/s |
| Max Throughput | autohisto_agg | 1.53 | ops/s |
| 50th percentile latency | autohisto_agg | 13.9074 | ms |
| 90th percentile latency | autohisto_agg | 14.2888 | ms |
| 99th percentile latency | autohisto_agg | 15.2199 | ms |
| 100th percentile latency | autohisto_agg | 15.9146 | ms |
| 50th percentile service time | autohisto_agg | 12.4878 | ms |
| 90th percentile service time | autohisto_agg | 12.7325 | ms |
| 99th percentile service time | autohisto_agg | 13.6324 | ms |
| 100th percentile service time | autohisto_agg | 14.7879 | ms |
| error rate | autohisto_agg | 0 | % |
| Min Throughput | date_histogram_agg | 1.51 | ops/s |
| Mean Throughput | date_histogram_agg | 1.52 | ops/s |
| Median Throughput | date_histogram_agg | 1.51 | ops/s |
| Max Throughput | date_histogram_agg | 1.53 | ops/s |
| 50th percentile latency | date_histogram_agg | 14.2494 | ms |
| 90th percentile latency | date_histogram_agg | 14.6327 | ms |
| 99th percentile latency | date_histogram_agg | 14.8823 | ms |
| 100th percentile latency | date_histogram_agg | 16.1273 | ms |
| 50th percentile service time | date_histogram_agg | 12.9227 | ms |
| 90th percentile service time | date_histogram_agg | 13.1052 | ms |
| 99th percentile service time | date_histogram_agg | 13.4257 | ms |
| 100th percentile service time | date_histogram_agg | 14.2286 | ms |
| error rate | date_histogram_agg | 0 | % |
Unoptimized:
| Metric | Task | Value | Unit |
| --- | --- | --- | --- |
| Min Throughput | hourly_agg | 0.2 | ops/s |
| Mean Throughput | hourly_agg | 0.2 | ops/s |
| Median Throughput | hourly_agg | 0.2 | ops/s |
| Max Throughput | hourly_agg | 0.2 | ops/s |
| 50th percentile latency | hourly_agg | 283.503 | ms |
| 90th percentile latency | hourly_agg | 322.241 | ms |
| 99th percentile latency | hourly_agg | 342.808 | ms |
| 100th percentile latency | hourly_agg | 352.563 | ms |
| 50th percentile service time | hourly_agg | 280.674 | ms |
| 90th percentile service time | hourly_agg | 318.865 | ms |
| 99th percentile service time | hourly_agg | 336.963 | ms |
| 100th percentile service time | hourly_agg | 346.66 | ms |
| error rate | hourly_agg | 0 | % |
Optimized:
| Metric | Task | Value | Unit |
| --- | --- | --- | --- |
| Min Throughput | hourly_agg | 0.2 | ops/s |
| Mean Throughput | hourly_agg | 0.2 | ops/s |
| Median Throughput | hourly_agg | 0.2 | ops/s |
| Max Throughput | hourly_agg | 0.2 | ops/s |
| 50th percentile latency | hourly_agg | 41.3073 | ms |
| 90th percentile latency | hourly_agg | 44.4373 | ms |
| 99th percentile latency | hourly_agg | 48.0578 | ms |
| 100th percentile latency | hourly_agg | 53.0044 | ms |
| 50th percentile service time | hourly_agg | 37.6982 | ms |
| 90th percentile service time | hourly_agg | 41.1426 | ms |
| 99th percentile service time | hourly_agg | 44.6609 | ms |
| 100th percentile service time | hourly_agg | 46.2757 | ms |
| error rate | hourly_agg | 0 | % |
Related Issues
Resolves #13171 #13345
Check List
- [x] New functionality includes testing.
- [x] All tests pass
- [x] New functionality has been documented.
- [x] New functionality has javadoc added
- [x] Failing checks are inspected and point to the corresponding known issue(s) (See: Troubleshooting Failing Builds)
- [x] Commits are signed per the DCO using --signoff
- [x] Commit changes are listed out in CHANGELOG.md file (See: Changelog)
- ~[ ] Public documentation issue/PR created~
By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license. For more information on following Developer Certificate of Origin and signing off your commits, please check here.
:x: Gradle check result for c4d9055f16dc68bd0000ccd1004794838c77b9fc: FAILURE
Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change?
:x: Gradle check result for 02f5c14b8b4f056c1ded235ead55f5b0ebf43447: FAILURE
Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change?
:x: Gradle check result for 76dcb65926e992ac9741ad5a180c6366c03a174c: FAILURE
Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change?
:x: Gradle check result for 76cc19364d8cc1453ea5ee26ff31468706dca2cc: FAILURE
Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change?
:x: Gradle check result for aaa741405be495eb128f0b9d08179dd82fad43a2: FAILURE
Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change?
:x: Gradle check result for aaa741405be495eb128f0b9d08179dd82fad43a2: FAILURE
Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change?
:white_check_mark: Gradle check result for 1ee1070ee80d13d944a2ce35a0218c49b4b3b87b: SUCCESS
Codecov Report
Attention: Patch coverage is 81.12245%, with 37 lines in your changes missing coverage. Please review.
Project coverage is 71.55%. Comparing base (b15cb0c) to head (d739534). Report is 258 commits behind head on main.
Additional details and impacted files
@@ Coverage Diff @@
## main #13317 +/- ##
============================================
+ Coverage 71.42% 71.55% +0.13%
- Complexity 59978 60991 +1013
============================================
Files 4985 5050 +65
Lines 282275 286987 +4712
Branches 40946 41591 +645
============================================
+ Hits 201603 205347 +3744
- Misses 63999 64618 +619
- Partials 16673 17022 +349
:umbrella: View full report in Codecov by Sentry.
:x: Gradle check result for ba7c54939190f69d0b35f52fca7a75da9819698b: FAILURE
Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change?
:x: Gradle check result for 50126bee6f8487ce75b8c5d013549a7c69fcf2ea: FAILURE
Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change?
:x: Gradle check result for 50126bee6f8487ce75b8c5d013549a7c69fcf2ea: FAILURE
Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change?
:grey_exclamation: Gradle check result for 59a80224f19cfc4ef48af0d96afc09f88e532a07: UNSTABLE
- TEST FAILURES:
1 org.opensearch.action.admin.cluster.node.tasks.ResourceAwareTasksTests.testTaskResourceTrackingDuringTaskCancellation
Please review all flaky tests that succeeded after retry and create an issue if one does not already exist to track the flaky failure.
:x: Gradle check result for 43669259366d2a36463e5941581e707d8ee95522: FAILURE
Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change?
:x: Gradle check result for 600562cd796f2743436a91f30cec99fb3403b8ca: FAILURE
Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change?
#12788
:white_check_mark: Gradle check result for 59888db58e061af4345ce1552cb7ebd7d21f8275: SUCCESS
:grey_exclamation: Gradle check result for 176e1d15944d2888a2cad47b6d7b2a06825bd09a: UNSTABLE
- TEST FAILURES:
1 org.opensearch.action.admin.cluster.node.tasks.ResourceAwareTasksTests.testTaskResourceTrackingDuringTaskCancellation
Please review all flaky tests that succeeded after retry and create an issue if one does not already exist to track the flaky failure.
:white_check_mark: Gradle check result for e13cd4b0ec6c9403eb7481a69e2bfda3116838a0: SUCCESS
@bowenlan-amzn could you also post the benchmark results of 2.11, where we didn't have any of these optimizations. Thanks
@bowenlan-amzn could you also post the benchmark results of 2.11, where we didn't have any of these optimizations.
+1. It will be good to look at the benchmark numbers before merging this in. Also, we can increase the default value of buckets for the filter rewrite aggregation optimization to benefit a wider range of use cases.
Also, I notice that the codecov/patch validation is failing due to low coverage. Can we cover a few more use cases to increase the coverage to 75-80%?
If I'm not mistaken, we should not apply this optimization for a FunctionScore query even if its sub query is eligible for the optimization. The functions of a function score query rely on the docs returned by the subquery to apply customized scoring functions to them.
I was not able to comment on the line in the diff itself as it belongs to a previous PR, but here is the line I'm talking about - https://github.com/opensearch-project/OpenSearch/blob/1f406dbe5935227b9aa2877ed4b8932cde1e8821/server/src/main/java/org/opensearch/search/aggregations/bucket/FastFilterRewriteHelper.java#L72
Thinking about it more: since we are just terminating the leaf bucket collector early, it should not have any impact on parts of the query other than this collector. So it should be safe to apply to FunctionScore queries too. However, this optimization may not be of much help in these queries, since a FunctionScore query will iterate over all matching documents anyway.
With my understanding, I highly doubt the current logic would work for the following cases -
- Index sort is applied in reverse order? It doesn't seem like index sort will impact the BKD tree - https://github.com/apache/lucene/blob/b0ebb849f5b5432473f51d16275d4842a30a30b2/lucene/core/src/java/org/apache/lucene/util/ArrayUtil.java#L748 - it just sorts the doc values. Please validate if this is correct.
- ~~RangeCollectorForPointTree only processes one range at a time and in an ordered way: once the iterator is moved to the next range, it only processes the ranges following it. When a full node is a match, we are assuming it will only match one range, which may or may not be true, and treat all points under it as part of the current range set in the iterator.~~
private void countNode(int count) {
    counter += count;
}
I think I overlooked your comparator; it seems correct.
Also, this line doesn't have any code coverage, so I doubt we have checked against such cases. Please add a unit test for it.
~~Here is what I suggest - check against all the ranges which could match instead of just processing the current range set against the iterator in RangeCollectorForPointTree. This would require you to go down the tree, as you have to check the values contained in the node against all ranges which could possibly match, so there is a performance hit. Please re-run the benchmark if we agree upon this approach.~~
@bowenlan-amzn I updated my comment inline about point 2.
@bowenlan-amzn Is there any benefit of applying this optimization if, while processing the first range, a leaf node needs to be visited, i.e. the range is smaller than the number of points in a leaf node, or the min/max of a leaf node covers the full range? Because if that's the case, all subsequent ranges will also visit leaf nodes, and this logic will end up visiting all the points in the tree. Won't it cause a regression compared to the doc values approach, where we process docIDs in an ordered way? This can happen when the ranges are small and the segment is huge.
Maybe we can check the number of leaf nodes? In a balanced tree it would be tree size / 2. And if this number is less than the number of buckets/ranges, then don't apply the optimization?
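For illustration, such a guard could be sketched as below. This is just one reading of the suggestion above, not code from this PR; the class, the worthApplying method, and the leaf-count estimate are hypothetical, and 512 is the default BKD leaf size mentioned in the description.

```java
import org.apache.lucene.index.PointValues;

// Hypothetical guard (not part of this PR): skip the multi-range traversal when there are
// more ranges/buckets than leaf nodes, since then most ranges would force a leaf visit anyway.
final class MultiRangeTraversalGuard {
    private static final int DEFAULT_MAX_POINTS_IN_LEAF_NODE = 512;

    static boolean worthApplying(PointValues points, int rangeCount) {
        // Roughly half the nodes of a balanced BKD tree are leaves; equivalently, the leaf
        // count is about pointCount / 512 with the default leaf size.
        long estimatedLeafCount = Math.max(1, points.size() / DEFAULT_MAX_POINTS_IN_LEAF_NODE);
        return estimatedLeafCount > rangeCount;
    }
}
```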
@Rishabh Maurya commented on May 1, 2024, 10:15 AM PDT:
@bowenlan-amzn Is there any benefit of applying this optimization if, while processing the first range, a leaf node needs to be visited, i.e. the range is smaller than the number of points in a leaf node, or the min/max of a leaf node covers the full range? Because if that's the case, all subsequent ranges will also visit leaf nodes, and this logic will end up visiting all the points in the tree. Won't it cause a regression compared to the doc values approach, where we process docIDs in an ordered way? This can happen when the ranges are small and the segment is huge.
Maybe we can check the number of leaf nodes? In a balanced tree it would be tree size / 2. And if this number is less than the number of buckets/ranges, then don't apply the optimization?
Originally posted by @rishabhmaurya in https://github.com/opensearch-project/OpenSearch/issues/13317#issuecomment-2088789777
Makes sense. I haven't gotten an accurate quantitative comparison between the doc value and multi-range traversal ways of doing the date histogram. It's true that more ranges mean a higher upper bound on the number of leaf nodes to visit. However, for the pmc workload, the monthly agg with 123 ranges becomes better than the default doc value traversal with the multi-range change (let me get some results here later). So it's possible that even if all nodes visited by multi-range are leaf nodes, this is still faster than the default doc value method.
:x: Gradle check result for 24fac6b1ee4a97f94016cfb367fc6b400ab26c19: FAILURE
Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change?
:x: Gradle check result for b95781ab85742c27d996818371d516921965a97f: FAILURE
Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change?
For each workload, the old optimized results come first, then the new optimized results with multi-range traversal; the last set, if it exists, is the optimization-disabled results, which fall back to the default aggregation workflow.
Overall, to summarize, we are able to observe further improvement for date histogram aggregations without regressing the pmc workload (and even improving it) compared to the original?
Also, have you ensured the correctness of the results from these aggregation queries? It might be okay to spot check, since we have a limited number of queries. Longer term, we need a framework for validating that as we make more complex changes in core like this.