OpenSearch
Improve string terms aggregation performance using Collector#setWeight
Description
Use `Collector#setWeight` to short-circuit certain aggregation paths, namely the cases where `weight#count` does not return -1:
- when `weight#count > 0 && weight#count == maxdocs` in a segment, the bucket counts can be read directly from the `termsEnum`.
Cases accounted for (for which the optimization will not work):
- Field data not indexed.
- Doc count explicitly provided in documents.
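The condition described above can be sketched in plain Java. This is a minimal illustration, not the code in this PR: `Segment`, `SketchWeight`, `canShortCircuit`, and `termsAgg` are hypothetical stand-ins for a Lucene segment, `weight#count`, and the aggregator, and the per-term map plays the role of the doc frequencies exposed by `termsEnum`. When the count equals the segment's max doc count, every document in the segment matches, so summing per-term frequencies should give the same buckets as visiting each document.

```java
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.function.Predicate;

public class TermsShortCircuitSketch {

    /** Hypothetical stand-in for one segment: one field value per doc plus per-term doc frequencies. */
    static final class Segment {
        final String[] docValues;
        // Plays the role of the terms dictionary (term -> number of docs containing it).
        final TreeMap<String, Integer> termDocFreq = new TreeMap<>();

        Segment(String... docValues) {
            this.docValues = docValues;
            for (String value : docValues) {
                termDocFreq.merge(value, 1, Integer::sum);
            }
        }

        int maxDoc() {
            return docValues.length;
        }
    }

    /** Hypothetical stand-in for a query weight; count returns -1 when it cannot be computed cheaply. */
    interface SketchWeight {
        int count(Segment segment);

        boolean matches(Segment segment, int doc);
    }

    /** Matches every document, so the count is known up front. */
    static SketchWeight matchAll() {
        return new SketchWeight() {
            public int count(Segment s) { return s.maxDoc(); }
            public boolean matches(Segment s, int doc) { return true; }
        };
    }

    /** Matches by predicate; the count is unknown without visiting documents. */
    static SketchWeight filter(Predicate<String> predicate) {
        return new SketchWeight() {
            public int count(Segment s) { return -1; }
            public boolean matches(Segment s, int doc) { return predicate.test(s.docValues[doc]); }
        };
    }

    /** The condition from the description: count > 0 and count == max docs in the segment. */
    static boolean canShortCircuit(SketchWeight weight, Segment segment) {
        int count = weight.count(segment);
        return count > 0 && count == segment.maxDoc();
    }

    static Map<String, Long> termsAgg(SketchWeight weight, List<Segment> segments) {
        Map<String, Long> buckets = new TreeMap<>();
        for (Segment segment : segments) {
            if (canShortCircuit(weight, segment)) {
                // Fast path: add each term's doc frequency without visiting documents.
                for (Map.Entry<String, Integer> e : segment.termDocFreq.entrySet()) {
                    buckets.merge(e.getKey(), e.getValue().longValue(), Long::sum);
                }
            } else {
                // Slow path: visit each document and count the matching ones.
                for (int doc = 0; doc < segment.maxDoc(); doc++) {
                    if (weight.matches(segment, doc)) {
                        buckets.merge(segment.docValues[doc], 1L, Long::sum);
                    }
                }
            }
        }
        return buckets;
    }

    public static void main(String[] args) {
        List<Segment> segments = List.of(new Segment("US", "IN", "US"), new Segment("IN", "DE"));
        System.out.println(termsAgg(matchAll(), segments));                       // {DE=1, IN=2, US=2}
        System.out.println(termsAgg(filter(v -> !v.equals("DE")), segments));     // {IN=2, US=2}
    }
}
```

In this sketch, a match-all weight takes the fast path on every non-empty segment, while a filtered weight reports -1 and falls back to per-document collection.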
Related Issues
Resolves #10954
Check List
- [x] New functionality includes testing.
- [x] All tests pass
- [x] New functionality has been documented.
- [x] New functionality has javadoc added
- [x] Failing checks are inspected and point to the corresponding known issue(s) (See: Troubleshooting Failing Builds)
- [x] Commits are signed per the DCO using --signoff
- [x] Commit changes are listed out in CHANGELOG.md file (See: Changelog)
- [x] Public documentation issue/PR created
By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license. For more information on following Developer Certificate of Origin and signing off your commits, please check here.
:x: Gradle check result for 006f404224dbfaea88b5d3eda534a6331fc371d0: FAILURE
Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change?
:x: Gradle check result for 6667d187c941d3cd52687feeeb9bc2b34ceb59ae: FAILURE
Compatibility status:
Checks if related components are compatible with change cdc4204
Incompatible components
Skipped components
Compatible components
Compatible components: [https://github.com/opensearch-project/custom-codecs.git, https://github.com/opensearch-project/asynchronous-search.git, https://github.com/opensearch-project/performance-analyzer-rca.git, https://github.com/opensearch-project/flow-framework.git, https://github.com/opensearch-project/job-scheduler.git, https://github.com/opensearch-project/cross-cluster-replication.git, https://github.com/opensearch-project/reporting.git, https://github.com/opensearch-project/security.git, https://github.com/opensearch-project/opensearch-oci-object-storage.git, https://github.com/opensearch-project/geospatial.git, https://github.com/opensearch-project/k-nn.git, https://github.com/opensearch-project/neural-search.git, https://github.com/opensearch-project/common-utils.git, https://github.com/opensearch-project/security-analytics.git, https://github.com/opensearch-project/anomaly-detection.git, https://github.com/opensearch-project/performance-analyzer.git, https://github.com/opensearch-project/notifications.git, https://github.com/opensearch-project/index-management.git, https://github.com/opensearch-project/ml-commons.git, https://github.com/opensearch-project/observability.git, https://github.com/opensearch-project/alerting.git, https://github.com/opensearch-project/sql.git]
:x: Gradle check result for ce1082c03b400d2ef891280d2806fc953b91e0ab: FAILURE
:x: Gradle check result for a5b3baa7cf69292b00a081cdebf6514fc2c53495: FAILURE
:x: Gradle check result for e4e0b3ccc9469fd2556875c999b6ca4fbeab816b: FAILURE
:x: Gradle check result for 851b7599d4735e910cc9cc37d9ab4eb8176b29a2: FAILURE
:x: Gradle check result for e005a9c911bde4ab92b38b6749d8ad43a87feb33: FAILURE
To test the performance improvement, I edited one of the sub-aggregation search bodies into a simple terms aggregation.
In OSB, my search body/workload looks like this:
{
  "name": "country_agg_uncached",
  "operation-type": "search",
  "body": {
    "size": 0,
    "aggs": {
      "country_population": {
        "terms": {
          "field": "country_code.raw"
        }
      }
    }
  }
}
The changes were run on a 2.11 cluster:
Without Changes:

| Metric | Task | Value | Unit |
|---|---|---|---|
| Segment count | | 318 | |
| Min Throughput | country_agg_uncached | 2.99 | ops/s |
| Mean Throughput | country_agg_uncached | 2.99 | ops/s |
| Median Throughput | country_agg_uncached | 2.99 | ops/s |
| Max Throughput | country_agg_uncached | 2.99 | ops/s |
| 50th percentile latency | country_agg_uncached | 126.903 | ms |
| 90th percentile latency | country_agg_uncached | 161.641 | ms |
| 99th percentile latency | country_agg_uncached | 301.439 | ms |
| 100th percentile latency | country_agg_uncached | 317.183 | ms |
| 50th percentile service time | country_agg_uncached | 124.271 | ms |
| 90th percentile service time | country_agg_uncached | 159.256 | ms |
| 99th percentile service time | country_agg_uncached | 298.885 | ms |
| 100th percentile service time | country_agg_uncached | 314.573 | ms |
| error rate | country_agg_uncached | 0 | % |

| Metric | Task | Value | Unit |
|---|---|---|---|
| Segment count | | 318 | |
| Min Throughput | country_agg_uncached | 3.01 | ops/s |
| Mean Throughput | country_agg_uncached | 3.01 | ops/s |
| Median Throughput | country_agg_uncached | 3.01 | ops/s |
| Max Throughput | country_agg_uncached | 3.01 | ops/s |
| 50th percentile latency | country_agg_uncached | 124.281 | ms |
| 90th percentile latency | country_agg_uncached | 133.398 | ms |
| 99th percentile latency | country_agg_uncached | 146.104 | ms |
| 100th percentile latency | country_agg_uncached | 147.676 | ms |
| 50th percentile service time | country_agg_uncached | 121.773 | ms |
| 90th percentile service time | country_agg_uncached | 131.374 | ms |
| 99th percentile service time | country_agg_uncached | 145.346 | ms |
| 100th percentile service time | country_agg_uncached | 146.158 | ms |
| error rate | country_agg_uncached | 0 | % |

| Metric | Task | Value | Unit |
|---|---|---|---|
| Segment count | | 318 | |
| Min Throughput | country_agg_uncached | 3 | ops/s |
| Mean Throughput | country_agg_uncached | 3.01 | ops/s |
| Median Throughput | country_agg_uncached | 3.01 | ops/s |
| Max Throughput | country_agg_uncached | 3.01 | ops/s |
| 50th percentile latency | country_agg_uncached | 121.595 | ms |
| 90th percentile latency | country_agg_uncached | 129.72 | ms |
| 99th percentile latency | country_agg_uncached | 139.045 | ms |
| 100th percentile latency | country_agg_uncached | 143.113 | ms |
| 50th percentile service time | country_agg_uncached | 119.435 | ms |
| 90th percentile service time | country_agg_uncached | 127.876 | ms |
| 99th percentile service time | country_agg_uncached | 136.874 | ms |
| 100th percentile service time | country_agg_uncached | 140.488 | ms |
| error rate | country_agg_uncached | 0 | % |
With Current Changes:

| Metric | Task | Value | Unit |
|---|---|---|---|
| Segment count | | 318 | |
| Min Throughput | country_agg_uncached | 3 | ops/s |
| Mean Throughput | country_agg_uncached | 3 | ops/s |
| Median Throughput | country_agg_uncached | 3 | ops/s |
| Max Throughput | country_agg_uncached | 3 | ops/s |
| 50th percentile latency | country_agg_uncached | 22.5772 | ms |
| 90th percentile latency | country_agg_uncached | 26.6315 | ms |
| 99th percentile latency | country_agg_uncached | 37.5379 | ms |
| 100th percentile latency | country_agg_uncached | 41.2208 | ms |
| 50th percentile service time | country_agg_uncached | 19.9387 | ms |
| 90th percentile service time | country_agg_uncached | 23.2274 | ms |
| 99th percentile service time | country_agg_uncached | 34.8001 | ms |
| 100th percentile service time | country_agg_uncached | 35.6524 | ms |
| error rate | country_agg_uncached | 0 | % |

| Metric | Task | Value | Unit |
|---|---|---|---|
| Segment count | | 318 | |
| Min Throughput | country_agg_uncached | 3.01 | ops/s |
| Mean Throughput | country_agg_uncached | 3.01 | ops/s |
| Median Throughput | country_agg_uncached | 3.01 | ops/s |
| Max Throughput | country_agg_uncached | 3.01 | ops/s |
| 50th percentile latency | country_agg_uncached | 21.9949 | ms |
| 90th percentile latency | country_agg_uncached | 26.996 | ms |
| 99th percentile latency | country_agg_uncached | 32.5468 | ms |
| 100th percentile latency | country_agg_uncached | 42.8395 | ms |
| 50th percentile service time | country_agg_uncached | 19.5599 | ms |
| 90th percentile service time | country_agg_uncached | 24.0329 | ms |
| 99th percentile service time | country_agg_uncached | 29.9984 | ms |
| 100th percentile service time | country_agg_uncached | 39.9631 | ms |
| error rate | country_agg_uncached | 0 | % |

| Metric | Task | Value | Unit |
|---|---|---|---|
| Segment count | | 318 | |
| Min Throughput | country_agg_uncached | 3.01 | ops/s |
| Mean Throughput | country_agg_uncached | 3.01 | ops/s |
| Median Throughput | country_agg_uncached | 3.01 | ops/s |
| Max Throughput | country_agg_uncached | 3.01 | ops/s |
| 50th percentile latency | country_agg_uncached | 19.9977 | ms |
| 90th percentile latency | country_agg_uncached | 25.1257 | ms |
| 99th percentile latency | country_agg_uncached | 35.0793 | ms |
| 100th percentile latency | country_agg_uncached | 44.3048 | ms |
| 50th percentile service time | country_agg_uncached | 17.6903 | ms |
| 90th percentile service time | country_agg_uncached | 23.3045 | ms |
| 99th percentile service time | country_agg_uncached | 28.8288 | ms |
| 100th percentile service time | country_agg_uncached | 43.4339 | ms |
| error rate | country_agg_uncached | 0 | % |

| Metric | Task | Value | Unit |
|---|---|---|---|
| Segment count | | 318 | |
| Min Throughput | country_agg_uncached | 3.01 | ops/s |
| Mean Throughput | country_agg_uncached | 3.01 | ops/s |
| Median Throughput | country_agg_uncached | 3.01 | ops/s |
| Max Throughput | country_agg_uncached | 3.01 | ops/s |
| 50th percentile latency | country_agg_uncached | 21.8416 | ms |
| 90th percentile latency | country_agg_uncached | 27.9763 | ms |
| 99th percentile latency | country_agg_uncached | 34.1669 | ms |
| 100th percentile latency | country_agg_uncached | 48.2031 | ms |
| 50th percentile service time | country_agg_uncached | 19.2949 | ms |
| 90th percentile service time | country_agg_uncached | 25.0944 | ms |
| 99th percentile service time | country_agg_uncached | 30.8222 | ms |
| 100th percentile service time | country_agg_uncached | 42.6661 | ms |
| error rate | country_agg_uncached | 0 | % |
The runs show roughly a 4x (p100) to 6x (p90) latency improvement.
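As a rough cross-check of that claim, the sketch below averages the p90 and p100 latencies reported in the tables above (three runs without the change, four with it) and prints the ratio. Averaging across runs is just one way to summarize the numbers and is an assumption of this sketch, not necessarily how the figures above were derived; the class and method names are hypothetical.

```java
public class SpeedupSketch {

    // Latencies in ms, copied from the benchmark tables above.
    static final double[] P90_WITHOUT = {161.641, 133.398, 129.72};
    static final double[] P90_WITH = {26.6315, 26.996, 25.1257, 27.9763};
    static final double[] P100_WITHOUT = {317.183, 147.676, 143.113};
    static final double[] P100_WITH = {41.2208, 42.8395, 44.3048, 48.2031};

    /** Arithmetic mean of the given values. */
    static double mean(double[] values) {
        double sum = 0.0;
        for (double v : values) {
            sum += v;
        }
        return sum / values.length;
    }

    /** Ratio of mean latency before the change to mean latency after it. */
    static double speedup(double[] before, double[] after) {
        return mean(before) / mean(after);
    }

    public static void main(String[] args) {
        System.out.printf("p90 latency speedup:  %.1fx%n", speedup(P90_WITHOUT, P90_WITH));
        System.out.printf("p100 latency speedup: %.1fx%n", speedup(P100_WITHOUT, P100_WITH));
    }
}
```

With these inputs the averaged ratios come out at roughly 5.3x for p90 and 4.6x for p100. The p100 figure is sensitive to the 317 ms outlier in the first baseline run.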
@msfroh Next, I'm going to see whether I can trim more corners in the implementation, refactor further, and cover other relevant cases, but please feel free to take an initial look and provide comments.
Also, I will open a separate issue against the OSB workloads to add vanilla terms aggregations, since there is currently no terms aggregation workload like the one I tested.
:x: Gradle check result for 6d057167ea0692a9d19b706bdfc20952a0f8d930: FAILURE
:x: Gradle check result for 2a9cce72b4709fbc1815dac5bf06ebb733fd4dd8: FAILURE
:x: Gradle check result for 82d1532e8b1a903974dba99179a59e2f9f0e103a: FAILURE
:x: Gradle check result for 58716d246cba929ed4f6a9ed8bca5c09611b0e43: FAILURE
:x: Gradle check result for 8922b88703f744a736cc0841f1f02d92b6281697: FAILURE
:x: Gradle check result for b5551188d56130cb793c99c8d20803e668ff6046: FAILURE
:x: Gradle check result for 234a031d5f0d7f96dffdf0d29e14b02aa6864dae: FAILURE
:x: Gradle check result for 6f411605582852f1f93c3c9cfe49ade946f2ec23: FAILURE
:x: Gradle check result for 1cab5c31e61b0b6666d2cf7fa96573f1b06f9e80: FAILURE
:x: Gradle check result for 74f9ac9c9d89f72d843a7b02c579fb96a8464289: FAILURE
:x: Gradle check result for 2ee1c0f42e024186fd6f35694c8d61eaab9c3fcd: FAILURE
:x: Gradle check result for 91ccaef6dfb6c128be19d5551721d7133692c6f4: FAILURE
:x: Gradle check result for 6fef82e25f34429399ff9c2ba0d61049481b0d81: FAILURE
:x: Gradle check result for 478f8a964766dedbcd1aca2d15da6a7b1b9ac202: FAILURE
:x: Gradle check result for 14b34e129c51b768e0fdbfb656865d87f18a652c: FAILURE
:x: Gradle check result for 27d8c149ee3906074fd3a3d0e6303c4f6597d9cd: FAILURE
:white_check_mark: Gradle check result for 2cbca9d64d55f0a6ce833046c9416182ea257063: SUCCESS
Codecov Report
Attention: Patch coverage is 82.97872% with 8 lines in your changes missing coverage. Please review.
Project coverage is 71.50%. Comparing base (b15cb0c) to head (cdc4204). Report is 14 commits behind head on main.
| Files | Patch % | Lines |
|---|---|---|
| ...ket/terms/GlobalOrdinalsStringTermsAggregator.java | 82.22% | 3 Missing and 5 partials :warning: |
Additional details and impacted files
@@ Coverage Diff @@
## main #11643 +/- ##
============================================
+ Coverage 71.42% 71.50% +0.08%
- Complexity 59978 60018 +40
============================================
Files 4985 4985
Lines 282275 282349 +74
Branches 40946 40960 +14
============================================
+ Hits 201603 201907 +304
+ Misses 63999 63719 -280
- Partials 16673 16723 +50
:x: Gradle check result for 2e2dab0fb579b2630c7b0277572217cf9d442501: FAILURE