Improve string terms aggregation performance using Collector#setWeight

Open sandeshkr419 opened this issue 1 year ago • 34 comments

Description

Use Collector#setWeight to short-circuit certain aggregation paths, specifically in cases where weight#count does not return -1:

  • when weight#count > 0 and weight#count equals the segment's max doc count (maxDoc), the term counts can be read directly from the termsEnum

Cases accounted for (where the optimization does not apply):

  1. The field is not indexed.
  2. A doc count is explicitly provided in the documents.
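
The decision described above can be sketched with a small, self-contained model. This is not the code from this PR and does not use Lucene; `Segment`, `can_use_term_dictionary`, and `aggregate` are hypothetical names, and `weight_count` merely stands in for the value that weight#count would return for a segment (-1 meaning "unknown").

```python
from collections import Counter
from dataclasses import dataclass


@dataclass
class Segment:
    """Toy stand-in for one index segment (hypothetical, not a Lucene class)."""
    docs: list                         # one term per document; None = field missing
    field_indexed: bool = True         # False models case 1 above
    has_doc_count_field: bool = False  # True models case 2 above

    @property
    def max_doc(self):
        return len(self.docs)

    def term_doc_freqs(self):
        # Models what a terms dictionary stores: term -> number of docs with it.
        return Counter(t for t in self.docs if t is not None)


def can_use_term_dictionary(segment, weight_count):
    """Return True when per-term counts can be read without visiting documents."""
    if weight_count == -1:             # count unknown: no shortcut
        return False
    if not segment.field_indexed or segment.has_doc_count_field:
        return False                   # the two excluded cases
    return weight_count > 0 and weight_count == segment.max_doc


def aggregate(segment, matching_doc_ids, weight_count):
    """Return (term counts, path taken) for one segment."""
    if can_use_term_dictionary(segment, weight_count):
        return segment.term_doc_freqs(), "fast"
    counts = Counter()
    for doc_id in matching_doc_ids:    # regular per-document collection
        term = segment.docs[doc_id]
        if term is not None:
            counts[term] += 1
    return counts, "slow"


if __name__ == "__main__":
    seg = Segment(docs=["US", "IN", "US", "DE"])
    print(aggregate(seg, range(4), weight_count=4))   # every doc matches: fast path
    print(aggregate(seg, [0, 1], weight_count=2))     # partial match: slow path
```

In this model both paths return identical counts whenever every document in the segment matches, which is why the shortcut is safe there. An explicit per-document doc count is excluded because a document would then contribute more than one to a bucket, which a per-term document frequency cannot express.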

Related Issues

Resolves #10954

Check List

  • [x] New functionality includes testing.
    • [x] All tests pass
  • [x] New functionality has been documented.
    • [x] New functionality has javadoc added
  • [x] Failing checks are inspected and point to the corresponding known issue(s) (See: Troubleshooting Failing Builds)
  • [x] Commits are signed per the DCO using --signoff
  • [x] Commit changes are listed out in CHANGELOG.md file (See: Changelog)
  • [x] Public documentation issue/PR created

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license. For more information on following Developer Certificate of Origin and signing off your commits, please check here.

sandeshkr419 avatar Dec 19 '23 07:12 sandeshkr419

:x: Gradle check result for 006f404224dbfaea88b5d3eda534a6331fc371d0: FAILURE

Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change?

github-actions[bot] avatar Dec 19 '23 07:12 github-actions[bot]

:x: Gradle check result for 6667d187c941d3cd52687feeeb9bc2b34ceb59ae: FAILURE

Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change?

github-actions[bot] avatar Jan 09 '24 19:01 github-actions[bot]

Compatibility status:

Checks if related components are compatible with change cdc4204

Incompatible components

Skipped components

Compatible components

Compatible components: [https://github.com/opensearch-project/custom-codecs.git, https://github.com/opensearch-project/asynchronous-search.git, https://github.com/opensearch-project/performance-analyzer-rca.git, https://github.com/opensearch-project/flow-framework.git, https://github.com/opensearch-project/job-scheduler.git, https://github.com/opensearch-project/cross-cluster-replication.git, https://github.com/opensearch-project/reporting.git, https://github.com/opensearch-project/security.git, https://github.com/opensearch-project/opensearch-oci-object-storage.git, https://github.com/opensearch-project/geospatial.git, https://github.com/opensearch-project/k-nn.git, https://github.com/opensearch-project/neural-search.git, https://github.com/opensearch-project/common-utils.git, https://github.com/opensearch-project/security-analytics.git, https://github.com/opensearch-project/anomaly-detection.git, https://github.com/opensearch-project/performance-analyzer.git, https://github.com/opensearch-project/notifications.git, https://github.com/opensearch-project/index-management.git, https://github.com/opensearch-project/ml-commons.git, https://github.com/opensearch-project/observability.git, https://github.com/opensearch-project/alerting.git, https://github.com/opensearch-project/sql.git]

github-actions[bot] avatar Jan 09 '24 19:01 github-actions[bot]

:x: Gradle check result for ce1082c03b400d2ef891280d2806fc953b91e0ab: FAILURE

Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change?

github-actions[bot] avatar Jan 11 '24 19:01 github-actions[bot]

:x: Gradle check result for a5b3baa7cf69292b00a081cdebf6514fc2c53495: FAILURE

Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change?

github-actions[bot] avatar Jan 24 '24 19:01 github-actions[bot]

:x: Gradle check result for e4e0b3ccc9469fd2556875c999b6ca4fbeab816b: FAILURE

Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change?

github-actions[bot] avatar Jan 24 '24 19:01 github-actions[bot]

:x: Gradle check result for 851b7599d4735e910cc9cc37d9ab4eb8176b29a2: FAILURE

Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change?

github-actions[bot] avatar Jan 24 '24 22:01 github-actions[bot]

:x: Gradle check result for e005a9c911bde4ab92b38b6749d8ad43a87feb33: FAILURE

Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change?

github-actions[bot] avatar Jan 25 '24 00:01 github-actions[bot]

To test the performance improvement, I edited one of the sub-aggregation search bodies into a simple terms aggregation.

In OSB, my search body/workload looks like this:

{
  "name": "country_agg_uncached",
  "operation-type": "search",
  "body": {
    "size": 0,
    "aggs": {
      "country_population": {
        "terms": {
          "field": "country_code.raw"
        }
      }
    }
  }
}

The benchmarks were run on a 2.11 cluster:

Without Changes:

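|                                                         Metric |                 Task |       Value |   Unit |
|---------------------------------------------------------------:|---------------------:|------------:|-------:|
|                                                  Segment count |                      |         318 |        |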
|                                                 Min Throughput | country_agg_uncached |        2.99 |  ops/s |
|                                                Mean Throughput | country_agg_uncached |        2.99 |  ops/s |
|                                              Median Throughput | country_agg_uncached |        2.99 |  ops/s |
|                                                 Max Throughput | country_agg_uncached |        2.99 |  ops/s |
|                                        50th percentile latency | country_agg_uncached |     126.903 |     ms |
|                                        90th percentile latency | country_agg_uncached |     161.641 |     ms |
|                                        99th percentile latency | country_agg_uncached |     301.439 |     ms |
|                                       100th percentile latency | country_agg_uncached |     317.183 |     ms |
|                                   50th percentile service time | country_agg_uncached |     124.271 |     ms |
|                                   90th percentile service time | country_agg_uncached |     159.256 |     ms |
|                                   99th percentile service time | country_agg_uncached |     298.885 |     ms |
|                                  100th percentile service time | country_agg_uncached |     314.573 |     ms |
|                                                     error rate | country_agg_uncached |           0 |      % |

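|                                                         Metric |                 Task |       Value |   Unit |
|---------------------------------------------------------------:|---------------------:|------------:|-------:|
|                                                  Segment count |                      |         318 |        |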
|                                                 Min Throughput | country_agg_uncached |        3.01 |  ops/s |
|                                                Mean Throughput | country_agg_uncached |        3.01 |  ops/s |
|                                              Median Throughput | country_agg_uncached |        3.01 |  ops/s |
|                                                 Max Throughput | country_agg_uncached |        3.01 |  ops/s |
|                                        50th percentile latency | country_agg_uncached |     124.281 |     ms |
|                                        90th percentile latency | country_agg_uncached |     133.398 |     ms |
|                                        99th percentile latency | country_agg_uncached |     146.104 |     ms |
|                                       100th percentile latency | country_agg_uncached |     147.676 |     ms |
|                                   50th percentile service time | country_agg_uncached |     121.773 |     ms |
|                                   90th percentile service time | country_agg_uncached |     131.374 |     ms |
|                                   99th percentile service time | country_agg_uncached |     145.346 |     ms |
|                                  100th percentile service time | country_agg_uncached |     146.158 |     ms |
|                                                     error rate | country_agg_uncached |           0 |      % |

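|                                                         Metric |                 Task |       Value |   Unit |
|---------------------------------------------------------------:|---------------------:|------------:|-------:|
|                                                  Segment count |                      |         318 |        |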
|                                                 Min Throughput | country_agg_uncached |           3 |  ops/s |
|                                                Mean Throughput | country_agg_uncached |        3.01 |  ops/s |
|                                              Median Throughput | country_agg_uncached |        3.01 |  ops/s |
|                                                 Max Throughput | country_agg_uncached |        3.01 |  ops/s |
|                                        50th percentile latency | country_agg_uncached |     121.595 |     ms |
|                                        90th percentile latency | country_agg_uncached |      129.72 |     ms |
|                                        99th percentile latency | country_agg_uncached |     139.045 |     ms |
|                                       100th percentile latency | country_agg_uncached |     143.113 |     ms |
|                                   50th percentile service time | country_agg_uncached |     119.435 |     ms |
|                                   90th percentile service time | country_agg_uncached |     127.876 |     ms |
|                                   99th percentile service time | country_agg_uncached |     136.874 |     ms |
|                                  100th percentile service time | country_agg_uncached |     140.488 |     ms |
|                                                     error rate | country_agg_uncached |           0 |      % |

With Current Changes:

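|                                                         Metric |                 Task |       Value |   Unit |
|---------------------------------------------------------------:|---------------------:|------------:|-------:|
|                                                  Segment count |                      |         318 |        |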
|                                                 Min Throughput | country_agg_uncached |           3 |  ops/s |
|                                                Mean Throughput | country_agg_uncached |           3 |  ops/s |
|                                              Median Throughput | country_agg_uncached |           3 |  ops/s |
|                                                 Max Throughput | country_agg_uncached |           3 |  ops/s |
|                                        50th percentile latency | country_agg_uncached |     22.5772 |     ms |
|                                        90th percentile latency | country_agg_uncached |     26.6315 |     ms |
|                                        99th percentile latency | country_agg_uncached |     37.5379 |     ms |
|                                       100th percentile latency | country_agg_uncached |     41.2208 |     ms |
|                                   50th percentile service time | country_agg_uncached |     19.9387 |     ms |
|                                   90th percentile service time | country_agg_uncached |     23.2274 |     ms |
|                                   99th percentile service time | country_agg_uncached |     34.8001 |     ms |
|                                  100th percentile service time | country_agg_uncached |     35.6524 |     ms |
|                                                     error rate | country_agg_uncached |           0 |      % |

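|                                                         Metric |                 Task |       Value |   Unit |
|---------------------------------------------------------------:|---------------------:|------------:|-------:|
|                                                  Segment count |                      |         318 |        |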
|                                                 Min Throughput | country_agg_uncached |        3.01 |  ops/s |
|                                                Mean Throughput | country_agg_uncached |        3.01 |  ops/s |
|                                              Median Throughput | country_agg_uncached |        3.01 |  ops/s |
|                                                 Max Throughput | country_agg_uncached |        3.01 |  ops/s |
|                                        50th percentile latency | country_agg_uncached |     21.9949 |     ms |
|                                        90th percentile latency | country_agg_uncached |      26.996 |     ms |
|                                        99th percentile latency | country_agg_uncached |     32.5468 |     ms |
|                                       100th percentile latency | country_agg_uncached |     42.8395 |     ms |
|                                   50th percentile service time | country_agg_uncached |     19.5599 |     ms |
|                                   90th percentile service time | country_agg_uncached |     24.0329 |     ms |
|                                   99th percentile service time | country_agg_uncached |     29.9984 |     ms |
|                                  100th percentile service time | country_agg_uncached |     39.9631 |     ms |
|                                                     error rate | country_agg_uncached |           0 |      % |

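|                                                         Metric |                 Task |       Value |   Unit |
|---------------------------------------------------------------:|---------------------:|------------:|-------:|
|                                                  Segment count |                      |         318 |        |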
|                                                 Min Throughput | country_agg_uncached |        3.01 |  ops/s |
|                                                Mean Throughput | country_agg_uncached |        3.01 |  ops/s |
|                                              Median Throughput | country_agg_uncached |        3.01 |  ops/s |
|                                                 Max Throughput | country_agg_uncached |        3.01 |  ops/s |
|                                        50th percentile latency | country_agg_uncached |     19.9977 |     ms |
|                                        90th percentile latency | country_agg_uncached |     25.1257 |     ms |
|                                        99th percentile latency | country_agg_uncached |     35.0793 |     ms |
|                                       100th percentile latency | country_agg_uncached |     44.3048 |     ms |
|                                   50th percentile service time | country_agg_uncached |     17.6903 |     ms |
|                                   90th percentile service time | country_agg_uncached |     23.3045 |     ms |
|                                   99th percentile service time | country_agg_uncached |     28.8288 |     ms |
|                                  100th percentile service time | country_agg_uncached |     43.4339 |     ms |
|                                                     error rate | country_agg_uncached |           0 |      % |

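|                                                         Metric |                 Task |       Value |   Unit |
|---------------------------------------------------------------:|---------------------:|------------:|-------:|
|                                                  Segment count |                      |         318 |        |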
|                                                 Min Throughput | country_agg_uncached |        3.01 |  ops/s |
|                                                Mean Throughput | country_agg_uncached |        3.01 |  ops/s |
|                                              Median Throughput | country_agg_uncached |        3.01 |  ops/s |
|                                                 Max Throughput | country_agg_uncached |        3.01 |  ops/s |
|                                        50th percentile latency | country_agg_uncached |     21.8416 |     ms |
|                                        90th percentile latency | country_agg_uncached |     27.9763 |     ms |
|                                        99th percentile latency | country_agg_uncached |     34.1669 |     ms |
|                                       100th percentile latency | country_agg_uncached |     48.2031 |     ms |
|                                   50th percentile service time | country_agg_uncached |     19.2949 |     ms |
|                                   90th percentile service time | country_agg_uncached |     25.0944 |     ms |
|                                   99th percentile service time | country_agg_uncached |     30.8222 |     ms |
|                                  100th percentile service time | country_agg_uncached |     42.6661 |     ms |
|                                                     error rate | country_agg_uncached |           0 |      % |

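The improvement is clear: p50 latency drops from about 122-127 ms to 20-23 ms and p90 latency from about 130-162 ms to 25-28 ms, roughly a 5x to 6x gain. p100 latency improves by roughly 3x to 4x if the single 317 ms outlier run is set aside.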
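
As a rough cross-check of the latency figures in the tables above, the sketch below computes the speedup per percentile from the median of the runs. Taking the median across runs is my own assumption, chosen so that the 317 ms outlier in the first baseline run does not dominate; the values themselves are copied from the tables.

```python
from statistics import median

# Latency values in ms, copied from the OSB result tables above.
BASELINE = {
    "p50": [126.903, 124.281, 121.595],
    "p90": [161.641, 133.398, 129.72],
    "p100": [317.183, 147.676, 143.113],
}
OPTIMIZED = {
    "p50": [22.5772, 21.9949, 19.9977, 21.8416],
    "p90": [26.6315, 26.996, 25.1257, 27.9763],
    "p100": [41.2208, 42.8395, 44.3048, 48.2031],
}


def speedup(percentile):
    """Ratio of median baseline latency to median optimized latency."""
    return median(BASELINE[percentile]) / median(OPTIMIZED[percentile])


if __name__ == "__main__":
    for p in ("p50", "p90", "p100"):
        print(f"{p}: {speedup(p):.1f}x")
```

With these inputs the ratios come out at about 5.7x for p50, 5.0x for p90, and 3.4x for p100.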

@msfroh Next, I'm going to see whether I can trim more corners in the implementation, refactor further, and cover other relevant cases, but please feel free to take an initial look and provide comments.

Also, I will open a separate issue with OSB to add plain terms aggregations to its workloads, since there is currently no terms aggregation workload like the one I tested.

sandeshkr419 avatar Jan 29 '24 19:01 sandeshkr419

:x: Gradle check result for 6d057167ea0692a9d19b706bdfc20952a0f8d930: FAILURE

Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change?

github-actions[bot] avatar Jan 31 '24 04:01 github-actions[bot]

:x: Gradle check result for 2a9cce72b4709fbc1815dac5bf06ebb733fd4dd8: FAILURE

Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change?

github-actions[bot] avatar Feb 03 '24 00:02 github-actions[bot]

:x: Gradle check result for 82d1532e8b1a903974dba99179a59e2f9f0e103a: FAILURE

Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change?

github-actions[bot] avatar Feb 03 '24 00:02 github-actions[bot]

:x: Gradle check result for 58716d246cba929ed4f6a9ed8bca5c09611b0e43: FAILURE

Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change?

github-actions[bot] avatar Feb 03 '24 00:02 github-actions[bot]

:x: Gradle check result for 8922b88703f744a736cc0841f1f02d92b6281697: FAILURE

Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change?

github-actions[bot] avatar Feb 03 '24 00:02 github-actions[bot]

:x: Gradle check result for b5551188d56130cb793c99c8d20803e668ff6046: FAILURE

Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change?

github-actions[bot] avatar Feb 03 '24 01:02 github-actions[bot]

:x: Gradle check result for 234a031d5f0d7f96dffdf0d29e14b02aa6864dae: FAILURE

Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change?

github-actions[bot] avatar Feb 05 '24 20:02 github-actions[bot]

:x: Gradle check result for 6f411605582852f1f93c3c9cfe49ade946f2ec23: FAILURE

Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change?

github-actions[bot] avatar Feb 05 '24 23:02 github-actions[bot]

:x: Gradle check result for 1cab5c31e61b0b6666d2cf7fa96573f1b06f9e80: FAILURE

Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change?

github-actions[bot] avatar Feb 07 '24 00:02 github-actions[bot]

:x: Gradle check result for 74f9ac9c9d89f72d843a7b02c579fb96a8464289: FAILURE

Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change?

github-actions[bot] avatar Feb 07 '24 17:02 github-actions[bot]

:x: Gradle check result for 2ee1c0f42e024186fd6f35694c8d61eaab9c3fcd: FAILURE

Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change?

github-actions[bot] avatar Feb 08 '24 21:02 github-actions[bot]

:x: Gradle check result for 91ccaef6dfb6c128be19d5551721d7133692c6f4: FAILURE

Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change?

github-actions[bot] avatar Feb 08 '24 23:02 github-actions[bot]

:x: Gradle check result for 91ccaef6dfb6c128be19d5551721d7133692c6f4: FAILURE

Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change?

github-actions[bot] avatar Feb 09 '24 00:02 github-actions[bot]

:x: Gradle check result for 91ccaef6dfb6c128be19d5551721d7133692c6f4: FAILURE

Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change?

github-actions[bot] avatar Feb 09 '24 22:02 github-actions[bot]

:x: Gradle check result for 6fef82e25f34429399ff9c2ba0d61049481b0d81: FAILURE

Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change?

github-actions[bot] avatar Feb 12 '24 23:02 github-actions[bot]

:x: Gradle check result for 478f8a964766dedbcd1aca2d15da6a7b1b9ac202: FAILURE

Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change?

github-actions[bot] avatar Feb 14 '24 20:02 github-actions[bot]

:x: Gradle check result for 14b34e129c51b768e0fdbfb656865d87f18a652c: FAILURE

Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change?

github-actions[bot] avatar Feb 14 '24 20:02 github-actions[bot]

:x: Gradle check result for 27d8c149ee3906074fd3a3d0e6303c4f6597d9cd: FAILURE

Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change?

github-actions[bot] avatar Feb 15 '24 22:02 github-actions[bot]

:white_check_mark: Gradle check result for 2cbca9d64d55f0a6ce833046c9416182ea257063: SUCCESS

github-actions[bot] avatar Feb 15 '24 22:02 github-actions[bot]

Codecov Report

Attention: Patch coverage is 82.97872%, with 8 lines in your changes missing coverage. Please review.

Project coverage is 71.50%. Comparing base (b15cb0c) to head (cdc4204). Report is 14 commits behind head on main.

| Files | Patch % | Lines |
|---|---|---|
| ...ket/terms/GlobalOrdinalsStringTermsAggregator.java | 82.22% | 3 Missing and 5 partials :warning: |

Additional details and impacted files
@@             Coverage Diff              @@
##               main   #11643      +/-   ##
============================================
+ Coverage     71.42%   71.50%   +0.08%     
- Complexity    59978    60018      +40     
============================================
  Files          4985     4985              
  Lines        282275   282349      +74     
  Branches      40946    40960      +14     
============================================
+ Hits         201603   201907     +304     
+ Misses        63999    63719     -280     
- Partials      16673    16723      +50     


codecov[bot] avatar Feb 15 '24 22:02 codecov[bot]

:x: Gradle check result for 2e2dab0fb579b2630c7b0277572217cf9d442501: FAILURE

Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change?

github-actions[bot] avatar Feb 15 '24 23:02 github-actions[bot]