OpenSearch icon indicating copy to clipboard operation
OpenSearch copied to clipboard

Refactoring FilterPath.parse by using an iterative approach instead of recursion.

Open deshsidd opened this issue 1 year ago • 16 comments

Description

Refactoring FilterPath.parse by using an iterative approach instead of recursion.

Based on the following PR: https://github.com/opensearch-project/OpenSearch/pull/12131

Related Issues

Resolves #12067

Check List

  • [x] Functionality includes testing.
  • [x] API changes companion pull request created, if applicable.
  • [x] Public documentation issue/PR created, if applicable.

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license. For more information on following Developer Certificate of Origin and signing off your commits, please check here.

deshsidd avatar Jun 11 '24 22:06 deshsidd

:x: Gradle check result for 8ee3459c5eebff98439758b51bcb2628f7d390a8: FAILURE

Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change?

github-actions[bot] avatar Jun 11 '24 23:06 github-actions[bot]

:grey_exclamation: Gradle check result for e58566f90edd6ecd6b7a63dd89a6050aa7c38f57: UNSTABLE

Please review all flaky tests that succeeded after retry and create an issue if one does not already exist to track the flaky failure.

github-actions[bot] avatar Jun 11 '24 23:06 github-actions[bot]

Codecov Report

All modified and coverable lines are covered by tests :white_check_mark:

Project coverage is 71.70%. Comparing base (f1f4f89) to head (e300693).

Additional details and impacted files
@@            Coverage Diff            @@
##               main   #14200   +/-   ##
=========================================
  Coverage     71.69%   71.70%           
- Complexity    62292    62319   +27     
=========================================
  Files          5140     5140           
  Lines        293020   293015    -5     
  Branches      42347    42345    -2     
=========================================
+ Hits         210090   210108   +18     
- Misses        65614    65622    +8     
+ Partials      17316    17285   -31     

:umbrella: View full report in Codecov by Sentry.
:loudspeaker: Have feedback on the report? Share it here.

codecov[bot] avatar Jun 11 '24 23:06 codecov[bot]

:grey_exclamation: Gradle check result for 87e181597cecd027ef521fd16f59148e746cb917: UNSTABLE

  • TEST FAILURES:
      1 org.opensearch.index.shard.RemoteIndexShardTests.testRepicaCleansUpOldCommitsWhenReceivingNew
      1 org.opensearch.cluster.MinimumClusterManagerNodesIT.testThreeNodesNoClusterManagerBlock

Please review all flaky tests that succeeded after retry and create an issue if one does not already exist to track the flaky failure.

github-actions[bot] avatar Jun 12 '24 18:06 github-actions[bot]

Is this actually faster? Got any micro benchmarks?

Did not do any benchmarking and this is based on https://github.com/opensearch-project/OpenSearch/pull/12131 that I am helping merge.

deshsidd avatar Jun 12 '24 18:06 deshsidd

:x: Gradle check result for b4963d9d79ddd216908fe62bd4c0b29195cd82cd: FAILURE

Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change?

github-actions[bot] avatar Jun 12 '24 18:06 github-actions[bot]

:x: Gradle check result for 8c0b49ca403bf8e97dfa3add121a8976e60ef477: FAILURE

Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change?

github-actions[bot] avatar Jun 12 '24 20:06 github-actions[bot]

:x: Gradle check result for 5d0fc3296b3a76f560cc2f7aca90f296e5ee264c: FAILURE

Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change?

github-actions[bot] avatar Jun 12 '24 20:06 github-actions[bot]

:white_check_mark: Gradle check result for 46bff453131418a68a5f99c90fb72fc5cf312372: SUCCESS

github-actions[bot] avatar Jun 12 '24 22:06 github-actions[bot]

Is this actually faster? Got any micro benchmarks?

Additional reference related to performance: https://github.com/opensearch-project/OpenSearch/issues/12102

jed326 avatar Jun 12 '24 23:06 jed326

Is this actually faster? Got any micro benchmarks?

Did not do any benchmarking and this is based on #12131 that I am helping merge.

I understand that this does fix a potential stack overflow which was observed in https://github.com/opensearch-project/OpenSearch/issues/12102, but I think the regex parsing can also have the opposite effect on performance, I think we need to know what the impact of this change is before we merge it. Can we please get a quick microbenchmark @deshsidd ?

dblock avatar Jun 13 '24 14:06 dblock

I think we need to know what the impact of this change is before we merge it. Can we please get a quick microbenchmark @deshsidd ?

Agreed. It should be easy to write a quick microbenchmark and get some data. Here's the README if you haven't done one of these before.

andrross avatar Jun 14 '24 20:06 andrross

@dblock

I understand that this does fix a potential stack overflow which was observed in https://github.com/opensearch-project/OpenSearch/issues/12102, but I think the regex parsing can also have the opposite effect on performance, I think we need to know what the impact of this change is before we merge it.

I was reviewing this PR - I don't think we are adding any additional regex parsing than what already exists in the code.

sandeshkr419 avatar Jun 18 '24 10:06 sandeshkr419

@sandeshkr419 there's a filter.split("(?<!\\\\)\\.");, but I agree it's not in a tight loop. Still, if we're going to claim a performance improvement we should demonstrate it's one.

dblock avatar Jun 18 '24 16:06 dblock

Ran some benchmarks by creating a cluster with my local changes and using opensearch benchmarks:

------------------------------------------------------
    _______             __   _____
   / ____(_)___  ____ _/ /  / ___/_________  ________
  / /_  / / __ \/ __ `/ /   \__ \/ ___/ __ \/ ___/ _ \
 / __/ / / / / / /_/ / /   ___/ / /__/ /_/ / /  /  __/
/_/   /_/_/ /_/\__,_/_/   /____/\___/\____/_/   \___/
------------------------------------------------------
            [0m
|                                                         Metric |                     Task |       Value |   Unit |
|---------------------------------------------------------------:|-------------------------:|------------:|-------:|
|                     Cumulative indexing time of primary shards |                          |     1.57577 |    min |
|             Min cumulative indexing time across primary shards |                          |    0.506317 |    min |
|          Median cumulative indexing time across primary shards |                          |    0.515183 |    min |
|             Max cumulative indexing time across primary shards |                          |    0.554267 |    min |
|            Cumulative indexing throttle time of primary shards |                          |           0 |    min |
|    Min cumulative indexing throttle time across primary shards |                          |           0 |    min |
| Median cumulative indexing throttle time across primary shards |                          |           0 |    min |
|    Max cumulative indexing throttle time across primary shards |                          |           0 |    min |
|                        Cumulative merge time of primary shards |                          |   0.0539667 |    min |
|                       Cumulative merge count of primary shards |                          |           4 |        |
|                Min cumulative merge time across primary shards |                          |     0.01275 |    min |
|             Median cumulative merge time across primary shards |                          |     0.01605 |    min |
|                Max cumulative merge time across primary shards |                          |   0.0251667 |    min |
|               Cumulative merge throttle time of primary shards |                          |           0 |    min |
|       Min cumulative merge throttle time across primary shards |                          |           0 |    min |
|    Median cumulative merge throttle time across primary shards |                          |           0 |    min |
|       Max cumulative merge throttle time across primary shards |                          |           0 |    min |
|                      Cumulative refresh time of primary shards |                          |    0.171717 |    min |
|                     Cumulative refresh count of primary shards |                          |          40 |        |
|              Min cumulative refresh time across primary shards |                          |     0.05055 |    min |
|           Median cumulative refresh time across primary shards |                          |   0.0515667 |    min |
|              Max cumulative refresh time across primary shards |                          |      0.0696 |    min |
|                        Cumulative flush time of primary shards |                          |           0 |    min |
|                       Cumulative flush count of primary shards |                          |           0 |        |
|                Min cumulative flush time across primary shards |                          |           0 |    min |
|             Median cumulative flush time across primary shards |                          |           0 |    min |
|                Max cumulative flush time across primary shards |                          |           0 |    min |
|                                        Total Young Gen GC time |                          |       0.343 |      s |
|                                       Total Young Gen GC count |                          |           8 |        |
|                                          Total Old Gen GC time |                          |           0 |      s |
|                                         Total Old Gen GC count |                          |           0 |        |
|                                                     Store size |                          |    0.113262 |     GB |
|                                                  Translog size |                          | 1.53668e-07 |     GB |
|                                         Heap used for segments |                          |           0 |     MB |
|                                       Heap used for doc values |                          |           0 |     MB |
|                                            Heap used for terms |                          |           0 |     MB |
|                                            Heap used for norms |                          |           0 |     MB |
|                                           Heap used for points |                          |           0 |     MB |
|                                    Heap used for stored fields |                          |           0 |     MB |
|                                                  Segment count |                          |          25 |        |
|                                                 Min Throughput |                    index |     28452.4 | docs/s |
|                                                Mean Throughput |                    index |     46063.7 | docs/s |
|                                              Median Throughput |                    index |     45448.6 | docs/s |
|                                                 Max Throughput |                    index |     62262.2 | docs/s |
|                                        50th percentile latency |                    index |      373.84 |     ms |
|                                        90th percentile latency |                    index |     593.865 |     ms |
|                                        99th percentile latency |                    index |     784.339 |     ms |
|                                       100th percentile latency |                    index |     1145.18 |     ms |
|                                   50th percentile service time |                    index |      373.84 |     ms |
|                                   90th percentile service time |                    index |     593.865 |     ms |
|                                   99th percentile service time |                    index |     784.339 |     ms |
|                                  100th percentile service time |                    index |     1145.18 |     ms |
|                                                     error rate |                    index |           0 |      % |
|                                                 Min Throughput | wait-until-merges-finish |         7.4 |  ops/s |
|                                                Mean Throughput | wait-until-merges-finish |         7.4 |  ops/s |
|                                              Median Throughput | wait-until-merges-finish |         7.4 |  ops/s |
|                                                 Max Throughput | wait-until-merges-finish |         7.4 |  ops/s |
|                                       100th percentile latency | wait-until-merges-finish |     134.712 |     ms |
|                                  100th percentile service time | wait-until-merges-finish |     134.712 |     ms |
|                                                     error rate | wait-until-merges-finish |           0 |      % |


deshsidd avatar Jun 28 '24 19:06 deshsidd

Thanks @deshsidd - I am uncertain that we can illustrate performance gains with the benchmark posted above. We might have to use a search/query which can utilize the methods being refactored and then just run (and average the time taken) it for n times (against before and after your changes) to capture how much does a query performance utilizing the refactored methods have improved.

sandeshkr419 avatar Jun 29 '24 06:06 sandeshkr419

Ran a manual benchmark on my VM to add 100000 documents to an index and run 1000 filter_path queries on the index. I then calculated the total time taken for all queries. I also increased the http.max_initial_line_length from 4 to 100 to avoid too_long_http_line_exception exception.

Total time taken for all queries with filter_path changes : 41.506942611 seconds Total time taken for all queries without filter_path changes : N/A

Got the following for every invocation without the filter_path changes:

"type": "pattern_syntax_exception",
"reason": "Stack overflow during pattern compilation\n\\\\."

deshsidd avatar Jul 01 '24 21:07 deshsidd

@deshsidd - Thank you for sharing these results. Can you also share the query, just curious about the query shape (wink)?

@dblock - This looks sufficient for merging the change IMO.

jainankitk avatar Jul 02 '24 00:07 jainankitk

Thanks @jainankitk @deshsidd @sandeshkr419, let's merge.

dblock avatar Jul 02 '24 17:07 dblock

@deshsidd - Thank you for sharing these results. Can you also share the query, just curious about the query shape (wink)?

filter_path=$(printf 'h.%0.s' {1..20000})
curl -s "$OPENSEARCH_URL/$INDEX_NAME/_search?pretty&filter_path=$filter_path" | jq '.'

deshsidd avatar Jul 02 '24 18:07 deshsidd

:grey_exclamation: Gradle check result for 4405a75fe433f64b2305eb3abae0e5a736a862f5: UNSTABLE

Please review all flaky tests that succeeded after retry and create an issue if one does not already exist to track the flaky failure.

github-actions[bot] avatar Jul 02 '24 19:07 github-actions[bot]

:white_check_mark: Gradle check result for e300693f20c5cbe9b68cd193e4918c6981c93a00: SUCCESS

github-actions[bot] avatar Jul 02 '24 19:07 github-actions[bot]