OpenSearch
OpenSearch copied to clipboard
Refactoring FilterPath.parse by using an iterative approach instead of recursion.
Description
Refactoring FilterPath.parse by using an iterative approach instead of recursion.
Based on the following PR: https://github.com/opensearch-project/OpenSearch/pull/12131
Related Issues
Resolves #12067
Check List
- [x] Functionality includes testing.
- [x] API changes companion pull request created, if applicable.
- [x] Public documentation issue/PR created, if applicable.
By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license. For more information on following Developer Certificate of Origin and signing off your commits, please check here.
:x: Gradle check result for 8ee3459c5eebff98439758b51bcb2628f7d390a8: FAILURE
Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change?
:grey_exclamation: Gradle check result for e58566f90edd6ecd6b7a63dd89a6050aa7c38f57: UNSTABLE
Please review all flaky tests that succeeded after retry and create an issue if one does not already exist to track the flaky failure.
Codecov Report
All modified and coverable lines are covered by tests :white_check_mark:
Project coverage is 71.70%. Comparing base (
f1f4f89) to head (e300693).
Additional details and impacted files
@@ Coverage Diff @@
## main #14200 +/- ##
=========================================
Coverage 71.69% 71.70%
- Complexity 62292 62319 +27
=========================================
Files 5140 5140
Lines 293020 293015 -5
Branches 42347 42345 -2
=========================================
+ Hits 210090 210108 +18
- Misses 65614 65622 +8
+ Partials 17316 17285 -31
:umbrella: View full report in Codecov by Sentry.
:loudspeaker: Have feedback on the report? Share it here.
:grey_exclamation: Gradle check result for 87e181597cecd027ef521fd16f59148e746cb917: UNSTABLE
- TEST FAILURES:
1 org.opensearch.index.shard.RemoteIndexShardTests.testRepicaCleansUpOldCommitsWhenReceivingNew
1 org.opensearch.cluster.MinimumClusterManagerNodesIT.testThreeNodesNoClusterManagerBlock
Please review all flaky tests that succeeded after retry and create an issue if one does not already exist to track the flaky failure.
Is this actually faster? Got any micro benchmarks?
Did not do any benchmarking and this is based on https://github.com/opensearch-project/OpenSearch/pull/12131 that I am helping merge.
:x: Gradle check result for b4963d9d79ddd216908fe62bd4c0b29195cd82cd: FAILURE
Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change?
:x: Gradle check result for 8c0b49ca403bf8e97dfa3add121a8976e60ef477: FAILURE
Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change?
:x: Gradle check result for 5d0fc3296b3a76f560cc2f7aca90f296e5ee264c: FAILURE
Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change?
:white_check_mark: Gradle check result for 46bff453131418a68a5f99c90fb72fc5cf312372: SUCCESS
Is this actually faster? Got any micro benchmarks?
Additional reference related to performance: https://github.com/opensearch-project/OpenSearch/issues/12102
Is this actually faster? Got any micro benchmarks?
Did not do any benchmarking and this is based on #12131 that I am helping merge.
I understand that this does fix a potential stack overflow which was observed in https://github.com/opensearch-project/OpenSearch/issues/12102, but I think the regex parsing can also have the opposite effect on performance, I think we need to know what the impact of this change is before we merge it. Can we please get a quick microbenchmark @deshsidd ?
I think we need to know what the impact of this change is before we merge it. Can we please get a quick microbenchmark @deshsidd ?
Agreed. It should be easy to write a quick microbenchmark and get some data. Here's the README if you haven't done one of these before.
@dblock
I understand that this does fix a potential stack overflow which was observed in https://github.com/opensearch-project/OpenSearch/issues/12102, but I think the regex parsing can also have the opposite effect on performance, I think we need to know what the impact of this change is before we merge it.
I was reviewing this PR - I don't think we are adding any additional regex parsing than what already exists in the code.
@sandeshkr419 there's a filter.split("(?<!\\\\)\\.");, but I agree it's not in a tight loop. Still, if we're going to claim a performance improvement we should demonstrate it's one.
Ran some benchmarks by creating a cluster with my local changes and using opensearch benchmarks:
------------------------------------------------------
_______ __ _____
/ ____(_)___ ____ _/ / / ___/_________ ________
/ /_ / / __ \/ __ `/ / \__ \/ ___/ __ \/ ___/ _ \
/ __/ / / / / / /_/ / / ___/ / /__/ /_/ / / / __/
/_/ /_/_/ /_/\__,_/_/ /____/\___/\____/_/ \___/
------------------------------------------------------
[0m
| Metric | Task | Value | Unit |
|---------------------------------------------------------------:|-------------------------:|------------:|-------:|
| Cumulative indexing time of primary shards | | 1.57577 | min |
| Min cumulative indexing time across primary shards | | 0.506317 | min |
| Median cumulative indexing time across primary shards | | 0.515183 | min |
| Max cumulative indexing time across primary shards | | 0.554267 | min |
| Cumulative indexing throttle time of primary shards | | 0 | min |
| Min cumulative indexing throttle time across primary shards | | 0 | min |
| Median cumulative indexing throttle time across primary shards | | 0 | min |
| Max cumulative indexing throttle time across primary shards | | 0 | min |
| Cumulative merge time of primary shards | | 0.0539667 | min |
| Cumulative merge count of primary shards | | 4 | |
| Min cumulative merge time across primary shards | | 0.01275 | min |
| Median cumulative merge time across primary shards | | 0.01605 | min |
| Max cumulative merge time across primary shards | | 0.0251667 | min |
| Cumulative merge throttle time of primary shards | | 0 | min |
| Min cumulative merge throttle time across primary shards | | 0 | min |
| Median cumulative merge throttle time across primary shards | | 0 | min |
| Max cumulative merge throttle time across primary shards | | 0 | min |
| Cumulative refresh time of primary shards | | 0.171717 | min |
| Cumulative refresh count of primary shards | | 40 | |
| Min cumulative refresh time across primary shards | | 0.05055 | min |
| Median cumulative refresh time across primary shards | | 0.0515667 | min |
| Max cumulative refresh time across primary shards | | 0.0696 | min |
| Cumulative flush time of primary shards | | 0 | min |
| Cumulative flush count of primary shards | | 0 | |
| Min cumulative flush time across primary shards | | 0 | min |
| Median cumulative flush time across primary shards | | 0 | min |
| Max cumulative flush time across primary shards | | 0 | min |
| Total Young Gen GC time | | 0.343 | s |
| Total Young Gen GC count | | 8 | |
| Total Old Gen GC time | | 0 | s |
| Total Old Gen GC count | | 0 | |
| Store size | | 0.113262 | GB |
| Translog size | | 1.53668e-07 | GB |
| Heap used for segments | | 0 | MB |
| Heap used for doc values | | 0 | MB |
| Heap used for terms | | 0 | MB |
| Heap used for norms | | 0 | MB |
| Heap used for points | | 0 | MB |
| Heap used for stored fields | | 0 | MB |
| Segment count | | 25 | |
| Min Throughput | index | 28452.4 | docs/s |
| Mean Throughput | index | 46063.7 | docs/s |
| Median Throughput | index | 45448.6 | docs/s |
| Max Throughput | index | 62262.2 | docs/s |
| 50th percentile latency | index | 373.84 | ms |
| 90th percentile latency | index | 593.865 | ms |
| 99th percentile latency | index | 784.339 | ms |
| 100th percentile latency | index | 1145.18 | ms |
| 50th percentile service time | index | 373.84 | ms |
| 90th percentile service time | index | 593.865 | ms |
| 99th percentile service time | index | 784.339 | ms |
| 100th percentile service time | index | 1145.18 | ms |
| error rate | index | 0 | % |
| Min Throughput | wait-until-merges-finish | 7.4 | ops/s |
| Mean Throughput | wait-until-merges-finish | 7.4 | ops/s |
| Median Throughput | wait-until-merges-finish | 7.4 | ops/s |
| Max Throughput | wait-until-merges-finish | 7.4 | ops/s |
| 100th percentile latency | wait-until-merges-finish | 134.712 | ms |
| 100th percentile service time | wait-until-merges-finish | 134.712 | ms |
| error rate | wait-until-merges-finish | 0 | % |
Thanks @deshsidd - I am uncertain that we can illustrate performance gains with the benchmark posted above. We might have to use a search/query which can utilize the methods being refactored and then just run (and average the time taken) it for n times (against before and after your changes) to capture how much does a query performance utilizing the refactored methods have improved.
Ran a manual benchmark on my VM to add 100000 documents to an index and run 1000 filter_path queries on the index.
I then calculated the total time taken for all queries. I also increased the http.max_initial_line_length from 4 to 100 to avoid too_long_http_line_exception exception.
Total time taken for all queries with filter_path changes : 41.506942611 seconds Total time taken for all queries without filter_path changes : N/A
Got the following for every invocation without the filter_path changes:
"type": "pattern_syntax_exception",
"reason": "Stack overflow during pattern compilation\n\\\\."
@deshsidd - Thank you for sharing these results. Can you also share the query, just curious about the query shape (wink)?
@dblock - This looks sufficient for merging the change IMO.
Thanks @jainankitk @deshsidd @sandeshkr419, let's merge.
@deshsidd - Thank you for sharing these results. Can you also share the query, just curious about the
query shape (wink)?
filter_path=$(printf 'h.%0.s' {1..20000})
curl -s "$OPENSEARCH_URL/$INDEX_NAME/_search?pretty&filter_path=$filter_path" | jq '.'
:grey_exclamation: Gradle check result for 4405a75fe433f64b2305eb3abae0e5a736a862f5: UNSTABLE
Please review all flaky tests that succeeded after retry and create an issue if one does not already exist to track the flaky failure.
:white_check_mark: Gradle check result for e300693f20c5cbe9b68cd193e4918c6981c93a00: SUCCESS