refactor: remove threshold for sort-exchange copy transformation
This PR adds a query option and a broker config to customize the threshold used in the optimization introduced in #12237.
This optimization affects queries like:
SELECT whatever
FROM table
ORDER BY col
LIMIT x
MSE blindly adds an exchange here (I think sometimes it is not needed, but that is not the point of this PR). We need to add a sort-with-limit in the receiver stage to preserve semantics, and we can also add one in the sender stage to avoid sending extra blocks and, eventually, enable k-merge sort.
The PR #12237 modifies the PinotSortExchangeCopyRule rule only to add the sort-with-limit when:
- There is a limit
- The limit is smaller than 10k rows
The first is justified by the fact that we don't use k-merge sort, and therefore, the sending side sort is useless. I don't understand the reason why the <10k rows condition was added. We have found cases where huge limits are used by not adding this filter, we end up sending too many rows that we don't actually need.
Initially, this PR removed the second condition, but in order to be backward compatible, I changed the code to include:
- A new query option called
pinot.broker.multistage.sort.exchange.copy.threshold, which can be used to change the default threshold for all queries in the broker. - The query option
sortExchangeCopyThreshold, which can be used to change this threshold for specific queries.
The change includes two tests that verify the query option can be used to increase the threshold and also to reduce it. When I created these tests I found that we are not using the default QueryEnvironment in the tests that derive from QueryEnvironmentTestBase.java. This is because they called the default constructor of QueryEnvironment instead of using the one created in MultiStageBrokerRequestHandler. In order to fix that, this PR creates a QueryEnvironmentFactory that contains the code of MultiStageBrokerRequestHandler.getQueryEnvConf.
Notice that this PR only modifies the tests that derive from ResourceBasedQueryPlansTest.testQueryExplainPlansAndQueryPlanConversion. There are other tests that should be upgraded as well, but they are not related to this PR
Codecov Report
:x: Patch coverage is 86.66667% with 4 lines in your changes missing coverage. Please review.
:white_check_mark: Project coverage is 63.27%. Comparing base (227de91) to head (2df1f79).
:warning: Report is 5 commits behind head on master.
| Files with missing lines | Patch % | Lines |
|---|---|---|
| ...requesthandler/MultiStageBrokerRequestHandler.java | 0.00% | 2 Missing :warning: |
| ...e/pinot/common/utils/config/QueryOptionsUtils.java | 66.66% | 2 Missing :warning: |
Additional details and impacted files
@@ Coverage Diff @@
## master #17328 +/- ##
============================================
+ Coverage 63.23% 63.27% +0.04%
Complexity 1474 1474
============================================
Files 3152 3153 +1
Lines 187870 187914 +44
Branches 28762 28762
============================================
+ Hits 118793 118896 +103
+ Misses 59861 59805 -56
+ Partials 9216 9213 -3
| Flag | Coverage Δ | |
|---|---|---|
| custom-integration1 | 100.00% <ø> (ø) |
|
| integration | 100.00% <ø> (ø) |
|
| integration1 | 100.00% <ø> (ø) |
|
| integration2 | 0.00% <ø> (ø) |
|
| java-11 | 63.22% <86.66%> (+0.01%) |
:arrow_up: |
| java-21 | 63.24% <86.66%> (+0.04%) |
:arrow_up: |
| temurin | 63.27% <86.66%> (+0.04%) |
:arrow_up: |
| unittests | 63.26% <86.66%> (+0.04%) |
:arrow_up: |
| unittests1 | 55.69% <92.85%> (+0.02%) |
:arrow_up: |
| unittests2 | 33.91% <56.66%> (+0.03%) |
:arrow_up: |
Flags with carried forward coverage won't be shown. Click here to find out more.
:umbrella: View full report in Codecov by Sentry.
:loudspeaker: Have feedback on the report? Share it here.
:rocket: New features to boost your workflow:
- :package: JS Bundle Analysis: Save yourself from yourself by tracking and limiting bundle sizes in JS merges.
I guess you meant to link to https://github.com/apache/pinot/pull/12237 in the PR description?
You are right. Updated
@yashmayya I modified this PR. Can you review it again?
What is our official recommendation going to be for tuning this threshold?
I think until we have k-merge implemented, we should disable this optimization. Whether we want to disable it by default or not is something we should discuss
The compilation error should be fixed now