pinot icon indicating copy to clipboard operation
pinot copied to clipboard

refactor: remove threshold for sort-exchange copy transformation

Open gortiz opened this issue 1 month ago • 5 comments

This PR adds a query option and a broker config to customize the threshold used in the optimization introduced in #12237.

This optimization affects queries like:

SELECT whatever
FROM table
ORDER BY col
LIMIT x

MSE blindly adds an exchange here (I think sometimes it is not needed, but that is not the point of this PR). We need to add a sort-with-limit in the receiver stage to preserve semantics, and we can also add one in the sender stage to avoid sending extra blocks and, eventually, enable k-merge sort.

The PR #12237 modifies the PinotSortExchangeCopyRule rule only to add the sort-with-limit when:

  • There is a limit
  • The limit is smaller than 10k rows

The first is justified by the fact that we don't use k-merge sort, and therefore, the sending side sort is useless. I don't understand the reason why the <10k rows condition was added. We have found cases where huge limits are used by not adding this filter, we end up sending too many rows that we don't actually need.

Initially, this PR removed the second condition, but in order to be backward compatible, I changed the code to include:

  • A new query option called pinot.broker.multistage.sort.exchange.copy.threshold, which can be used to change the default threshold for all queries in the broker.
  • The query option sortExchangeCopyThreshold, which can be used to change this threshold for specific queries.

The change includes two tests that verify the query option can be used to increase the threshold and also to reduce it. When I created these tests I found that we are not using the default QueryEnvironment in the tests that derive from QueryEnvironmentTestBase.java. This is because they called the default constructor of QueryEnvironment instead of using the one created in MultiStageBrokerRequestHandler. In order to fix that, this PR creates a QueryEnvironmentFactory that contains the code of MultiStageBrokerRequestHandler.getQueryEnvConf.

Notice that this PR only modifies the tests that derive from ResourceBasedQueryPlansTest.testQueryExplainPlansAndQueryPlanConversion. There are other tests that should be upgraded as well, but they are not related to this PR

gortiz avatar Dec 05 '25 15:12 gortiz

Codecov Report

:x: Patch coverage is 86.66667% with 4 lines in your changes missing coverage. Please review. :white_check_mark: Project coverage is 63.27%. Comparing base (227de91) to head (2df1f79). :warning: Report is 5 commits behind head on master.

Files with missing lines Patch % Lines
...requesthandler/MultiStageBrokerRequestHandler.java 0.00% 2 Missing :warning:
...e/pinot/common/utils/config/QueryOptionsUtils.java 66.66% 2 Missing :warning:
Additional details and impacted files
@@             Coverage Diff              @@
##             master   #17328      +/-   ##
============================================
+ Coverage     63.23%   63.27%   +0.04%     
  Complexity     1474     1474              
============================================
  Files          3152     3153       +1     
  Lines        187870   187914      +44     
  Branches      28762    28762              
============================================
+ Hits         118793   118896     +103     
+ Misses        59861    59805      -56     
+ Partials       9216     9213       -3     
Flag Coverage Δ
custom-integration1 100.00% <ø> (ø)
integration 100.00% <ø> (ø)
integration1 100.00% <ø> (ø)
integration2 0.00% <ø> (ø)
java-11 63.22% <86.66%> (+0.01%) :arrow_up:
java-21 63.24% <86.66%> (+0.04%) :arrow_up:
temurin 63.27% <86.66%> (+0.04%) :arrow_up:
unittests 63.26% <86.66%> (+0.04%) :arrow_up:
unittests1 55.69% <92.85%> (+0.02%) :arrow_up:
unittests2 33.91% <56.66%> (+0.03%) :arrow_up:

Flags with carried forward coverage won't be shown. Click here to find out more.

:umbrella: View full report in Codecov by Sentry.
:loudspeaker: Have feedback on the report? Share it here.

:rocket: New features to boost your workflow:
  • :package: JS Bundle Analysis: Save yourself from yourself by tracking and limiting bundle sizes in JS merges.

codecov-commenter avatar Dec 05 '25 16:12 codecov-commenter

I guess you meant to link to https://github.com/apache/pinot/pull/12237 in the PR description?

You are right. Updated

gortiz avatar Dec 09 '25 14:12 gortiz

@yashmayya I modified this PR. Can you review it again?

gortiz avatar Dec 10 '25 15:12 gortiz

What is our official recommendation going to be for tuning this threshold?

I think until we have k-merge implemented, we should disable this optimization. Whether we want to disable it by default or not is something we should discuss

gortiz avatar Dec 15 '25 10:12 gortiz

The compilation error should be fixed now

gortiz avatar Dec 15 '25 10:12 gortiz