starrocks icon indicating copy to clipboard operation
starrocks copied to clipboard

[BugFix] Consider CN number in HashJoinCostModel.

Open Jcnessss opened this issue 1 year ago • 4 comments

Why I'm doing:

Sometimes FE choose the wrong join mode while cluster is running in share-data mode because of the following code of HashJoinCostModel

public double getMemCost() {
        JoinExecMode execMode = deriveJoinExecMode();
        double rightOutput = rightStatistics.getOutputSize(context.getChildOutputColumns(1));
        double memCost;
        int beNum = Math.max(1, ConnectContext.get().getAliveBackendNumber());

        if (JoinExecMode.BROADCAST == execMode) {
            memCost = rightOutput * beNum;
        } else {
            memCost = rightOutput;
        }
        return memCost;
    }

The broadcast cost should be rightOutput * (beNum + cnNum)

What I'm doing:

Fix the bug.

What type of PR is this:

  • [x] BugFix
  • [ ] Feature
  • [ ] Enhancement
  • [ ] Refactor
  • [ ] UT
  • [ ] Doc
  • [ ] Tool

Does this PR entail a change in behavior?

  • [ ] Yes, this PR will result in a change in behavior.
  • [x] No, this PR will not result in a change in behavior.

If yes, please specify the type of change:

  • [ ] Interface/UI changes: syntax, type conversion, expression evaluation, display information
  • [ ] Parameter changes: default values, similar parameters but with different default values
  • [ ] Policy changes: use new policy to replace old one, functionality automatically enabled
  • [ ] Feature removed
  • [ ] Miscellaneous: upgrade & downgrade compatibility, etc.

Checklist:

  • [ ] I have added test cases for my bug fix or my new feature
  • [ ] This pr needs user documentation (for new or modified features or behaviors)
    • [ ] I have added documentation for my new feature or new function
  • [ ] This is a backport pr

Bugfix cherry-pick branch check:

  • [x] I have checked the version labels which the pr will be auto-backported to the target branch
    • [x] 3.3
    • [x] 3.2
    • [ ] 3.1
    • [ ] 3.0
    • [ ] 2.5

Jcnessss avatar Oct 16 '24 10:10 Jcnessss

@Seaven @murphyatwork Thanks for the review! I'm sorry that I found recently the method GetClusterInfo was removed, I have changed it to nodeMgr.GetClusterInfo. That's my fault.

Jcnessss avatar Oct 18 '24 09:10 Jcnessss

The failed test OptimizerTaskTest.testTopDownRewrite looks good on my PC.

Jcnessss avatar Oct 22 '24 10:10 Jcnessss

The Test TPCHPlanWithHistogramCostTest#testTPCH failed because query 21 failed. The difference between the plans is that a join turns into a colocate join from a shuffle join. image Turned into: image

The code that make the difference is as follow:

private double getAvgProbeCost() {
        ...
        int parallelFactor = Math.max(ConnectContext.get().getAliveBackendNumber() +
                ConnectContext.get().getGlobalStateMgr().getNodeMgr().getClusterInfo().getAliveComputeNodeNumber(),
                ConnectContext.get().getSessionVariable().getDegreeOfParallelism()) * 2;
        double mapSize = Math.min(1, keySize) * rightStatistics.getOutputRowCount();

        if (JoinExecMode.BROADCAST == execMode) {
            cachePenaltyFactor = Math.max(1, Math.log(mapSize / BOTTOM_NUMBER));
            // normalize ration when it hits the limit
            cachePenaltyFactor = Math.min(BROADCAST_MAT_RATIO, cachePenaltyFactor);
        } else {
            cachePenaltyFactor = Math.max(1, (Math.log(mapSize / BOTTOM_NUMBER) -
                    Math.log(parallelFactor) / Math.log(2)));
            // normalize ration when it hits the limit
            cachePenaltyFactor = Math.min(SHUFFLE_MAX_RATIO, cachePenaltyFactor);
        }
        ...
    }

I think the probe cost in shuffle mode should consider about the CN number, and the colocate join might be faster than shuffle join. @kevincai @Seaven @murphyatwork Could you please help me look at this? I wonder if it should be colocate join so that I can modify the test result in src/test/resources/sql/tpch-histogram-cost/q21.sql.

Jcnessss avatar Oct 23 '24 07:10 Jcnessss

@Jcnessss I think colocate join is faster than shuffle join, just do it~ I found has add the compute node in DistributedEnvPlanTestBase, you can check it

Seaven avatar Oct 23 '24 07:10 Seaven

@Seaven Thanks for the reply! I guess this time this patch would be fine to merge. Could you please help me trigger the tests.

Jcnessss avatar Oct 23 '24 08:10 Jcnessss

[Java-Extensions Incremental Coverage Report]

:white_check_mark: pass : 0 / 0 (0%)

github-actions[bot] avatar Oct 23 '24 11:10 github-actions[bot]

[FE Incremental Coverage Report]

:white_check_mark: pass : 7 / 7 (100.00%)

file detail

path covered_line new_line coverage not_covered_line_detail
:large_blue_circle: com/starrocks/sql/optimizer/cost/HashJoinCostModel.java 7 7 100.00% []

github-actions[bot] avatar Oct 23 '24 11:10 github-actions[bot]

[BE Incremental Coverage Report]

:white_check_mark: pass : 0 / 0 (0%)

github-actions[bot] avatar Oct 23 '24 11:10 github-actions[bot]

@Mergifyio backport branch-3.3

github-actions[bot] avatar Oct 24 '24 02:10 github-actions[bot]

@Mergifyio backport branch-3.2

github-actions[bot] avatar Oct 24 '24 02:10 github-actions[bot]

backport branch-3.3

✅ Backports have been created

mergify[bot] avatar Oct 24 '24 02:10 mergify[bot]

backport branch-3.2

✅ Backports have been created

mergify[bot] avatar Oct 24 '24 02:10 mergify[bot]

@murphyatwork @Seaven @kevincai Thanks for all the reviews~

Jcnessss avatar Oct 24 '24 02:10 Jcnessss