
[Enhancement] Compare MV rowCount when both MVs' dimensions contain the query dimensions

Open kaijianding opened this issue 1 year ago • 4 comments

Why I'm doing:

-- mv1 (rows: 100):
select sum(v1) from t1 group by a, b, c, f;

-- mv2 (rows: 10000):
select sum(v1) from t1 group by a, b, d;

-- query:
select sum(v1) from t1 where b = 'a' group by a;

Before: the MV with fewer dimensions is preferred, so mv2 is chosen, even though mv2 has more rows (10000 vs. 100), which is not expected.

After: when multiple MVs can satisfy the query, the MV with fewer rows is preferred, so mv1 is chosen.
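The behavior change in the mv1/mv2 example can be modeled with a small sketch. This is not StarRocks code: the `Mv` record, the `covers` check, and the two comparators are hypothetical stand-ins that only reproduce the example, where the query needs dimension `a` (group by) and `b` (filter), both of which are contained in each MV's dimensions.

```java
import java.util.Comparator;
import java.util.List;
import java.util.Set;

// Hypothetical model of an MV candidate: name, group-by dimensions, row count.
record Mv(String name, Set<String> dims, long rowCount) {}

class MvChoiceExample {
    // An aggregate MV can answer the query only if its dimensions contain
    // every column the query groups by or filters on.
    static boolean covers(Mv mv, Set<String> queryDims) {
        return mv.dims().containsAll(queryDims);
    }

    // "Before" (sketch): prefer the MV with fewer dimensions.
    static final Comparator<Mv> FEWER_DIMS =
            Comparator.comparingInt((Mv mv) -> mv.dims().size());

    // "After" (sketch): prefer the MV with fewer rows, ties broken by dimension count.
    static final Comparator<Mv> FEWER_ROWS =
            Comparator.comparingLong(Mv::rowCount).thenComparing(FEWER_DIMS);

    // Among MVs that cover the query, return the name of the first under `order`.
    static String pick(List<Mv> mvs, Set<String> queryDims, Comparator<Mv> order) {
        return mvs.stream()
                .filter(mv -> covers(mv, queryDims))
                .min(order)
                .map(Mv::name)
                .orElse("none");
    }

    public static void main(String[] args) {
        List<Mv> mvs = List.of(
                new Mv("mv1", Set.of("a", "b", "c", "f"), 100),
                new Mv("mv2", Set.of("a", "b", "d"), 10000));
        Set<String> queryDims = Set.of("a", "b"); // group by a, where b = 'a'
        System.out.println("before: " + pick(mvs, queryDims, FEWER_DIMS)); // before: mv2
        System.out.println("after:  " + pick(mvs, queryDims, FEWER_ROWS)); // after:  mv1
    }
}
```

Both MVs pass the `covers` check, so the outcome depends entirely on the ordering: counting dimensions favors mv2 (3 < 4), while counting rows favors mv1 (100 < 10000).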

What I'm doing:

  1. MaterializationContext.RewriteOrdering decides which MVs are retained when there are too many MV candidates and the candidate list has to be truncated. It should compare each MV's maxPartitionRowCount rather than its total row count, because an MV with fewer partitions and a smaller total row count may still have a large maxPartitionRowCount.
  2. BestMvSelector is where the MV to use is actually chosen. Its comparator should consider row count first when the MV's outputRowCount in statistics is non-zero.
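The two ordering rules can be sketched as plain Java comparators. This is a hypothetical model, not the code in MaterializationContext.RewriteOrdering or BestMvSelector: the `Candidate` record, its fields, and the `retain`/`best` helpers are assumptions for illustration. Treating an outputRowCount of 0 as "no usable statistics" follows item 2; falling back to dimension count in that case is an assumption based on the "Before" behavior.

```java
import java.util.Comparator;
import java.util.List;

// Hypothetical MV candidate holding only the statistics this sketch needs.
record Candidate(String name, long totalRowCount, long maxPartitionRowCount,
                 long outputRowCount, int dimensionCount) {}

class MvOrderingSketch {
    // Item 1 (sketch): when there are too many candidates, keep the first
    // `limit` ranked by the row count of their largest partition, not total rows.
    static List<Candidate> retain(List<Candidate> candidates, int limit) {
        return candidates.stream()
                .sorted(Comparator.comparingLong(Candidate::maxPartitionRowCount))
                .limit(limit)
                .toList();
    }

    // Item 2 (sketch): compare outputRowCount first when both values are
    // non-zero; otherwise, or on a tie, prefer fewer dimensions.
    static final Comparator<Candidate> BEST = (x, y) -> {
        if (x.outputRowCount() > 0 && y.outputRowCount() > 0) {
            int byRows = Long.compare(x.outputRowCount(), y.outputRowCount());
            if (byRows != 0) {
                return byRows;
            }
        }
        return Integer.compare(x.dimensionCount(), y.dimensionCount());
    };

    static Candidate best(List<Candidate> candidates) {
        return candidates.stream().min(BEST).orElseThrow();
    }

    public static void main(String[] args) {
        // Few partitions and few total rows, but one large partition.
        Candidate skewed = new Candidate("skewed", 5_000, 5_000, 5_000, 2);
        // More total rows, spread over many small partitions.
        Candidate spread = new Candidate("spread", 50_000, 500, 500, 3);
        System.out.println(retain(List.of(skewed, spread), 1).get(0).name()); // spread
        System.out.println(best(List.of(skewed, spread)).name()); // spread
    }
}
```

Ranking by totalRowCount would have kept `skewed` instead. Because the row-count rule only applies when both values are non-zero, a comparator like `BEST` is not guaranteed to be transitive when candidates with and without statistics are mixed, which is why the sketch uses it only with `min` rather than for sorting.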

Fixes #issue

What type of PR is this:

  • [ ] BugFix
  • [ ] Feature
  • [x] Enhancement
  • [ ] Refactor
  • [ ] UT
  • [ ] Doc
  • [ ] Tool

Does this PR entail a change in behavior?

  • [ ] Yes, this PR will result in a change in behavior.
  • [x] No, this PR will not result in a change in behavior.

Checklist:

  • [ ] I have added test cases for my bug fix or my new feature
  • [ ] This pr needs user documentation (for new or modified features or behaviors)
    • [ ] I have added documentation for my new feature or new function
  • [ ] This is a backport pr

Bugfix cherry-pick branch check:

  • [x] I have checked the version labels which the pr will be auto-backported to the target branch
    • [x] 3.3
    • [ ] 3.2
    • [ ] 3.1
    • [ ] 3.0
    • [ ] 2.5

kaijianding commented on Sep 27 '24

Quality Gate failed

Failed conditions
3.6% Duplication on New Code (required ≤ 3%)

See analysis details on SonarCloud

sonarqubecloud[bot] commented on Oct 22 '24

[Java-Extensions Incremental Coverage Report]

✅ pass: 0 / 0 (0%)

github-actions[bot] commented on Oct 22 '24

[FE Incremental Coverage Report]

✅ pass: 17 / 17 (100.00%)

file detail

| path | covered_line | new_line | coverage | not_covered_line_detail |
|---|---|---|---|---|
| 🔵 com/starrocks/catalog/MaterializedView.java | 5 | 5 | 100.00% | [] |
| 🔵 com/starrocks/sql/optimizer/rule/transformation/materialization/BestMvSelector.java | 4 | 4 | 100.00% | [] |
| 🔵 com/starrocks/sql/optimizer/MaterializationContext.java | 8 | 8 | 100.00% | [] |

github-actions[bot] commented on Oct 22 '24

[BE Incremental Coverage Report]

✅ pass: 0 / 0 (0%)

github-actions[bot] commented on Oct 22 '24

@satanson any more comments?

kaijianding commented on Oct 25 '24

@Mergifyio backport branch-3.3

github-actions[bot] commented on Nov 28 '24

backport branch-3.3

✅ Backports have been created

mergify[bot] commented on Nov 28 '24