
[Enhancement] Support more intelligent/adaptive MV Refresh

Open hongkunxu opened this issue 6 months ago • 6 comments

Why I'm doing:

#58351 proposes introducing a more adaptive method to adjust this based on base table data info. However, it does not handle non-OLAP tables when collecting related partition information.

What I'm doing:

The "Gather statistics for CBO about external tables" mechanism records partition information of external tables in statistics.external_column_statistics. When refreshing materialized views involving external tables, these statistics can be used as a reference to compute relevant partition data.

Fixes #56973

What type of PR is this:

  • [ ] BugFix
  • [ ] Feature
  • [x] Enhancement
  • [ ] Refactor
  • [ ] UT
  • [ ] Doc
  • [ ] Tool

Does this PR entail a change in behavior?

  • [ ] Yes, this PR will result in a change in behavior.
  • [x] No, this PR will not result in a change in behavior.

If yes, please specify the type of change:

  • [ ] Interface/UI changes: syntax, type conversion, expression evaluation, display information
  • [ ] Parameter changes: default values, similar parameters but with different default values
  • [ ] Policy changes: use new policy to replace old one, functionality automatically enabled
  • [ ] Feature removed
  • [ ] Miscellaneous: upgrade & downgrade compatibility, etc.

Checklist:

  • [x] I have added test cases for my bug fix or my new feature
  • [ ] This pr needs user documentation (for new or modified features or behaviors)
    • [ ] I have added documentation for my new feature or new function
  • [ ] This is a backport pr

Bugfix cherry-pick branch check:

  • [x] I have checked the version labels which the pr will be auto-backported to the target branch
    • [ ] 3.5
    • [ ] 3.4
    • [ ] 3.3
    • [ ] 3.2
    • [ ] 3.1

hongkunxu avatar Jun 04 '25 07:06 hongkunxu

add some FE UTs and SQL Testers for this?

LiShuMing avatar Jun 05 '25 05:06 LiShuMing

Hi @LiShuMing

For the requirements:

  1. If the external table has empty statistics or the fetch fails, fall back to the original strategy. This has been implemented.
  2. If the external table has only partial partition statistics, how should the partitions without statistics be handled? I've applied the following logic:
    • If fewer than 10% of the partitions are missing statistics, we continue using the adaptive strategy.
    • Otherwise, we fall back to the original strategy.
  3. Add some FE UTs and SQL Testers for this? Done. Relevant unit tests and SQL test cases have been added.
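
The decision rule above can be sketched roughly as follows. This is only an illustration of the threshold logic as described in this comment, not the code in the PR: the class, enum, and method names are hypothetical, and it assumes the 10% is measured against the number of partitions to be refreshed.

```java
import java.util.Set;

/** Illustrative sketch only; none of these names are StarRocks APIs. */
final class ExternalStatsStrategySketch {
    enum Strategy { ADAPTIVE, ORIGINAL }

    /**
     * Chooses the refresh strategy from the partitions to refresh and the
     * partitions for which external statistics were found.
     */
    static Strategy choose(Set<String> partitionsToRefresh, Set<String> partitionsWithStats) {
        // Case 1: statistics are empty or could not be fetched (modelled here as null).
        if (partitionsWithStats == null || partitionsWithStats.isEmpty()) {
            return Strategy.ORIGINAL;
        }
        // Nothing to estimate, so there is no reason to use the adaptive path.
        if (partitionsToRefresh.isEmpty()) {
            return Strategy.ORIGINAL;
        }
        // Case 2: count the partitions that have no statistics.
        long missing = partitionsToRefresh.stream()
                .filter(p -> !partitionsWithStats.contains(p))
                .count();
        // "Fewer than 10% missing", in integer arithmetic to avoid
        // floating-point surprises at the boundary.
        boolean belowThreshold = missing * 10 < partitionsToRefresh.size();
        return belowThreshold ? Strategy.ADAPTIVE : Strategy.ORIGINAL;
    }
}
```

With this sketch, 1 missing partition out of 20 keeps the adaptive strategy, while 1 missing out of 10 (exactly 10%) falls back to the original strategy.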

Please let me know if any further changes are needed.

hongkunxu avatar Jun 10 '25 09:06 hongkunxu

btw, the external statistics may easily expire

Hi @Seaven, your concerns are reasonable. On the one hand, triggering a synchronous external statistics collection task for every materialized view (MV) refresh that involves external tables would consume excessive resources. On the other hand, if the external statistics are incomplete due to expiration, I have implemented a fallback to strict mode so that the materialized view can still be refreshed correctly.
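
As a rough sketch of that fallback, the control flow could look like the code below. Everything here is hypothetical: the exception type, the row-budget heuristic standing in for the adaptive selection, and the reading of strict mode as a fixed number of partitions per batch are assumptions for illustration, not the PR's implementation. For brevity it treats any partition without statistics as incomplete and ignores the 10% tolerance discussed earlier in the thread.

```java
import java.util.List;
import java.util.Map;

/** Illustrative sketch only; none of these names are StarRocks APIs. */
final class AdaptiveRefreshFallbackSketch {
    /** Hypothetical signal that statistics are incomplete (for example, expired). */
    static final class IncompleteStatsException extends Exception {
        IncompleteStatsException(String partition) {
            super("no statistics for partition " + partition);
        }
    }

    /** Assumed adaptive rule: take partitions in order until a row budget is reached. */
    static int adaptiveBatchSize(List<String> partitions, Map<String, Long> rowCounts, long rowBudget)
            throws IncompleteStatsException {
        long total = 0;
        int selected = 0;
        for (String partition : partitions) {
            Long rows = rowCounts.get(partition);
            if (rows == null) {
                throw new IncompleteStatsException(partition);
            }
            // Always take at least one partition so the refresh makes progress.
            if (selected > 0 && total + rows > rowBudget) {
                break;
            }
            total += rows;
            selected++;
        }
        return selected;
    }

    /** Tries the adaptive rule first; on incomplete statistics, uses a fixed batch size. */
    static int batchSize(List<String> partitions, Map<String, Long> rowCounts,
                         long rowBudget, int strictBatchSize) {
        try {
            return adaptiveBatchSize(partitions, rowCounts, rowBudget);
        } catch (IncompleteStatsException e) {
            // Assumed meaning of "strict mode": a fixed number of partitions per batch.
            return Math.min(strictBatchSize, partitions.size());
        }
    }
}
```

The point of the design is that a refresh never fails because statistics are stale; it only loses the adaptive batch sizing for that run.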

hongkunxu avatar Jun 23 '25 02:06 hongkunxu

Can you rebase again and fix conflicts?

LiShuMing avatar Jul 01 '25 02:07 LiShuMing

Can you rebase again and fix conflicts?

Sure, I will rebase again.

hongkunxu avatar Jul 01 '25 03:07 hongkunxu

Can you rebase again and fix conflicts?

Hi @LiShuMing, I have resolved the conflicts. Please help review again.

hongkunxu avatar Jul 02 '25 05:07 hongkunxu

@mergify rebase

LiShuMing avatar Jul 03 '25 10:07 LiShuMing

rebase

✅ Branch has been successfully rebased

mergify[bot] avatar Jul 03 '25 10:07 mergify[bot]

[Java-Extensions Incremental Coverage Report]

:white_check_mark: pass : 0 / 0 (0%)

github-actions[bot] avatar Jul 07 '25 07:07 github-actions[bot]

[FE Incremental Coverage Report]

:x: fail : 68 / 102 (66.67%)

file detail

| path | covered_line | new_line | coverage | not_covered_line_detail |
|---|---|---|---|---|
| :large_blue_circle: com/starrocks/sql/optimizer/rule/transformation/partition/PartitionSelector.java | 0 | 19 | 00.00% | [809, 810, 811, 812, 813, 818, 819, 820, 821, 822, 823, 825, 826, 827, 828, 829, 831, 832, 833] |
| :large_blue_circle: com/starrocks/scheduler/mv/MVPCTRefreshPartitioner.java | 2 | 7 | 28.57% | [180, 185, 186, 187, 188] |
| :large_blue_circle: com/starrocks/scheduler/mv/MVPCTRefreshListPartitioner.java | 6 | 8 | 75.00% | [442, 512] |
| :large_blue_circle: com/starrocks/scheduler/mv/MVPCTRefreshRangePartitioner.java | 16 | 19 | 84.21% | [492, 494, 499] |
| :large_blue_circle: com/starrocks/scheduler/mv/MVRefreshPartitionSelector.java | 35 | 40 | 87.50% | [153, 154, 155, 156, 186] |
| :large_blue_circle: com/starrocks/scheduler/PartitionBasedMvRefreshProcessor.java | 6 | 6 | 100.00% | [] |
| :large_blue_circle: com/starrocks/scheduler/mv/MVAdaptiveRefreshException.java | 2 | 2 | 100.00% | [] |
| :large_blue_circle: com/starrocks/scheduler/mv/MVPCTRefreshNonPartitioner.java | 1 | 1 | 100.00% | [] |

github-actions[bot] avatar Jul 07 '25 07:07 github-actions[bot]

[BE Incremental Coverage Report]

:white_check_mark: pass : 0 / 0 (0%)

github-actions[bot] avatar Jul 07 '25 07:07 github-actions[bot]