[Enhancement] Support more intelligent/adaptive MV Refresh
Why I'm doing:
#58351 proposes a more adaptive method that adjusts MV refresh based on base table data info. However, it does not handle non-OLAP tables when collecting the related partition information.
What I'm doing:
The "Gather statistics for CBO about external tables" mechanism records partition information of external tables in statistics.external_column_statistics. When refreshing materialized views involving external tables, these statistics can be used as a reference to compute relevant partition data.
Fixes #56973
What type of PR is this:
- [ ] BugFix
- [ ] Feature
- [x] Enhancement
- [ ] Refactor
- [ ] UT
- [ ] Doc
- [ ] Tool
Does this PR entail a change in behavior?
- [ ] Yes, this PR will result in a change in behavior.
- [x] No, this PR will not result in a change in behavior.
If yes, please specify the type of change:
- [ ] Interface/UI changes: syntax, type conversion, expression evaluation, display information
- [ ] Parameter changes: default values, similar parameters but with different default values
- [ ] Policy changes: use new policy to replace old one, functionality automatically enabled
- [ ] Feature removed
- [ ] Miscellaneous: upgrade & downgrade compatibility, etc.
Checklist:
- [x] I have added test cases for my bug fix or my new feature
- [ ] This pr needs user documentation (for new or modified features or behaviors)
- [ ] I have added documentation for my new feature or new function
- [ ] This is a backport pr
Bugfix cherry-pick branch check:
- [x] I have checked the version labels which the pr will be auto-backported to the target branch
- [ ] 3.5
- [ ] 3.4
- [ ] 3.3
- [ ] 3.2
- [ ] 3.1
Can you add some FE UTs and SQL Testers for this?
Hi @LiShuMing
For the requirements:
- If the external table has empty statistics or the fetch fails, fall back to the original strategy: this has been implemented.
- If the external table has only partial partition statistics, how should the partitions without statistics be handled? I've applied the following logic:
  - If fewer than 10% of the partitions are missing statistics, we continue using the adaptive strategy.
  - Otherwise, we fall back to the original strategy.
- Add some FE UTs and SQL Testers for this? Done. Relevant unit tests and SQL test cases have been added.
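
The decision rule described in the list above can be sketched roughly as follows. This is a minimal illustration, not the code in this PR: the class `AdaptiveRefreshGate`, the `Strategy` enum, and the `choose` method are hypothetical names, and a `null` or empty statistics set stands in for "statistics are empty or the fetch failed".

```java
import java.util.List;
import java.util.Set;

// Hypothetical helper, NOT the PR's actual code: decides whether partition
// statistics are complete enough to keep using the adaptive refresh strategy.
final class AdaptiveRefreshGate {
    // Threshold taken from the comment above: tolerate fewer than 10% missing.
    static final double MAX_MISSING_RATIO = 0.10;

    enum Strategy { ADAPTIVE, ORIGINAL }

    static Strategy choose(List<String> partitionsToRefresh, Set<String> partitionsWithStats) {
        // Empty statistics or a failed fetch (modeled here as null) fall back
        // to the original strategy. An empty refresh list also falls back,
        // which is an arbitrary choice made for this sketch.
        if (partitionsToRefresh.isEmpty() || partitionsWithStats == null || partitionsWithStats.isEmpty()) {
            return Strategy.ORIGINAL;
        }
        // Count the partitions that have no statistics entry.
        long missing = partitionsToRefresh.stream()
                .filter(p -> !partitionsWithStats.contains(p))
                .count();
        double missingRatio = (double) missing / partitionsToRefresh.size();
        // Strictly below the threshold keeps the adaptive strategy.
        return missingRatio < MAX_MISSING_RATIO ? Strategy.ADAPTIVE : Strategy.ORIGINAL;
    }
}
```

With 20 partitions to refresh, one partition without statistics (5%) keeps the adaptive strategy under this sketch, while two (exactly 10%) already fall back, because the comparison is strict.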
Please let me know if any further changes are needed.
btw, the external statistics may easily expire
Hi @Seaven, your concerns are reasonable. On the one hand, triggering a synchronous external statistics collection task for every materialized view (MV) refresh that involves external tables would consume excessive resources. On the other hand, if the external statistics are incomplete due to expiration, I have implemented a fallback to the strict mode so that the materialized view can still be refreshed correctly.
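
As a rough sketch of the fallback shape described in this reply: try the adaptive selection first and, if the statistics turn out to be unusable, fall back to the strict path. Everything here is an assumption made for illustration. `RefreshFallbackSketch`, `IncompleteStatsException`, and `selectWithFallback` are hypothetical names; the PR's file list does include an `MVAdaptiveRefreshException`, but its real usage is not shown in this thread.

```java
import java.util.function.Supplier;

// Hypothetical sketch, NOT the PR's actual code.
final class RefreshFallbackSketch {
    // Stand-in for whatever signals that external statistics are
    // incomplete or expired (name assumed for illustration only).
    static final class IncompleteStatsException extends RuntimeException {
        IncompleteStatsException(String message) {
            super(message);
        }
    }

    // Runs the adaptive selection; if it reports unusable statistics,
    // returns the result of the strict selection instead.
    static <T> T selectWithFallback(Supplier<T> adaptive, Supplier<T> strict) {
        try {
            return adaptive.get();
        } catch (IncompleteStatsException e) {
            return strict.get();
        }
    }
}
```

The point of this shape is that an expired or partial statistics snapshot only costs the adaptive optimization for that run; the refresh itself still completes through the strict path.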
Can you rebase again and fix conflicts?
Sure, I will rebase again.
Can you rebase again and fix conflicts?
Hi @LiShuMing, I have resolved the conflicts. Please help review again.
@mergify rebase
✅ Branch has been successfully rebased
Quality Gate passed
Issues
- 22 New issues
- 0 Accepted issues
Measures
- 0 Security Hotspots
- 0.0% Coverage on New Code
- 0.0% Duplication on New Code
[Java-Extensions Incremental Coverage Report]
:white_check_mark: pass : 0 / 0 (0%)
[FE Incremental Coverage Report]
:x: fail : 68 / 102 (66.67%)
file detail
| | path | covered_line | new_line | coverage | not_covered_line_detail |
|---|---|---|---|---|---|
| :large_blue_circle: | com/starrocks/sql/optimizer/rule/transformation/partition/PartitionSelector.java | 0 | 19 | 00.00% | [809, 810, 811, 812, 813, 818, 819, 820, 821, 822, 823, 825, 826, 827, 828, 829, 831, 832, 833] |
| :large_blue_circle: | com/starrocks/scheduler/mv/MVPCTRefreshPartitioner.java | 2 | 7 | 28.57% | [180, 185, 186, 187, 188] |
| :large_blue_circle: | com/starrocks/scheduler/mv/MVPCTRefreshListPartitioner.java | 6 | 8 | 75.00% | [442, 512] |
| :large_blue_circle: | com/starrocks/scheduler/mv/MVPCTRefreshRangePartitioner.java | 16 | 19 | 84.21% | [492, 494, 499] |
| :large_blue_circle: | com/starrocks/scheduler/mv/MVRefreshPartitionSelector.java | 35 | 40 | 87.50% | [153, 154, 155, 156, 186] |
| :large_blue_circle: | com/starrocks/scheduler/PartitionBasedMvRefreshProcessor.java | 6 | 6 | 100.00% | [] |
| :large_blue_circle: | com/starrocks/scheduler/mv/MVAdaptiveRefreshException.java | 2 | 2 | 100.00% | [] |
| :large_blue_circle: | com/starrocks/scheduler/mv/MVPCTRefreshNonPartitioner.java | 1 | 1 | 100.00% | [] |
[BE Incremental Coverage Report]
:white_check_mark: pass : 0 / 0 (0%)