starrocks icon indicating copy to clipboard operation
starrocks copied to clipboard

[BugFix] Fix mv refresh bug for queries with union all

Open LiShuMing opened this issue 10 months ago • 9 comments

Why I'm doing:

MV with union all 's result is not correct if

CREATE TABLE `t1` (
    `k1`  date not null, 
    `k2`  datetime not null, 
    `k3`  char(20), 
    `k4`  varchar(20), 
    `k5`  boolean, 
    `k6`  tinyint, 
    `k7`  smallint, 
    `k8`  int, 
    `k9`  bigint, 
    `k10` largeint, 
    `k11` float, 
    `k12` double, 
    `k13` decimal(27,9) ) 
DUPLICATE KEY(`k1`, `k2`, `k3`, `k4`, `k5`) 
PARTITION BY date_trunc('day', k1)
DISTRIBUTED BY HASH(`k1`, `k2`, `k3`) BUCKETS 3 
PROPERTIES ("storage_format" = "v2" );


INSERT INTO t1 VALUES
    ('2020-10-22','2020-10-23 12:12:12','k3','k4',0,1,2,3,4,5,1.1,1.12,2.889),
    ('2020-10-23','2020-10-24 12:12:12','k3','k4',0,0,2,3,4,5,1.1,1.12,2.889),
    ('2020-10-24','2020-10-25 12:12:12','k3','k4',0,1,2,3,4,5,1.1,1.12,2.889);
    
CREATE TABLE `t2` (
    `k1`  date not null, 
    `k2`  datetime not null, 
    `k3`  char(20), 
    `k4`  varchar(20), 
    `k5`  boolean, 
    `k6`  tinyint, 
    `k7`  smallint, 
    `k8`  int, 
    `k9`  bigint, 
    `k10` largeint, 
    `k11` float, 
    `k12` double, 
    `k13` decimal(27,9) ) 
DUPLICATE KEY(`k1`, `k2`, `k3`, `k4`, `k5`) 
PARTITION BY date_trunc('day', k1)
DISTRIBUTED BY HASH(`k1`, `k2`, `k3`) BUCKETS 3 
PROPERTIES ("storage_format" = "v2" );

INSERT INTO t2 VALUES
    ('2020-10-10','2020-10-23 12:12:12','k3','k4',0,1,2,3,4,5,1.1,1.12,2.889),
    ('2020-10-11','2020-10-24 12:12:12','k3','k4',0,0,2,3,4,5,1.1,1.12,2.889),
    ('2020-10-12','2020-10-25 12:12:12','k3','k4',0,1,2,3,4,5,1.1,1.12,2.889),
    ('2020-10-21','2020-10-24 12:12:12','k3','k4',0,0,2,3,4,5,1.1,1.12,2.889),
    ('2020-10-22','2020-10-25 12:12:12','k3','k4',0,1,2,3,4,5,1.1,1.12,2.889);
    
CREATE MATERIALIZED VIEW IF NOT EXISTS test_mv1
PARTITION BY date_trunc('day', `k1`)
DISTRIBUTED BY HASH(`k1`)
REFRESH ASYNC
as 
select * from t1
union all
select * from t2;


mysql> select k1, count(1) from t1 group by k1 order by k1;
+------------+----------+
| k1         | count(1) |
+------------+----------+
| 2020-10-22 |        1 |
| 2020-10-23 |        1 |
| 2020-10-24 |        1 |
+------------+----------+
3 rows in set (0.46 sec)

mysql> select k1, count(1) from t2 group by k1 order by k1;
+------------+----------+
| k1         | count(1) |
+------------+----------+
| 2020-10-10 |        1 |
| 2020-10-11 |        1 |
| 2020-10-12 |        1 |
| 2020-10-21 |        1 |
| 2020-10-22 |        1 |
+------------+----------+
5 rows in set (0.53 sec)

mysql> select k1, count(1) from test_mv1 group by k1 order by k1;
+------------+----------+
| k1         | count(1) |
+------------+----------+
| 2020-10-22 |        2 |
+------------+----------+
1 row in set (0.30 sec)

What I'm doing:

PR(https://github.com/StarRocks/starrocks/pull/42950/files) introduce a bug: we should not calculate partition diff by using total ref base tables partition and mv's partitions instead of merging each diffs because the diff calculate will delete all partitions for each base table's input and mv's partitions which will introduce the bug.

            for (Map.Entry<Table, Column> entry : partitionTableAndColumn.entrySet()) {
                Table refBaseTable = entry.getKey();
                Column refBaseTablePartitionColumn = entry.getValue();
                // Collect the ref base table's partition range map.
                refBaseTablePartitionMap.put(refBaseTable, PartitionUtil.getPartitionKeyRange(
                        refBaseTable, refBaseTablePartitionColumn, partitionExpr));

                // To solve multi partition columns' problem of external table, record the mv partition name to all the same
                // partition names map here.
                if (!refBaseTable.isNativeTableOrMaterializedView()) {
                    refBaseTableMVPartitionMap.put(refBaseTable,
                            PartitionUtil.getMVPartitionNameMapOfExternalTable(refBaseTable,
                                    refBaseTablePartitionColumn, PartitionUtil.getPartitionNames(refBaseTable)));
                }

                Column partitionColumn = (materializedView.getPartitionInfo()).getPartitionColumns().get(0);
                PartitionDiffer differ = PartitionDiffer.build(materializedView, partitionRange);
                rangePartitionDiffList.add(PartitionUtil.getPartitionDiff(partitionExpr, partitionColumn,
                        refBaseTablePartitionMap.get(refBaseTable), mvRangePartitionMap, differ));
            }

Fixes #issue

What type of PR is this:

  • [x] BugFix
  • [ ] Feature
  • [ ] Enhancement
  • [ ] Refactor
  • [ ] UT
  • [ ] Doc
  • [ ] Tool

Does this PR entail a change in behavior?

  • [ ] Yes, this PR will result in a change in behavior.
  • [x] No, this PR will not result in a change in behavior.

If yes, please specify the type of change:

  • [ ] Interface/UI changes: syntax, type conversion, expression evaluation, display information
  • [ ] Parameter changes: default values, similar parameters but with different default values
  • [ ] Policy changes: use new policy to replace old one, functionality automatically enabled
  • [ ] Feature removed
  • [ ] Miscellaneous: upgrade & downgrade compatibility, etc.

Checklist:

  • [x] I have added test cases for my bug fix or my new feature
  • [ ] This pr needs user documentation (for new or modified features or behaviors)
    • [ ] I have added documentation for my new feature or new function
  • [ ] This is a backport pr

Bugfix cherry-pick branch check:

  • [x] I have checked the version labels which the pr will be auto-backported to the target branch
    • [x] 3.3
    • [ ] 3.2
    • [ ] 3.1
    • [ ] 3.0
    • [ ] 2.5

LiShuMing avatar Apr 23 '24 12:04 LiShuMing

@mergify rebase

LiShuMing avatar Apr 25 '24 10:04 LiShuMing

rebase

✅ Branch has been successfully rebased

mergify[bot] avatar Apr 25 '24 10:04 mergify[bot]

@mergify rebase

LiShuMing avatar Apr 25 '24 11:04 LiShuMing

rebase

✅ Nothing to do for rebase action

mergify[bot] avatar Apr 25 '24 11:04 mergify[bot]

@mergify rebase

LiShuMing avatar Apr 25 '24 12:04 LiShuMing

rebase

✅ Branch has been successfully rebased

mergify[bot] avatar Apr 25 '24 12:04 mergify[bot]

[BE Incremental Coverage Report]

:white_check_mark: pass : 0 / 0 (0%)

github-actions[bot] avatar Apr 29 '24 15:04 github-actions[bot]

[FE Incremental Coverage Report]

:white_check_mark: pass : 62 / 69 (89.86%)

file detail

path covered_line new_line coverage not_covered_line_detail
:large_blue_circle: com/starrocks/scheduler/PartitionBasedMvRefreshProcessor.java 5 7 71.43% [1253, 1259]
:large_blue_circle: com/starrocks/sql/common/PartitionDiffer.java 42 47 89.36% [313, 314, 318, 319, 320]
:large_blue_circle: com/starrocks/sql/common/MvPartitionDiffResult.java 6 6 100.00% []
:large_blue_circle: com/starrocks/sql/common/SyncPartitionUtils.java 6 6 100.00% []
:large_blue_circle: com/starrocks/server/LocalMetastore.java 1 1 100.00% []
:large_blue_circle: com/starrocks/catalog/MvRefreshArbiter.java 1 1 100.00% []
:large_blue_circle: com/starrocks/sql/common/RangePartitionDiff.java 1 1 100.00% []

github-actions[bot] avatar Apr 29 '24 15:04 github-actions[bot]

@Mergifyio backport branch-3.3

github-actions[bot] avatar Apr 30 '24 02:04 github-actions[bot]

backport branch-3.3

✅ Backports have been created

mergify[bot] avatar Apr 30 '24 02:04 mergify[bot]