gpdb
gpdb copied to clipboard
[6X] fix wrong result of MDQA in planner
In the 6x side, if the subpath's locus collocates tuples with group keys, the planner favors local sortgroupagg over MDQ join rewrite. We also choose sortgroupagg when the group key becomes constant, as constants should also be collocated.
In this PR, we need to differentiate between an empty groupClause and a constant
groupClause, as they are obviously distinct. In the function cdb_grouping_planner,
root->group_pathkeys == NULL and root->parse->groupClause == NULL indicate
an empty groupClause in the query. Otherwise, the groupClause should be constant.
Here are some reminders before you submit the pull request
- [ ] Add tests for the change
- [ ] Document changes
- [ ] Communicate in the mailing list if needed
- [ ] Pass
make installcheck - [ ] Review a PR in return to support the community
Thanks for your work. I spent some time on this today. Below is my cases:
create table t(a int, b int, c int, d int, e int) distributed by (a);
explain (costs off, verbose)
with cte as (
select
a, b,
count(distinct c),
count(distinct d),
count(distinct e)
from t
where a = 1
group by a,b
)
select * from cte where b = 2;
explain (costs off, verbose)
with cte as (
select
a, e,
count(distinct c),
count(distinct d),
count(distinct b)
from t
where a = 1
group by a,e
)
select * from cte where e = 2;
The 2nd SQL's plan on 6X (without this patch) is not correct.
I look into the function add_motion_to_dqa_child <-- join_dqa_coplan, the add motion function will call
List *pathkeys = make_pathkeys_for_groupclause(root, root->parse->groupClause, plan->targetlist);
the above code depends on root->eq_classes which is built at the early stage of subquery_planner.
However, cdb grouping for MDQA, will wrap each DQA part using subquery scan, which lead to target list's Variable change, which leads to mismatch of root->eq_classes with the target list here.
I don't think the above code is correct.
Your fix is before this line, lets think more on this.
pipeline https://dev.ci.gpdb.pivotal.io/teams/main/pipelines/gpdb-dev-6x_mdqa-rocky8