gpdb Fix DQA in planner with primary key


Open: charliettxx opened this issue 2 years ago (6 comments)

After merging "Recognize functional dependency on primary keys" from upstream, a table's other columns can be referenced without listing them in GROUP BY, as long as the primary key column(s) are listed in GROUP BY. We should support SQL like:

create table t1(a int primary key, b int, c int);
select count(distinct b), sum(b), c from t1 group by a;

Such SQL should be executable. Because a is the primary key, c is functionally dependent on a, so c may be referenced in the target list even though it is not listed in GROUP BY.
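The Python sketch below emulates the query above to show why c is well defined. It is an illustration only, not how the planner works. The row values and the helper name group_by_pk are made up. The sketch assumes a is unique, as a primary key guarantees. Under that assumption, every group holds exactly one row, so c has a single value per group.

```python
from collections import defaultdict


def group_by_pk(rows):
    """Emulate `select count(distinct b), sum(b), c from t1 group by a`.

    rows is a list of (a, b, c) tuples from a hypothetical t1 in which a is
    unique, as a primary key would guarantee.
    """
    groups = defaultdict(list)
    for a, b, c in rows:
        groups[a].append((b, c))
    result = {}
    for a, members in groups.items():
        c_values = {c for _, c in members}
        # a is unique, so each group holds exactly one row and c has a
        # single, well-defined value: c is functionally dependent on a.
        assert len(c_values) == 1
        bs = [b for b, _ in members]
        result[a] = (len(set(bs)), sum(bs), c_values.pop())
    return result


sample = [(1, 10, 100), (2, 10, 200), (3, 20, 200)]
print(group_by_pk(sample))  # {1: (1, 10, 100), 2: (1, 10, 200), 3: (1, 20, 200)}
```

Suppose the grouping column were not unique, for example input [(1, 10, 100), (1, 20, 200)]. One group would then contain two different c values, and the assertion would fail. This is why c can be left out of GROUP BY only when a primary key is grouped.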

Now we need to adapt this to the DQA (aggregates with DISTINCT) scenario in 7x. That means adding some test cases from 6x and fixing some minor issues on the main branch. On the main branch this scenario is easier to handle than in 6x, especially for the target list and paths.

We must be careful, however, because the MDQA (multiple DQAs) TupleSplit strategy cannot support this case. TupleSplit splits one tuple into multiple tuples. We therefore cannot assume that the other columns in the target list come from a single tuple, and after the split they may point to NULL. As a result, we should abort the TupleSplit strategy in this case and fall back to the normal agg strategy.
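The following Python sketch models why TupleSplit conflicts with primary-key functional dependency. It assumes a simplified TupleSplit in which each split tuple keeps only three things: the grouping columns, its own DISTINCT argument, and an expression id. Every other column becomes NULL. The table t2(a int primary key, b int, c int, d int) and the functions tuple_split and can_use_tuple_split are hypothetical, not Greenplum code. The real executor and planner differ in detail.

```python
def tuple_split(row, group_cols, dqa_args):
    """Simplified model of TupleSplit: emit one tuple per DISTINCT argument.

    Each output tuple keeps the grouping columns and only its own DQA
    argument; every other column becomes None (standing in for SQL NULL).
    An "expr_id" field records which DQA the tuple feeds.
    """
    out = []
    for expr_id, arg in enumerate(dqa_args):
        split = {col: None for col in row}
        for col in group_cols:
            split[col] = row[col]
        split[arg] = row[arg]
        split["expr_id"] = expr_id
        out.append(split)
    return out


def can_use_tuple_split(num_dqa, group_cols, plain_target_cols):
    """Hypothetical planner check mirroring the rule described above.

    TupleSplit is only considered for multiple DQAs. It is rejected when the
    target list references plain columns that are not grouping columns (such
    as columns functionally dependent on a primary key), because those
    columns would be read from split tuples in which they are NULL.
    """
    if num_dqa < 2:
        return False
    return all(col in group_cols for col in plain_target_cols)


# Hypothetical query on t2:
#   select count(distinct b), count(distinct c), d from t2 group by a;
row = {"a": 1, "b": 10, "c": 100, "d": 7}
for t in tuple_split(row, ["a"], ["b", "c"]):
    print(t)
# {'a': 1, 'b': 10, 'c': None, 'd': None, 'expr_id': 0}
# {'a': 1, 'b': None, 'c': 100, 'd': None, 'expr_id': 1}
print(can_use_tuple_split(2, ["a"], ["d"]))  # False: fall back to normal agg
```

With d in the target list, both split tuples carry NULL for d. The check therefore returns False, and under this model the query would be planned with the normal aggregation strategy instead.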

Here are some reminders before you submit the pull request

[x] Add tests for the change
[ ] Document changes
[ ] Communicate in the mailing list if needed
[ ] Pass make installcheck
[ ] Review a PR in return to support the community

Aug 01 '23 13:08 charliettxx
