Remove redundant predicates after transitive closures
Issue:
The current implementation of ORCA does not remove predicates that become redundant after the transitive closure is computed.
Solution:
This PR removes the redundant predicates in the following steps.
- After the normalization step, the predicates have already been pushed down the tree, so they are redundant in the join condition.
- If the child of the join is an EopScalarCmp, we do not check it for redundancy, because the Hash Join condition needs at least one child.
- If the child of the join is an EopScalarBoolOp, we iterate through each of its children; for every child that is an EopScalarCmp of equality type, we check whether the value of that column is a constant.
- If it is a constant, the child can be removed, because it was already pushed down the tree in the previous normalization step.
- If all the children turn out to be redundant, we do not remove all of them, because the join would then degenerate into a nested loop join. To still allow a Hash Join, we keep one child even though it is redundant, chosen based on whether its column is a distribution key.
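The steps above can be sketched in a few lines of C++. This is a simplified illustration, not the ORCA code from this PR: `EqualityConjunct`, `PruneRedundantConjuncts` and `ExampleFromSetup` are hypothetical names, columns are plain strings, and only equality comparisons are modelled. It also assumes that a conjunct counts as redundant only when both of its columns are already pinned to a constant by a pushed-down filter, and that the fallback prefers a conjunct touching a distribution key; the PR text leaves both details open.

```cpp
#include <cassert>
#include <cstddef>
#include <set>
#include <string>
#include <vector>

// Hypothetical stand-in for one equality comparison in a join condition,
// e.g. foo.b = bar.d. This is not an ORCA type.
struct EqualityConjunct {
    std::string left;   // column from one join child
    std::string right;  // column from the other join child
};

// Returns the conjuncts that should remain in the join condition.
// constantColumns:  columns already filtered to a constant below the join.
// distributionKeys: columns that are distribution keys (assumed input).
std::vector<EqualityConjunct> PruneRedundantConjuncts(
    const std::vector<EqualityConjunct>& conjuncts,
    const std::set<std::string>& constantColumns,
    const std::set<std::string>& distributionKeys)
{
    // A single comparison is never pruned: the hash join needs a condition.
    if (conjuncts.size() <= 1) {
        return conjuncts;
    }

    std::vector<EqualityConjunct> kept;
    std::vector<EqualityConjunct> redundant;
    for (const auto& c : conjuncts) {
        // Assumption: redundant means both sides are pinned to a constant.
        const bool isRedundant = constantColumns.count(c.left) > 0 &&
                                 constantColumns.count(c.right) > 0;
        if (isRedundant) {
            redundant.push_back(c);
        } else {
            kept.push_back(c);
        }
    }

    // If every conjunct was redundant, keep one so a hash join stays possible,
    // preferring a conjunct that touches a distribution key.
    if (kept.empty()) {
        assert(!redundant.empty());
        std::size_t pick = 0;
        for (std::size_t i = 0; i < redundant.size(); ++i) {
            if (distributionKeys.count(redundant[i].left) > 0 ||
                distributionKeys.count(redundant[i].right) > 0) {
                pick = i;
                break;
            }
        }
        kept.push_back(redundant[pick]);
    }
    return kept;
}

// The join condition from the Setup query: foo.a = bar.c AND foo.b = bar.d,
// with foo.b and bar.d both filtered to 'cc' below the join.
std::vector<EqualityConjunct> ExampleFromSetup()
{
    return PruneRedundantConjuncts(
        {{"foo.a", "bar.c"}, {"foo.b", "bar.d"}},
        {"foo.b", "bar.d"},
        {"foo.a", "bar.c"});  // assumed distribution keys, unused in this case
}
```

For the query in the Setup section, `ExampleFromSetup()` keeps only `foo.a = bar.c`, which corresponds to the Hash Cond shown under New Behaviour.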
Setup:
create table foo(a text, b text);
create table bar(c text, d text);
explain select * from foo join bar on foo.a=bar.c and foo.b=bar.d where bar.d='cc';
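To see why `foo.b = bar.d` becomes redundant in this query, it may help to sketch how a transitive closure over equalities propagates the constant filter `bar.d = 'cc'` to `foo.b`. The C++ below is only an illustration under simplifying assumptions: `PropagateConstants` and `ExampleConstants` are hypothetical names, columns and constants are plain strings, and this is not how ORCA represents or derives predicates internally.

```cpp
#include <map>
#include <string>
#include <utility>
#include <vector>

// Hypothetical helper, not ORCA code: propagate `column = constant` filters
// across column equalities until no new filter can be derived.
std::map<std::string, std::string> PropagateConstants(
    const std::vector<std::pair<std::string, std::string>>& equalities,
    std::map<std::string, std::string> constants)
{
    bool changed = true;
    while (changed) {
        changed = false;
        for (const auto& eq : equalities) {
            const auto l = constants.find(eq.first);
            const auto r = constants.find(eq.second);
            if (l != constants.end() && r == constants.end()) {
                // Left side is constant, so the right side must equal it too.
                constants[eq.second] = l->second;
                changed = true;
            } else if (r != constants.end() && l == constants.end()) {
                // Right side is constant, so the left side must equal it too.
                constants[eq.first] = r->second;
                changed = true;
            }
        }
    }
    return constants;
}

// Equalities and the WHERE filter taken from the setup query.
std::map<std::string, std::string> ExampleConstants()
{
    return PropagateConstants(
        {{"foo.a", "bar.c"}, {"foo.b", "bar.d"}},
        {{"bar.d", "'cc'"}});
}
```

Here `ExampleConstants()` yields a filter for `foo.b` in addition to the original one on `bar.d`. Once both sides of `foo.b = bar.d` are filtered to the same constant below the join, repeating that comparison in the join condition adds nothing.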
Existing Behaviour:
QUERY PLAN
-------------------------------------------------------------------------------
 Gather Motion 3:1 (slice1; segments: 3) (cost=0.00..862.00 rows=1 width=32)
   -> Hash Join (cost=0.00..862.00 rows=1 width=32)
         Hash Cond: ((foo.a = bar.c) AND (foo.b = bar.d))
         -> Seq Scan on foo (cost=0.00..431.00 rows=1 width=16)
               Filter: (b = 'cc'::text)
         -> Hash (cost=431.00..431.00 rows=1 width=16)
               -> Seq Scan on bar (cost=0.00..431.00 rows=1 width=16)
                     Filter: (d = 'cc'::text)
 Optimizer: Pivotal Optimizer (GPORCA)
(9 rows)
New Behaviour:
QUERY PLAN
-------------------------------------------------------------------------------
 Gather Motion 3:1 (slice1; segments: 3) (cost=0.00..862.00 rows=1 width=32)
   -> Hash Join (cost=0.00..862.00 rows=1 width=32)
         Hash Cond: (foo.a = bar.c)
         -> Seq Scan on foo (cost=0.00..431.00 rows=1 width=16)
               Filter: (b = 'cc'::text)
         -> Hash (cost=431.00..431.00 rows=1 width=16)
               -> Seq Scan on bar (cost=0.00..431.00 rows=1 width=16)
                     Filter: (d = 'cc'::text)
 Optimizer: Pivotal Optimizer (GPORCA)
(9 rows)
Here are some reminders before you submit the pull request
- [ ] Add tests for the change
- [ ] Document changes
- [ ] Communicate in the mailing list if needed
- [ ] Pass make installcheck
- [ ] Review a PR in return to support the community