Remove redundant predicates after transitive closures

Open DevChattopadhyay opened this issue 2 years ago • 0 comments

Issue:

The current implementation of ORCA does not remove join predicates that become redundant after transitive closure.
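To make "transitive closure" concrete: for the query in the Setup section, the filter bar.d = 'cc' combined with the join predicate foo.b = bar.d implies foo.b = 'cc', which is why both plans show a filter on foo. The following is only an illustrative Python sketch of that inference, not ORCA code; the function name `derive_constants` and the string-based column representation are assumptions made for the example.

```python
def derive_constants(equalities, constants):
    """Propagate known constants across column equalities.

    equalities: iterable of (column, column) pairs, e.g. ("foo.b", "bar.d")
    constants:  dict mapping a column to the literal it is filtered on
    Hypothetical helper for illustration; not part of ORCA.
    """
    derived = dict(constants)
    changed = True
    # Repeat until no new constant can be derived (a simple fixed point).
    while changed:
        changed = False
        for left, right in equalities:
            for src, dst in ((left, right), (right, left)):
                if src in derived and dst not in derived:
                    derived[dst] = derived[src]
                    changed = True
    return derived


join_equalities = [("foo.a", "bar.c"), ("foo.b", "bar.d")]
known = derive_constants(join_equalities, {"bar.d": "cc"})
print(known)
```

Once both foo.b and bar.d are pinned to the same literal by filters below the join, the join predicate foo.b = bar.d no longer restricts the result, which is the redundancy this PR targets.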

Solution:

This PR attempts to remove the redundant predicates using the following steps.

  • After the normalization step, the predicates have already been pushed down the tree, so they are redundant in the join condition.

  • If the scalar child of the join is a single EopScalarCmp, we do not check it for redundancy, because one comparison is needed for the hash join condition.

  • If the scalar child of the join is an EopScalarBoolOp, we iterate over each of its children. For every child that is an EopScalarCmp of equality type, we check whether the value of its column is a constant.

  • If it is a constant, the comparison can be removed, because it was already pushed down the tree in the previous normalization step.

  • If all the children turn out to be redundant, we do not remove all of them, because the join would then boil down to a nested loop join. To still allow a hash join, we keep one child even though it is redundant, chosen based on whether its column is a distribution key.
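The steps above can be sketched roughly as follows. This is a simplified Python model rather than the actual ORCA C++ change: `Cmp`, `prune_join_conjuncts`, `constant_cols`, and `distribution_keys` are hypothetical names, columns are plain strings, and the sketch assumes a comparison counts as redundant only when both of its columns are already pinned to a constant by pushed-down filters.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Cmp:
    """A scalar comparison between two columns, e.g. foo.b = bar.d."""
    left: str
    right: str
    op: str = "="


def prune_join_conjuncts(conjuncts, constant_cols, distribution_keys=frozenset()):
    """Drop equality conjuncts whose columns are already fixed to constants.

    conjuncts:         children of the join's AND condition (a single
                       comparison is passed as a one-element list)
    constant_cols:     columns known to equal a constant below the join
    distribution_keys: columns preferred when one redundant child must stay
    """
    # A lone comparison is kept as-is: it is needed as the hash condition.
    if len(conjuncts) <= 1:
        return list(conjuncts)

    def is_redundant(c):
        # Assumption: only equality comparisons are candidates for removal.
        return c.op == "=" and c.left in constant_cols and c.right in constant_cols

    kept = [c for c in conjuncts if not is_redundant(c)]
    if kept:
        return kept

    # Every child is redundant: keep one so a hash join remains possible,
    # preferring a comparison that involves a distribution key.
    for c in conjuncts:
        if c.left in distribution_keys or c.right in distribution_keys:
            return [c]
    return [conjuncts[0]]


conds = [Cmp("foo.a", "bar.c"), Cmp("foo.b", "bar.d")]
print(prune_join_conjuncts(conds, constant_cols={"foo.b", "bar.d"}))
```

With the example query used in this PR, only the comparison on foo.a and bar.c survives, which corresponds to the reduced hash condition in the new plan.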

Setup:

create table foo(a text, b text);
create table bar(c text, d text);
explain select * from foo join bar on foo.a=bar.c and foo.b=bar.d where bar.d='cc';

Existing Behaviour:

                            QUERY PLAN
-------------------------------------------------------------------------------
 Gather Motion 3:1  (slice1; segments: 3)  (cost=0.00..862.00 rows=1 width=32)
   ->  Hash Join  (cost=0.00..862.00 rows=1 width=32)
         Hash Cond: ((foo.a = bar.c) AND (foo.b = bar.d))
         ->  Seq Scan on foo  (cost=0.00..431.00 rows=1 width=16)
               Filter: (b = 'cc'::text)
         ->  Hash  (cost=431.00..431.00 rows=1 width=16)
               ->  Seq Scan on bar  (cost=0.00..431.00 rows=1 width=16)
                     Filter: (d = 'cc'::text)
 Optimizer: Pivotal Optimizer (GPORCA)
(9 rows)

New Behaviour:

                                  QUERY PLAN
-------------------------------------------------------------------------------
 Gather Motion 3:1  (slice1; segments: 3)  (cost=0.00..862.00 rows=1 width=32)
   ->  Hash Join  (cost=0.00..862.00 rows=1 width=32)
         Hash Cond: (foo.a = bar.c)
         ->  Seq Scan on foo  (cost=0.00..431.00 rows=1 width=16)
               Filter: (b = 'cc'::text)
         ->  Hash  (cost=431.00..431.00 rows=1 width=16)
               ->  Seq Scan on bar  (cost=0.00..431.00 rows=1 width=16)
                     Filter: (d = 'cc'::text)
 Optimizer: Pivotal Optimizer (GPORCA)
(9 rows)

Here are some reminders before you submit the pull request

  • [ ] Add tests for the change
  • [ ] Document changes
  • [ ] Communicate in the mailing list if needed
  • [ ] Pass make installcheck
  • [ ] Review a PR in return to support the community

DevChattopadhyay, Sep 02 '22 08:09