materialize icon indicating copy to clipboard operation
materialize copied to clipboard

Differential join can pick plan that reuses less arrangements when there are more than two inputs

Open wangandi opened this issue 5 years ago • 1 comments

Seen in a prospect query. There are no keys on foo, bar, or baz. There is only an index on foo(a).

select <stuff>
from foo
   inner join bar on foo.a = bar.a
   inner join baz on foo.b = baz.b

The plan came out to something like this:

 %0 =
 | Get foo
 | ArrangeBy (b) 

 %1 =
 | Get bar
 | ArrangeBy (a) 

 %2 = 
 | Get baz

 %3 = 
 | Join %0 %1 %2 (= %0.a %1.a) (= %0.b %1.b)
 | | implementation = Differential %2 %0.(b) %1.(a)
 | | demand = <stuff>

A better plan would've been

 %0 =
 | Get foo
 | ArrangeBy (a) 

 %1 =
 | Get bar
 | ArrangeBy (b) 

 %2 = 
 | Get baz

 %3 = 
 | Join %0 %1 %2 (= %0.a %1.a) (= %0.b %1.b)
 | | implementation = Differential %1 %0.(a) %2.(b)
 | | demand = <stuff>

since this would allow reusing the index on foo(a).

Currently, the way that a differential join plan is picked is that each input in the plan except for the first one gets a rating (technically called a characteristic).

  • The plan %2 %0.(b) %1.(a) gets the set of ratings [(unique_key: false, key_length 1, prexisting_arrangement: false, input: 0), (unique_key: false, key_length: 1, prexisting_arrangement: false, input 1)]
  • The plan %1 %0.(a) %2.(b) gets the set of ratings [(unique_key: false, key_length 1, prexisting_arrangement: true, input: 0), (unique_key: false, key_length: 1, prexisting_arrangement: false, input: 2)] Then we take the worst rating from each plan and compare them to one another. So we are comparing
  • (unique_key: false, key_length: 1, prexisting_arrangement: false, input 1) (from plan %2 %0.(b) %1.(a))
  • (unique_key: false, key_length: 1, prexisting_arrangement: false, input: 2) (from plan %1 %0.(a) %2.(b)) which are equally bad. The tiebreaker is the input number. Because 1 < 2, plan %2 %0.(b) %1.(a)wins.

A better tiebreaker would be the second worst rating from each plan, third worst rating, and so on.

wangandi avatar Oct 07 '20 23:10 wangandi

The workaround is to create a subview:

create view subview as 
select foo.b as b, <stuff>
from foo
   inner join bar on foo.a = bar.a

select <stuff>
from subview
  inner join baz on subview.b = baz.b

wangandi avatar Dec 09 '20 15:12 wangandi