risingwave icon indicating copy to clipboard operation
risingwave copied to clipboard

Support `UNION ALL`

Open TennyZhuang opened this issue 3 years ago • 7 comments

We need to support the SetOp, tracking in #2911, but UNION ALL is much more widely used than others, so we should implement it first.

IIUC, UNION ALL concat the two (or multiple) inputs so that it may be converted to a simple merger in the steam engine in the frontend, and we don't need to introduce a new operator here.

cc @xiangjinwu

TennyZhuang avatar May 30 '22 14:05 TennyZhuang

We already have a similar one in IndexDeltaJoin PR, but it’s still under review.

skyzh avatar May 30 '22 14:05 skyzh

Still need binder and planner support 🤪

skyzh avatar May 30 '22 14:05 skyzh

We already have a similar one in IndexDeltaJoin PR, but it’s still under review.

What's the difference between it and a trivial merger?

TennyZhuang avatar May 30 '22 14:05 TennyZhuang

LookupUnion is a new operator after merger. It will merge all items from each channel in order (first, second, third) instead of randomly pick one.

To make life easier, I'd recommend:

  • Use LogicalUnion for both lookup joins and union all
  • Union will have a special property "for_lookup". If true, it will be converted to LookupUnion in stream plan; otherwise, it will be a normal Union.
  • Union has to be a separate executor, that's because MergeExecutor only takes channels as inputs instead of executors. If we have some plan like this:
Union
  Exchange
    Join
      A
      B
  Join
    C
    D

Then union's input will be one side channel, one side normal executor. So we can't use merge executor to do this.

skyzh avatar May 30 '22 15:05 skyzh

LookupUnion is a new operator after merger. It will merge all items from each channel in order (first, second, third) instead of randomly pick one.

To make life easier, I'd recommend:

  • Use LogicalUnion for both lookup joins and union all
  • Union will have a special property "for_lookup". If true, it will be converted to LookupUnion in stream plan; otherwise, it will be a normal Union.
  • Union has to be a separate executor, that's because MergeExecutor only takes channels as inputs instead of executors. If we have some plan like this:
Union
  Exchange
    Join
      A
      B
  Join
    C
    D

Then union's input will be one side channel, one side normal executor. So we can't use merge executor to do this.

Different branches of UNION-ALL may be highly biased, is it ok to use such pick strategy?

TennyZhuang avatar May 30 '22 16:05 TennyZhuang

For the union all you mentioned here, it will "randomly pick one"

For lookup's union, it will "merge all items from each channel in order (first, second, third)"

skyzh avatar May 30 '22 17:05 skyzh

Prior discussion on binding:

(1) Without union, the query and its body select share the same context (current behavior); (2) With union, each descendant select will use their own context, leaving the context of query empty, and extra_order_exprs would return bind error as expected.

Originally posted by @xiangjinwu in https://github.com/singularity-data/risingwave/pull/2329#discussion_r866476756

xiangjinwu avatar May 31 '22 03:05 xiangjinwu

@chenzl25 Is this completed?

fuyufjh avatar Nov 21 '22 07:11 fuyufjh

@chenzl25 Is this completed?

From the user perspective, yes. We can still do some optimizations: subquery with union, better shuffle strategy.

chenzl25 avatar Nov 21 '22 08:11 chenzl25