Support `UNION ALL`
We need to support the SetOp, tracking in #2911, but UNION ALL is much more widely used than others, so we should implement it first.
IIUC, UNION ALL concat the two (or multiple) inputs so that it may be converted to a simple merger in the steam engine in the frontend, and we don't need to introduce a new operator here.
cc @xiangjinwu
We already have a similar one in IndexDeltaJoin PR, but it’s still under review.
Still need binder and planner support 🤪
We already have a similar one in IndexDeltaJoin PR, but it’s still under review.
What's the difference between it and a trivial merger?
LookupUnion is a new operator after merger. It will merge all items from each channel in order (first, second, third) instead of randomly pick one.
To make life easier, I'd recommend:
- Use LogicalUnion for both lookup joins and union all
- Union will have a special property "for_lookup". If true, it will be converted to LookupUnion in stream plan; otherwise, it will be a normal Union.
- Union has to be a separate executor, that's because MergeExecutor only takes channels as inputs instead of executors. If we have some plan like this:
Union
Exchange
Join
A
B
Join
C
D
Then union's input will be one side channel, one side normal executor. So we can't use merge executor to do this.
LookupUnion is a new operator after merger. It will merge all items from each channel in order (first, second, third) instead of randomly pick one.
To make life easier, I'd recommend:
- Use LogicalUnion for both lookup joins and union all
- Union will have a special property "for_lookup". If true, it will be converted to LookupUnion in stream plan; otherwise, it will be a normal Union.
- Union has to be a separate executor, that's because MergeExecutor only takes channels as inputs instead of executors. If we have some plan like this:
Union Exchange Join A B Join C DThen union's input will be one side channel, one side normal executor. So we can't use merge executor to do this.
Different branches of UNION-ALL may be highly biased, is it ok to use such pick strategy?
For the union all you mentioned here, it will "randomly pick one"
For lookup's union, it will "merge all items from each channel in order (first, second, third)"
Prior discussion on binding:
(1) Without
union, thequeryand its bodyselectshare the same context (current behavior); (2) Withunion, each descendantselectwill use their own context, leaving the context ofqueryempty, andextra_order_exprswould return bind error as expected.
Originally posted by @xiangjinwu in https://github.com/singularity-data/risingwave/pull/2329#discussion_r866476756
@chenzl25 Is this completed?
@chenzl25 Is this completed?
From the user perspective, yes. We can still do some optimizations: subquery with union, better shuffle strategy.