incubator-gluten
[CH-76]support running union on native engine
What changes were proposed in this pull request?
Add support for running the union operation on the native engine.
How was this patch tested?
(Please explain how this patch was tested. E.g. unit tests, integration tests, manual tests)
(If this patch involves UI changes, please attach a screenshot; otherwise, remove this)
Thanks for opening a pull request!
Could you open an issue for this pull request on Github Issues?
https://github.com/oap-project/gluten/issues
Then could you also rename commit message and pull request title in the following format?
[Gluten-${ISSUES_ID}] ${detailed message}
please rebase to main.
@rui-mo have you verified TPC-DS and TPC-H on the Velox backend?
Will do that today, thanks.
Tested this PR on TPC-DS, but found that union fell back because the input iter cannot be found. Will check what support is needed for the Velox backend.
The coredump may be related to one of the latest PRs in the ClickHouse backend.
Try rebasing onto the latest main branch.
I ran the SQL below and checked the resulting DAG in the Spark UI:
select count(date_sk) from (
select d_date_sk as date_sk from date_dim
union all
select ws_sold_date_sk as date_sk from web_sales
)
The UnionExecTransformer does not need to be wrapped in a WholeStageCodegenTransformer.
If we don't wrap the union exec transformer in a WholeStageCodegenTransformer, we need to implement doExecuteColumnar for this operator.
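To illustrate why a standalone columnar union is cheap, here is a rough, Spark-free sketch in Python. It is only a toy model, not Gluten's or Spark's implementation: `union_all`, `count_non_null`, the `Batch`/`Partition` aliases, and the sample data are all hypothetical names made up for this example. The assumption it encodes is that a columnar `union all` only has to concatenate the children's partitions and pass each batch through untouched, which is roughly the job a doExecuteColumnar for union would have.

```python
from typing import Dict, List, Optional

# Hypothetical stand-ins: a "batch" maps column name -> column values,
# and a "partition" is a list of batches.
Batch = Dict[str, List[Optional[int]]]
Partition = List[Batch]


def union_all(children: List[List[Partition]]) -> List[Partition]:
    """Concatenate the partitions of every child, in child order.

    No batch is copied or converted to rows; each one is passed through as-is.
    """
    return [partition for child in children for partition in child]


def count_non_null(partitions: List[Partition], column: str) -> int:
    """Model of count(column): counts values that are not NULL (None)."""
    return sum(
        1
        for partition in partitions
        for batch in partition
        for value in batch[column]
        if value is not None
    )


# Toy inputs, already projected to the shared output column date_sk.
date_dim = [[{"date_sk": [1, 2, 3]}]]                        # 1 partition
web_sales = [[{"date_sk": [2, None]}], [{"date_sk": [3]}]]   # 2 partitions

unioned = union_all([date_dim, web_sales])
print(len(unioned))                        # number of output partitions
print(count_non_null(unioned, "date_sk"))  # models select count(date_sk)
```

In this model the union output has as many partitions as its children combined, and the downstream count sees the batches exactly as the children produced them.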
Hi @lgbo-ustc, I'm testing on TPC-DS Q5 and found that agg and union were in a single whole-stage transformer. Is this the expected behavior?
@zzcclp @rui-mo could you take another look?
I realize that I may have gotten something wrong. I should not implement a TransformSupport here, but just a columnar alternative for UnionExec.
With the new implementation, Q5's physical plan is:
+- *(1) Expand [[sales#154, returns#156, profit#132, channel#561, id#562, 0], [sales#154, returns#156, profit#132, channel#561, null, 1], [sales#154, returns#156, profit#132, null, null, 3]], [sales#154, returns#156, profit#132, channel#564, id#565, spark_grouping_id#563L]
+- CHNativeColumnarToRow
+- UnionExecTransformer
:- *(21) HashAggregateTransformer(keys=[s_store_id#262], functions=[sum(sales_price#144), sum(return_amt#534), sum(profit#145), sum(net_loss#535)], output=[sales#154, returns#156, profit#132, channel#561, id#562])
: +- CoalesceBatches
: +- ColumnarExchange hashpartitioning(s_store_id#262, 5), ENSURE_REQUIREMENTS, [plan_id=1014], [id=#1014], [OUTPUT] ArrayBuffer(s_store_id:StringType, sum:DoubleType, sum:DoubleType, sum:DoubleType, sum:DoubleType), [OUTPUT] ArrayBuffer(s_store_id:StringType, sum:DoubleType, sum:DoubleType, sum:DoubleType, sum:DoubleType)
: +- *(20) HashAggregateTransformer(keys=[s_store_id#262], functions=[partial_sum(sales_price#144), partial_sum(return_amt#534), partial_sum(profit#145), partial_sum(net_loss#535)], output=[s_store_id#262, sum#588, sum#589, sum#590, sum#591])