incubator-gluten icon indicating copy to clipboard operation
incubator-gluten copied to clipboard

[CH-76]support running union on native engine

Open lgbo-ustc opened this issue 2 years ago • 1 comments

What changes were proposed in this pull request?

add support for running union operation on native engine

How was this patch tested?

(Please explain how this patch was tested. E.g. unit tests, integration tests, manual tests)

(If this patch involves UI changes, please attach a screenshot; otherwise, remove this)

lgbo-ustc avatar Oct 13 '22 09:10 lgbo-ustc

Thanks for opening a pull request!

Could you open an issue for this pull request on Github Issues?

https://github.com/oap-project/gluten/issues

Then could you also rename commit message and pull request title in the following format?

[Gluten-${ISSUES_ID}] ${detailed message}

See also:

github-actions[bot] avatar Oct 13 '22 09:10 github-actions[bot]

please rebase to main.

@rui-mo do you verified TPC-DS and TPCH on Velox Backend?

zzcclp avatar Oct 20 '22 12:10 zzcclp

please rebase to main.

@rui-mo do you verified TPC-DS and TPCH on Velox Backend?

Will do that today, thanks.

rui-mo avatar Oct 21 '22 01:10 rui-mo

Tested this PR on TPC-DS, but found union fallbacked due to the input iter cannot be found. Will check what support is needed for Velox backend.

rui-mo avatar Oct 21 '22 05:10 rui-mo

The coredump maybe related to one of latest pr in clickhouse backend.

lgbo-ustc avatar Oct 24 '22 02:10 lgbo-ustc

try to rebase to the latest main branch

lgbo-ustc avatar Oct 24 '22 02:10 lgbo-ustc

I ran the below sql and the Spark UI show the DAG below:

select count(date_sk) from (
  select d_date_sk as date_sk from date_dim
  union all
  select ws_sold_date_sk as date_sk from web_sales
)

image The UnionExecTransformer is not needed to wrap in a WholeStageCodegenTransformer.

zzcclp avatar Oct 26 '22 14:10 zzcclp

If don't wrap union exec transformer into a WholeStageCodegenTransformer , we need to implement doExecuteColumnar for this operator

lgbo-ustc avatar Oct 27 '22 01:10 lgbo-ustc

Hi @lgbo-ustc, I'm testing on TPC-DS Q5, and found agg and union were in a single wholestage transformer. Is this the expected behavior? Screenshot from 2022-10-28 08-06-14

rui-mo avatar Oct 28 '22 08:10 rui-mo

@zzcclp @rui-mo would you take a review again ?

I realize that I may have made something wrong. I should not implement a TransformSupport here, but just a columnar alternative for UnionExec.

With the new implementation, the q5's physical plan is

                  +- *(1) Expand [[sales#154, returns#156, profit#132, channel#561, id#562, 0], [sales#154, returns#156, profit#132, channel#561, null, 1], [sales#154, returns#156, profit#132, null, null, 3]], [sales#154, returns#156, profit#132, channel#564, id#565, spark_grouping_id#563L]
                     +- CHNativeColumnarToRow
                        +- UnionExecTransformer
                           :- *(21) HashAggregateTransformer(keys=[s_store_id#262], functions=[sum(sales_price#144), sum(return_amt#534), sum(profit#145), sum(net_loss#535)], output=[sales#154, returns#156, profit#132, channel#561, id#562])
                           :  +- CoalesceBatches
                           :     +- ColumnarExchange hashpartitioning(s_store_id#262, 5), ENSURE_REQUIREMENTS, [plan_id=1014], [id=#1014], [OUTPUT] ArrayBuffer(s_store_id:StringType, sum:DoubleType, sum:DoubleType, sum:DoubleType, sum:DoubleType), [OUTPUT] ArrayBuffer(s_store_id:StringType, sum:DoubleType, sum:DoubleType, sum:DoubleType, sum:DoubleType)
                           :        +- *(20) HashAggregateTransformer(keys=[s_store_id#262], functions=[partial_sum(sales_price#144), partial_sum(return_amt#534), partial_sum(profit#145), partial_sum(net_loss#535)], output=[s_store_id#262, sum#588, sum#589, sum#590, sum#591])

lgbo-ustc avatar Nov 01 '22 08:11 lgbo-ustc