incubator-gluten icon indicating copy to clipboard operation
incubator-gluten copied to clipboard

[GLUTEN-4889][VL] Support approx_percentile

Open WangGuangxin opened this issue 1 year ago • 11 comments

What changes were proposed in this pull request?

Support approx_percentile for VL backend

(Fixes: #4889)

How was this patch tested?

UT

WangGuangxin avatar Mar 18 '24 23:03 WangGuangxin

https://github.com/apache/incubator-gluten/issues/4889

github-actions[bot] avatar Mar 18 '24 23:03 github-actions[bot]

Run Gluten Clickhouse CI

github-actions[bot] avatar Mar 18 '24 23:03 github-actions[bot]

CC: @zhztheplayer

zhouyuan avatar Mar 20 '24 01:03 zhouyuan

I believe the change on Gluten side is ready now. But the ApproxPercentileAggregate implements in velox has to make some minor modification, becasue when add intermediate data, it checks the row vector encoding in addIntermediateImpl https://github.com/facebookincubator/velox/blob/main/velox/functions/prestosql/aggregates/ApproxPercentileAggregate.cpp#L652 , I believe the encoding is guaranteed by extractAccumulators https://github.com/facebookincubator/velox/blob/main/velox/functions/prestosql/aggregates/ApproxPercentileAggregate.cpp#L262.

But when it comes in Gluten, the output of PartialAgg are processed by Shuffle, which will flatten all vectors, so that the encoding is not guaranteed when added to FinalAgg. https://github.com/apache/incubator-gluten/blob/main/cpp/velox/shuffle/VeloxShuffleWriter.cc#L343

cc @zhztheplayer @liujiayi771 @ulysses-you

WangGuangxin avatar Mar 20 '24 01:03 WangGuangxin

Thank you in advance!

I believe the change on Gluten side is ready now.

Did that mean this PR can be merged prior to merging Velox changes?

zhztheplayer avatar Mar 20 '24 03:03 zhztheplayer

@WangGuangxin Perhaps we could create an issue in the upstream Velox community to remove this verification.

liujiayi771 avatar Mar 20 '24 04:03 liujiayi771

Hi @WangGuangxin so the current plan is to merge

VELOX_REPO=https://github.com/wangguangxin/velox.git VELOX_BRANCH=2024_03_15_approx_percentile

to upstream Velox before this ? Am I understanding correctly?

zhztheplayer avatar Mar 21 '24 00:03 zhztheplayer

this

yes, we should merge to upsteam first before merge this PR

WangGuangxin avatar Mar 21 '24 01:03 WangGuangxin

@WangGuangxin Any progress of this PR?

liujiayi771 avatar Apr 25 '24 08:04 liujiayi771