[VL] No extra SortExec needed before WindowExec if window type is `sort`
Backend
VL (Velox)
Bug description
When spark.gluten.sql.columnar.backend.velox.window.type is set to sort, there should be no extra SortExec before WindowExec in the plan.
Spark version
None
Spark configurations
No response
System information
No response
Relevant logs
No response
Can you list such a query as example?
In order to remove the duplicated sort operator, we implemented the streaming window for Spark in Velox. Why do you need to set the sort-based window in your use case? Can you share more about your use cases? Thanks.
Can you list such a query as example?
Any window query with the following conf set to sort:
--conf spark.gluten.sql.columnar.backend.velox.window.type=sort
For example (lineitem from TPC-H):
select row_number() over (partition by l_suppkey order by l_orderkey) from lineitem
When we set spark.gluten.sql.columnar.backend.velox.window.type=sort, Velox will use the sort-based window builder, so there is no need to sort outside the WindowExec operator, right?
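To make the point concrete, here is a toy model in plain Python (not Velox or Gluten code; the function names and sample rows are made up for illustration). It assumes the sort-based window sorts its input internally, while the streaming window relies on input that is already sorted by partition key and order key:

```python
from itertools import groupby
from operator import itemgetter

# Toy rows of (l_suppkey, l_orderkey), deliberately unsorted.
ROWS = [(2, 30), (1, 20), (2, 10), (1, 5), (1, 40)]


def streaming_row_number(sorted_rows):
    """Streaming window: assumes rows are already sorted by (l_suppkey, l_orderkey)."""
    out = []
    for _, partition in groupby(sorted_rows, key=itemgetter(0)):
        for n, row in enumerate(partition, start=1):
            out.append((*row, n))
    return out


def sort_based_row_number(rows):
    """Sort-based window: sorts internally, so the caller does not need to pre-sort."""
    return streaming_row_number(sorted(rows))


# Plan A: external sort, then streaming window.
plan_a = streaming_row_number(sorted(ROWS))
# Plan B: sort-based window fed with unsorted input (no external sort).
plan_b = sort_based_row_number(ROWS)
# Plan C: external sort, then sort-based window (the data is sorted twice).
plan_c = sort_based_row_number(sorted(ROWS))

print(plan_a == plan_b == plan_c)
```

In this model plan C produces the same rows as plan B, so the external sort in plan C is pure extra work. That is the situation described above when the window type is sort.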
cc @JkSelf @FelixYBW
In order to remove the duplicated sort operator, we implemented the streaming window for Spark in Velox. Why do you need to set the sort-based window in your use case? Can you share more about your use cases? Thanks.
@JkSelf We are just doing some investigation. Both SortExec and WindowExec need a lot of C2R and R2C within the implementation, especially when spill occurs. It is hard to do something for Sort outside Window, but for Sort within Window (aka sort-based window), maybe we can eliminate some C2R and R2C.
Do you have any suggestion?
@WangGuangxin The sort-based window is designed for the Presto backend, and the streaming window is for Spark, to remove the sort. Can you use the streaming window in your use case? We also implemented the row-based streaming window for the row_number() function to avoid window OOM in Gluten. If you choose the streaming window, the row-based streaming window will be applied for the row_number() function. Can you give it a try? Thanks.
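As a rough illustration of why a row-based streaming row_number() can avoid OOM, here is a small Python sketch. It is not the Velox implementation. It assumes that "row-based streaming" means emitting each result as soon as its input row arrives instead of buffering a whole partition, and the names row_number_stream and partition_key are hypothetical:

```python
def row_number_stream(sorted_rows, partition_key):
    """Yield (row, row_number) pairs for input already sorted by partition and order keys.

    Only the current partition key and a counter are kept in memory,
    regardless of how large a partition is.
    """
    current = object()  # sentinel that never equals a real partition key
    n = 0
    for row in sorted_rows:
        key = partition_key(row)
        if key != current:
            # A new partition starts: reset the counter.
            current, n = key, 0
        n += 1
        yield row, n


# Rows of (l_suppkey, l_orderkey), already sorted by both columns.
rows = [(1, 5), (1, 20), (2, 10)]
result = list(row_number_stream(rows, lambda r: r[0]))
print(result)
```

The generator never holds more than one row at a time, which is the property that matters for very large partitions.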
In order to remove the duplicated sort operator, we implemented the streaming window for Spark in Velox. Why do you need to set the sort-based window in your use case? Can you share more about your use cases? Thanks.
@JkSelf We are just doing some investigation. Both SortExec and WindowExec need a lot of C2R and R2C within the implementation, especially when spill occurs. It is hard to do something for Sort outside Window, but for Sort within Window (aka sort-based window), maybe we can eliminate some C2R and R2C.
Do you have any suggestion?
Why do you need C2R, R2C?