datafusion-comet
datafusion-comet copied to clipboard
Improve performance of standard deviation aggregate
What is the problem the feature request solves?
I am benchmarking with this query (based on TPC-DS q39):
select w_warehouse_name,w_warehouse_sk,i_item_sk,d_moy
,stddev_samp(inv_quantity_on_hand) stdev
from inventory
,item
,warehouse
,date_dim
where inv_item_sk = i_item_sk
and inv_warehouse_sk = w_warehouse_sk
and inv_date_sk = d_date_sk
and d_year =2001
group by w_warehouse_name,w_warehouse_sk,i_item_sk,d_moy;
Results:
AMD Ryzen 9 7950X3D 16-Core Processor
TPCDS Micro Benchmarks: Best Time(ms) Avg Time(ms) Stdev(ms) Rate(M/s) Per Row(ns) Relative
stddev 3295 3370 105 40.4 24.7 1.0X
stddev: Comet (Scan) 3343 3461 167 39.9 25.1 1.0X
stddev: Comet (Scan, Exec) 7621 7657 50 17.5 57.2 0.4X
If I change the query to use avg instead of stddev then I get much better results (ignore the benchmark name):
AMD Ryzen 9 7950X3D 16-Core Processor
TPCDS Micro Benchmarks: Best Time(ms) Avg Time(ms) Stdev(ms) Rate(M/s) Per Row(ns) Relative
stddev 3187 3276 125 41.8 23.9 1.0X
stddev: Comet (Scan) 3128 3163 48 42.6 23.5 1.0X
stddev: Comet (Scan, Exec) 2209 2241 44 60.3 16.6 1.4X
Describe the potential solution
No response
Additional context
No response
I suspect that implementing the GroupsAccumulator api could resolve this. Took a stab at implementing it here: https://github.com/apache/datafusion/pull/12095 (issue: https://github.com/apache/datafusion/issues/12094) in datafusion. This can the easily be replicated in our version of the aggregates.
Thank you @eejbyfeldt