Andy Grove
There is an upstream PR in DataFusion to improve stddev performance https://github.com/apache/datafusion/pull/12095
I am closing this issue because we can disable this expression via a config now that https://github.com/apache/datafusion-comet/pull/855 is merged
The `FilterExec` in the above example is even more expensive than the `HashJoinExec`. Evaluating the predicate is cheap but copying data to the filtered batch takes 99% of the time....
The filter on the probe input is very simple (`col_0@0 IS NOT NULL`), so it should be possible to push it down to the Parquet scan? Edit: we do push the...
Latest results after merging https://github.com/apache/datafusion-comet/pull/835 ## sf 10 ``` AMD Ryzen 9 7950X3D 16-Core Processor TPCDS Micro Benchmarks: Best Time(ms) Avg Time(ms) Stdev(ms) Rate(M/s) Per Row(ns) Relative ------------------------------------------------------------------------------------------------------------------------ join_inner 98...
I created a Google document to discuss the design. https://docs.google.com/document/d/1zNuavf_WT3IcpeTVAEC8IjMGloi1MeAeQSg4F0eEivs/edit?usp=sharing
I created https://github.com/apache/datafusion-comet/issues/837 for the very first step in this process
Here is a like-for-like comparison between Spark and Comet for Scan+C2R for the optimized version of q72. ## Spark Spark C2R takes 1.1 minutes ...
There is some spill happening, even though I am allocating 20g memory overhead: 
@parthchandra I know that you are working on this so I tried assigning the issue to you, but your name does not show up for me for some reason. I...