Andy Grove comments

Results: 657 comments by Andy Grove

WIP: Fix performance regression with `stddev` being enabled by default

There is an upstream PR in DataFusion to improve `stddev` performance: https://github.com/apache/datafusion/pull/12095

WIP: Fix performance regression with `stddev` being enabled by default

I am closing this issue because, now that https://github.com/apache/datafusion-comet/pull/855 is merged, we can disable this expression via a config setting.

Improve performance of broadcast hash join

The `FilterExec` in the above example is even more expensive than the `HashJoinExec`. Evaluating the predicate is cheap, but copying data into the filtered batch takes 99% of the time...
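To show why the copy can dominate, here is a minimal Python sketch (not Comet or DataFusion code) that splits a filter into its two phases. It assumes a toy batch representation (a dict of column lists, with `None` as null) and uses a hypothetical `col_0 IS NOT NULL` predicate; `evaluate_is_not_null` and `filter_batch` are illustrative names only.

```python
from typing import Dict, List, Optional

# Toy "record batch": column name -> list of values (None represents null).
Batch = Dict[str, List[Optional[int]]]


def evaluate_is_not_null(column: List[Optional[int]]) -> List[bool]:
    """Phase 1: evaluate the predicate. One cheap pass over a single column."""
    return [value is not None for value in column]


def filter_batch(batch: Batch, mask: List[bool]) -> Batch:
    """Phase 2: materialize the output. Every selected value of every column
    is copied into a new batch, so the cost grows with rows kept x columns."""
    return {
        name: [value for value, keep in zip(values, mask) if keep]
        for name, values in batch.items()
    }


batch: Batch = {
    "col_0": [1, None, 3, None, 5],
    "col_1": [10, 20, 30, 40, 50],
    "col_2": [100, 200, 300, 400, 500],
}
mask = evaluate_is_not_null(batch["col_0"])  # touches one column only
filtered = filter_batch(batch, mask)         # touches all three columns
```

The predicate reads one column, while the copy touches every column in the batch. In arrow-rs the second phase corresponds to kernels such as `filter_record_batch`, whose cost likewise scales with the amount of data copied.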

Improve performance of broadcast hash join

The filter on the probe input is very simple (`col_0@0 IS NOT NULL`), so it should be possible to push it down to the Parquet scan? Edit: we do push the...
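One possible reason a pushed-down `IS NOT NULL` does not remove the need for a filter: statistics-based pruning works at row-group granularity. The Python sketch below is a simplified model, not DataFusion's Parquet reader; `RowGroupStats` and `prune_is_not_null` are hypothetical names, loosely modeled on the null count that Parquet column statistics can carry.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class RowGroupStats:
    """Hypothetical per-row-group metadata for column col_0."""
    num_rows: int
    null_count: int  # number of nulls in col_0 within this row group


def prune_is_not_null(row_groups: List[RowGroupStats]) -> List[int]:
    """Return indices of row groups that may contain rows with col_0 IS NOT NULL.

    A row group can be skipped only when every value in it is null. Surviving
    row groups may still contain some nulls, so a row-level filter above the
    scan can still be required.
    """
    return [i for i, rg in enumerate(row_groups) if rg.null_count < rg.num_rows]


row_groups = [
    RowGroupStats(num_rows=1000, null_count=1000),  # all null: skipped
    RowGroupStats(num_rows=1000, null_count=10),    # some nulls: still scanned
    RowGroupStats(num_rows=1000, null_count=0),     # no nulls: scanned
]
kept = prune_is_not_null(row_groups)
```

In this model, the second row group is read in full even though it still contains 10 null rows that something above the scan has to remove.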

Improve performance of broadcast hash join

Latest results after merging https://github.com/apache/datafusion-comet/pull/835

## sf 10

```
AMD Ryzen 9 7950X3D 16-Core Processor
TPCDS Micro Benchmarks:   Best Time(ms)   Avg Time(ms)   Stdev(ms)   Rate(M/s)   Per Row(ns)   Relative
------------------------------------------------------------------------------------------------------------------------
join_inner 98...
```
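The derived columns in this benchmark output (`Rate(M/s)`, `Per Row(ns)`, `Relative`) follow from the best time and row count. The sketch below assumes the definitions used by Spark's `Benchmark` utility, where `Relative` compares each case against the first case in the group; `derived_columns` is a hypothetical helper, and the numbers are made up for illustration, not taken from the results above.

```python
from typing import Tuple


def derived_columns(num_rows: int, best_ms: float,
                    baseline_best_ms: float) -> Tuple[float, float, float]:
    """Compute Rate(M/s), Per Row(ns), and Relative for one benchmark case.

    Assumed definitions (Spark-style benchmark report):
      Rate(M/s)   = millions of rows processed per second in the best run
      Per Row(ns) = 1000 / Rate(M/s)
      Relative    = baseline (first case) best time / this case's best time
    """
    rate_m_per_s = num_rows / (best_ms / 1000.0) / 1_000_000.0
    per_row_ns = 1000.0 / rate_m_per_s
    relative = baseline_best_ms / best_ms
    return rate_m_per_s, per_row_ns, relative


# Hypothetical example: 10M rows, best run 500 ms, baseline best run 1000 ms.
rate, per_row, relative = derived_columns(
    num_rows=10_000_000, best_ms=500.0, baseline_best_ms=1000.0
)
```

With these inputs the case processes 20 million rows per second (50 ns per row) and shows as `2.0X` relative to a baseline that took twice as long.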

Implement native version of ColumnarToRow

I created a Google document to discuss the design. https://docs.google.com/document/d/1zNuavf_WT3IcpeTVAEC8IjMGloi1MeAeQSg4F0eEivs/edit?usp=sharing

Implement native version of ColumnarToRow

I created https://github.com/apache/datafusion-comet/issues/837 for the very first step in this process

Implement native version of ColumnarToRow

Here is a like-for-like comparison between Spark and Comet for Scan+C2R for the optimized version of q72.

## Spark

Spark C2R takes 1.1 minutes:

![Screenshot from 2024-08-22 05-15-34](https://github.com/user-attachments/assets/373a1f10-2535-431d-b9a5-c7f4a9aad7f9)...
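As a reminder of what the C2R (ColumnarToRow) step does, here is a minimal Python sketch that transposes a columnar batch into rows. It is only an illustration: Spark's real operator emits `UnsafeRow`s rather than tuples, and `columnar_to_row` is a hypothetical name, not part of Spark or Comet.

```python
from typing import Dict, List, Optional, Tuple

Value = Optional[object]


def columnar_to_row(columns: Dict[str, List[Value]]) -> List[Tuple[Value, ...]]:
    """Transpose a columnar batch (column name -> values) into row tuples.

    Every value is visited and written into a new row object, so the cost of
    the conversion scales with rows x columns.
    """
    names = list(columns)
    if not names:
        return []
    num_rows = len(columns[names[0]])
    if any(len(columns[n]) != num_rows for n in names):
        raise ValueError("all columns in a batch must have the same length")
    return [tuple(columns[n][i] for n in names) for i in range(num_rows)]


rows = columnar_to_row({"a": [1, 2], "b": ["x", None]})
```

Because every value is rewritten into a row, a scan that returns many wide batches pays this cost once per value, which is what makes a native implementation attractive.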

Implement native version of ColumnarToRow

There is some spilling happening, even though I am allocating 20g of memory overhead:

![Screenshot from 2024-08-22 07-52-14](https://github.com/user-attachments/assets/25061ede-f15d-497a-a6f2-be45ee41840b)

Implement native version of ColumnarToRow

@parthchandra I know that you are working on this, so I tried assigning the issue to you, but your name does not show up for me for some reason. I...