Andy Grove
### What is the problem the feature request solves? I noticed that we execute each query stage with two separate native plans. For example, here is the first query stage...
### What is the problem the feature request solves? When running TPC-H q1 in Spark/Comet, the expression `l_extendedprice#21 * (1 - l_discount#22)` appears twice in the query and currently gets...
### What is the problem the feature request solves? We currently unpack dictionaries before SortExec, which seems inefficient. I experimented with removing this unpacking and was able to run TPC-H...
### What is the problem the feature request solves? The benchmarks added in https://github.com/apache/datafusion-comet/pull/948 show that Comet's Spark-compatible aggregates are ~50% slower than the DataFusion equivalents: ``` aggregate/avg_decimal_datafusion time: [653.56...
### Describe the bug When running TPC-DS benchmarks against a 100 GB data set, I see a large regression in performance. For example, here are the timings for q72 before and...
### What is the problem the feature request solves? Comet does not yet support dynamic partition pruning (DPP), and this can result in poor performance on the TPC-DS benchmark due to scanning more...
### What is the problem the feature request solves? I am comparing native query plans between Comet and Ballista for TPC-H q1 and noticed a significant difference between the filter...
### What is the problem the feature request solves? I would like to change the `nodeName` for `CometSparkToColumnar` as follows: ```scala override def nodeName: String = if (child.supportsColumnar) { "CometSparkColumnarToColumnar"...
### What is the problem the feature request solves? We can probably accelerate reading of CSV files by continuing to use JVM Spark to read bytes from disk but then...
### What is the problem the feature request solves? In the native planner, we insert a CopyExec node into the plan. The timings from this operator are not visible in...