Andy Grove issues

Results: 438 issues of Andy Grove

Possible native shuffle optimization

### What is the problem the feature request solves? I noticed that we execute each query stage with two separate native plans. For example, here is the first query stage...

enhancement

performance

Implement Common Subexpression Elimination optimizer rule

### What is the problem the feature request solves? When running TPC-H q1 in Spark/Comet, the expression `l_extendedprice#21 * (1 - l_discount#22)` appears twice in the query and currently gets...

enhancement

performance

Avoid unpacking dictionaries for inputs to SortExec

### What is the problem the feature request solves? We currently unpack dictionaries before SortExec, which seems inefficient. I experimented with removing this unpacking and was able to run TPC-H...

enhancement

performance

Improve performance of Spark-compatible decimal aggregates

### What is the problem the feature request solves? The benchmarks added in https://github.com/apache/datafusion-comet/pull/948 show that Comet's Spark-compatible aggregates are ~50% slower than the DataFusion equivalents: ``` aggregate/avg_decimal_datafusion time: [653.56...

enhancement

performance

Performance regression after adding support for SMJ with join filter

### Describe the bug When running TPC-DS benchmarks against a 100 GB data set, I see a large regression in performance. For example, here are the timings for q72 before and...

bug

performance

Fall back to Spark if query uses DPP to avoid perf regressions in TPC-DS

### What is the problem the feature request solves? Comet does not yet support DPP and this can result in poor performance on the TPC-DS benchmark due to scanning more...

enhancement

performance

Optimize filters to remove redundant IsNotNull checks

### What is the problem the feature request solves? I am comparing native query plans between Comet and Ballista for TPC-H q1 and noticed a significant difference between the filter...

enhancement

performance

CometSparkToColumnar should have different name for row vs columnar input

### What is the problem the feature request solves? I would like to change the `nodeName` for `CometSparkToColumnar` as follows: ```scala override def nodeName: String = if (child.supportsColumnar) { "CometSparkColumnarToColumnar"...

enhancement

good first issue

Implement native parsing of CSV files

### What is the problem the feature request solves? We can probably accelerate reading of CSV files by continuing to use JVM Spark to read bytes from disk but then...

enhancement

performance

Expose CopyExec metrics in Spark

### What is the problem the feature request solves? In the native planner, we insert a CopyExec node into the plan. The timings from this operator are not visible in...

enhancement

performance