Andy Grove
### What is the problem the feature request solves? DataFusion has an optimization where projections can be pushed down into hash joins. This is implemented in the projection pushdown optimizer rule...
### What is the problem the feature request solves? We can probably accelerate reading of JSON files by continuing to use JVM Spark to read bytes from disk but then...
### Describe the bug I was experimenting with enabling `spark.comet.parquet.enable.directBuffer` and this happened: ``` Caused by: org.apache.spark.SparkException: Encountered error while reading file file:///mnt/bigdata/tpcds/sf100/inventory.parquet/part1.parquet. Details: at org.apache.spark.sql.errors.QueryExecutionErrors$.cannotReadFilesError(QueryExecutionErrors.scala:877) at org.apache.spark.sql.execution.datasources.FileScanRDD$$anon$1.nextIterator(FileScanRDD.scala:307) at org.apache.spark.sql.execution.datasources.FileScanRDD$$anon$1.hasNext(FileScanRDD.scala:125)...
Here is a part of `format_and_mount.sh`:
```
echo "Creating 500mb join datasets"
Rscript ../_data/join-datagen.R 1e7 0 0
Rscript ../_data/join-datagen.R 1e7 5 0
Rscript ../_data/join-datagen.R 1e7 0 1
```
This fails:...
## Which issue does this PR close? N/A ## Rationale for this change DataFusion Comet is currently maintaining a fork of FilterExec with a small modification to change the way...
### Is your feature request related to a problem or challenge? When running TPC-H q1 in Spark + DataFusion Comet, the expression `l_extendedprice#21 * (1 - l_discount#22)` appears twice in...