
Apache Spark - A unified analytics engine for large-scale data processing

Results: 649 Spark issues

### What changes were proposed in this pull request? Coalesce partitions for every group. ### Why are the changes needed? With a CartesianProduct, CoalesceShufflePartitions cannot optimize it. Such as sql...

SQL

### What changes were proposed in this pull request? This proposes to add support for ArrayType of nested StructType to arrow-based conversion. This allows Pandas UDFs, mapInArrow UDFs, and toPandas...

SQL
CORE
PYTHON

### What changes were proposed in this pull request? If all group expressions are foldable, the result of this aggregate will always be OneRowRelation. And if all aggregate expressions are...

SQL

### What changes were proposed in this pull request? The same `UnsupportedOperationException` is constructed in 3 places in `AccumulatorV2`; this PR extracts a helper method to deduplicate the code. ### Why...

CORE

### What changes were proposed in this pull request? - Add a new mix-in interface `SupportsReportDistinctKeys` for Data Source V2 - Add a new method `reportDistinctKeysSet` in `LeafNode` - Override...

SQL

### What changes were proposed in this pull request? Fix the problem of writing to a Hive partitioned table without updating its metadata. ### Why are the changes needed? - This patch...

SQL

### What changes were proposed in this pull request? Throw the error if it is caused by "Filesystem closed" when ignoreCorruptFiles is enabled. ### Why are the changes needed? If we...

SQL
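The change above hinges on telling a genuinely corrupt file apart from a failure of the filesystem itself. The following is a minimal Python sketch of that distinction, not Spark's implementation: `read_all`, `read_file`, and `_caused_by_closed_fs` are hypothetical names, and matching on the `"Filesystem closed"` message is an assumption modeled on the PR title.

```python
import logging
from typing import Callable, Iterable, Iterator, List

log = logging.getLogger(__name__)


def _caused_by_closed_fs(err: BaseException) -> bool:
    """Walk the exception chain looking for an (assumed) 'Filesystem closed' marker."""
    seen = set()
    cur = err
    while cur is not None and id(cur) not in seen:
        seen.add(id(cur))
        if "Filesystem closed" in str(cur):
            return True
        # The closed-filesystem error may be wrapped by another exception.
        cur = cur.__cause__ or cur.__context__
    return False


def read_all(
    paths: Iterable[str],
    read_file: Callable[[str], List[str]],  # hypothetical per-file reader
    ignore_corrupt_files: bool,
) -> Iterator[str]:
    """Yield rows from every file, optionally skipping files that fail to read."""
    for path in paths:
        try:
            rows = read_file(path)
        except (OSError, RuntimeError) as err:
            if ignore_corrupt_files and not _caused_by_closed_fs(err):
                # Treat the failure as a corrupt file: log it and move on.
                log.warning("Skipping corrupt file %s: %s", path, err)
                continue
            # Either the flag is off, or the filesystem itself is gone.
            # A closed filesystem says nothing about the file, so re-raise.
            raise
        yield from rows
```

Walking `__cause__`/`__context__` rather than inspecting only the top-level exception matters because the closed-filesystem error is often wrapped by a higher-level read error.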

…onary encoding #38699 ### What changes were proposed in this pull request? Migrate the following errors in QueryExecutionErrors: useDictionaryEncodingWhenDictionaryOverflowError -> DICTIONARY_OVERFLOW_ERROR ### Why are the changes needed? Porting execution errors of...

SQL
CORE

### What changes were proposed in this pull request? If a boolean expression has two similar binary comparisons with the same operator (e.g., >), where one side is a literal, and they are connected with...

SQL
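The rewrite described above can be pictured with a toy model. This is a hypothetical Python illustration, not Catalyst code: `Cmp` and `merge_same_op` are made-up names, only the shape `column <op> literal` (literal on the right) is handled, and because the summary is truncated before naming the connector, both AND and OR are shown as assumptions.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Cmp:
    """Hypothetical comparison `column <op> literal`, literal on the right."""
    column: str
    op: str  # one of ">", ">=", "<", "<="
    literal: float


def merge_same_op(a: Cmp, b: Cmp, connector: str) -> Cmp:
    """Collapse two comparisons on the same column that share an operator.

    With AND both must hold, so the stricter bound wins; with OR either
    may hold, so the looser bound wins.
    """
    if a.column != b.column or a.op != b.op:
        raise ValueError("only same-column, same-operator comparisons can be merged")
    lower_bound = a.op in (">", ">=")  # `x > c` and `x >= c` bound x from below
    if connector == "AND":
        pick = max if lower_bound else min
    elif connector == "OR":
        pick = min if lower_bound else max
    else:
        raise ValueError(f"unsupported connector: {connector}")
    return Cmp(a.column, a.op, pick(a.literal, b.literal))
```

For example, `x > 1 AND x > 5` collapses to `x > 5`, while `x > 1 OR x > 5` collapses to `x > 1`. If `x` is NULL, both sides evaluate to NULL before and after the rewrite, so the sketch does not change SQL's three-valued result.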

### What changes were proposed in this pull request? Adds a KafkaSink metrics feature. ### Why are the changes needed? The KafkaSink metrics feature will be useful to collect and store...

SQL
STRUCTURED STREAMING