spark
Apache Spark - A unified analytics engine for large-scale data processing
### What changes were proposed in this pull request? Coalesce partitions for every group ### Why are the changes needed? With CartesianProduct, CoalesceShufflePartitions cannot optimize it. Such as sql...
### What changes were proposed in this pull request? This proposes to add support for ArrayType of nested StructType to arrow-based conversion. This allows Pandas UDFs, mapInArrow UDFs, and toPandas...
### What changes were proposed in this pull request? If all group expressions are foldable, the result of this aggregate will always be OneRowRelation. And if all aggregate expressions are...
### What changes were proposed in this pull request? The same `UnsupportedOperationException` is constructed in 3 places in `AccumulatorV2`; this PR extracts a helper method to deduplicate the code. ### Why...
### What changes were proposed in this pull request? - Add a new mix-in interface `SupportsReportDistinctKeys` for datasource v2 - Add a new method `reportDistinctKeysSet` in `LeafNode` - Override...
### What changes were proposed in this pull request? Fix the problem of writing to a Hive partitioned table without updating its metadata ### Why are the changes needed? - This patch...
### What changes were proposed in this pull request? Throw the error when it is caused by a closed filesystem, even if ignoreCorruptFiles is enabled ### Why are the changes needed? If we...
…onary encoding#38699 ### What changes were proposed in this pull request? Migrate the following errors in QueryExecutionErrors: useDictionaryEncodingWhenDictionaryOverflowError -> DICTIONARY_OVERFLOW_ERROR ### Why are the changes needed? Porting execution errors of...
### What changes were proposed in this pull request? If a boolean expression has two similar binary comparisons that use the same symbol (e.g., >), with a literal on one side, and they are connected with...
### What changes were proposed in this pull request? KafkaSink Metrics feature is added ### Why are the changes needed? KafkaSink Metrics feature will be useful to collect and store...
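The `AccumulatorV2` entry in the list above describes pulling a repeatedly constructed exception into one helper method. A minimal sketch of that refactoring pattern follows. It is written in Python for brevity and is not Spark's code: the class `AccumulatorSketch`, the helper `_not_registered_error`, the message text, and the use of `RuntimeError` in place of `UnsupportedOperationException` are all assumptions made for illustration.

```python
class AccumulatorSketch:
    """Hypothetical stand-in used for illustration; not Spark's AccumulatorV2."""

    def __init__(self):
        self._registered = False
        self._value = 0

    def _not_registered_error(self, action):
        # The single place where the exception is built. Before the
        # refactoring, each caller below would construct it inline.
        # RuntimeError is an assumed substitute for the JVM's
        # UnsupportedOperationException.
        return RuntimeError(f"Cannot {action}: accumulator is not registered")

    def register(self):
        self._registered = True

    def add(self, amount):
        if not self._registered:
            raise self._not_registered_error("add")
        self._value += amount

    def value(self):
        if not self._registered:
            raise self._not_registered_error("read value")
        return self._value

    def reset(self):
        if not self._registered:
            raise self._not_registered_error("reset")
        self._value = 0
```

With the helper in place, a change to the exception type or message wording happens in one method instead of three call sites.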