datafusion
Apache DataFusion SQL Query Engine
## Which issue does this PR close? - Closes #18320. ## Rationale for this change ## What changes are included in this PR? Adding a new rule to expr_simplifier library...
### Describe the bug DataFusion fails with schema mismatch error when processing UNION ALL query on parquet files with field metadata. `Error while planning query: Internal error: Physical input schema...
### Is your feature request related to a problem or challenge? When DF checks if our partitioning is satisfied or if a repartition is needed it only checks if the...
## Which issue does this PR close? - Closes #19272 ## Rationale for this change LogicalPlan::TableScan is currently treated as a leaf node in map_children, but some table providers (such...
### Is your feature request related to a problem or challenge? The current implementation of the `DefaultListFilesCache` stores and retrieves entries from the cache using a provided `Path` as the...
Supersedes https://github.com/apache/datafusion/pull/15865 Part of https://github.com/apache/datafusion/issues/16800 The idea here was to remove usage of `SchemaAdapter` and at the same time actually populate the partition column statistics.
### Describe the bug My understanding, which could be wrong, is that quoted field names should not be treated as placeholder variables when parsing SQL statements. For example: ``` SELECT...
### Describe the bug In `GroupedHashAggregateStream::spill_previous_if_necessary`, when the `group_ordering` is not `GroupOrdering::None`, spilling is currently not supported. In `GroupedHashAggregateStream::group_aggregate_batch`, there is code that ignores out-of-memory errors under the...
### Describe the bug I have a query predicate `left = right` where `left` is a column with type `Utf8View` and `right` is a column with type `Dictionary(UInt8, LargeUtf8)`. The...
This is a collection of items to improve external (spilling) aggregation ### Background > Abstract—Analytical database systems offer high-performance in-memory aggregation. If there are many unique groups, temporary query intermediates...