datafusion
Apache DataFusion SQL Query Engine
## Which issue does this PR close? Closes #7955 ## Rationale for this change ## What changes are included in this PR? ## Are these changes tested? ## Are there...
### Is your feature request related to a problem or challenge? I would like to be able to fill_nulls per col/expr and on the dataframe level, akin to polars/pyspark/pandas ###...
Is there a better way we could do this? Maybe add something upstream if necessary? As I'm thinking of it, I don't know that this operation is necessarily well defined....
### Is your feature request related to a problem or challenge? Pivoting and unpivoting are common use cases for data scientists, but they are currently missing from the DF API....
### Is your feature request related to a problem or challenge? The PR [#12667](https://github.com/apache/datafusion/pull/12667) effectively generates tests to catch the bug. However, since it uses ThreadRng to generate data, the...
Closes #16432 The idea here is to introduce a _global_ thresholds reference that gets updated across all partitions. This could drastically speed up early termination.
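To make the shared-threshold idea concrete, here is a minimal sketch in plain Rust using only the standard library. It is not DataFusion's implementation: the names `partition_top_k` and `global_top_k`, the use of `i64` rows, and the choice of an `AtomicI64` as the shared reference are all assumptions made for illustration. Each partition keeps its own K smallest values, publishes its current K-th smallest as an upper bound, and skips any row that exceeds the tightest bound published so far by any partition.

```rust
use std::collections::BinaryHeap;
use std::sync::atomic::{AtomicI64, Ordering};
use std::sync::Arc;
use std::thread;

/// Hypothetical per-partition top-K (the K smallest values) that consults a
/// bound shared by all partitions. Not a DataFusion API.
fn partition_top_k(rows: &[i64], k: usize, shared: &AtomicI64) -> Vec<i64> {
    // Max-heap: the largest of the values kept so far sits on top.
    let mut heap: BinaryHeap<i64> = BinaryHeap::with_capacity(k + 1);
    for &v in rows {
        // Some partition already holds K values <= the shared bound, so a
        // strictly larger row cannot be part of the global top K.
        if v > shared.load(Ordering::Relaxed) {
            continue;
        }
        heap.push(v);
        if heap.len() > k {
            heap.pop(); // drop the current worst value
        }
        if heap.len() == k {
            if let Some(&worst) = heap.peek() {
                // Publish this partition's K-th smallest; keep the tightest bound.
                shared.fetch_min(worst, Ordering::Relaxed);
            }
        }
    }
    heap.into_sorted_vec()
}

/// Runs one thread per partition, then merges the per-partition candidates.
fn global_top_k(partitions: Vec<Vec<i64>>, k: usize) -> Vec<i64> {
    // Assumed shared threshold: starts at "no bound known yet".
    let shared = Arc::new(AtomicI64::new(i64::MAX));
    let handles: Vec<_> = partitions
        .into_iter()
        .map(|rows| {
            let shared = Arc::clone(&shared);
            thread::spawn(move || partition_top_k(&rows, k, &shared))
        })
        .collect();
    let mut merged: Vec<i64> = handles
        .into_iter()
        .flat_map(|h| h.join().expect("partition thread panicked"))
        .collect();
    merged.sort_unstable();
    merged.truncate(k);
    merged
}
```

The result does not depend on thread timing: the shared bound only affects how many rows each partition skips, never which values survive the final merge. A real engine would have to handle multi-column sort keys, nulls, and descending order, none of which this sketch covers.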
Part of #7955. My goal here is to lay the groundwork for pushing down joins. I am only implementing bounds pushdown because I am sure that is cheap and it...
## Which issue does this PR close? - Closes #16179. ## Rationale for this change We can use `u32` indices instead of `u64` indices when there are fewer than...
## Which issue does this PR close? - Closes #. ## Rationale for this change We want to support equijoins in `NestedLoopJoin` in the case where one of the tables...
## Which issue does this PR close? - Closes #16353. ## Rationale for this change `RecordBatchStreamReceiver` supports cooperative scheduling implicitly by using Tokio's task budget. `YieldStream` currently uses a custom...