dask-sql
Distributed SQL Engine in Python using Dask
**Report needed documentation** We will likely need to implement more optimization rules, either in DataFusion, or in Dask SQL. There is no documentation currently on how to do this. **Describe...
Closes #839. In addition to #832, we want to create a custom implementation for Dask-ML's `Incremental` class as well. So as not to create any merge conflicts, I've only added...
**Is your feature request related to a problem? Please describe.** After #813 is resolved, we should also add support for the `TIMESTAMPDIFF` function, as described [here](https://www.w3resource.com/mysql/date-and-time-functions/mysql-timestampdiff-function.php). **Describe the solution you'd...
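The linked MySQL page defines `TIMESTAMPDIFF(unit, start, end)` as `end - start` counted in whole units. The sketch below illustrates those semantics in plain Python for the fixed-length units only. It is not dask-sql code. The function name `timestampdiff` and the `_UNIT_MICROSECONDS` table are hypothetical. Calendar-dependent units (`MONTH`, `QUARTER`, `YEAR`) are deliberately left out. Truncation toward zero for negative intervals is an assumption about the intended behaviour.

```python
from datetime import datetime, timedelta

# Hypothetical table: length of each fixed-length unit in microseconds.
# MONTH, QUARTER and YEAR depend on the calendar and are not handled here.
_UNIT_MICROSECONDS = {
    "MICROSECOND": 1,
    "SECOND": 1_000_000,
    "MINUTE": 60 * 1_000_000,
    "HOUR": 3_600 * 1_000_000,
    "DAY": 86_400 * 1_000_000,
    "WEEK": 7 * 86_400 * 1_000_000,
}


def timestampdiff(unit: str, start: datetime, end: datetime) -> int:
    """Return (end - start) as a whole number of `unit`s.

    Partial units are dropped by truncating toward zero (an assumption),
    so a difference of -2.5 days counts as -2 DAY.
    """
    try:
        unit_us = _UNIT_MICROSECONDS[unit.upper()]
    except KeyError:
        raise NotImplementedError(
            f"unit {unit!r} is not handled by this sketch"
        ) from None
    # timedelta // timedelta gives an exact integer count of microseconds,
    # which avoids floating-point rounding.
    delta_us = (end - start) // timedelta(microseconds=1)
    magnitude = abs(delta_us) // unit_us
    return magnitude if delta_us >= 0 else -magnitude
```

For example, `timestampdiff("DAY", datetime(2022, 1, 1), datetime(2022, 1, 3, 12))` counts two full days and drops the extra twelve hours. Swapping the arguments yields the negated count. A real implementation would operate on Dask/pandas datetime columns rather than scalar `datetime` objects.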
Closes #831.
As Dask [adds support for gathering dataset statistical metadata](https://github.com/dask/dask/pull/9473), Dask-SQL should look into passing these stats into DataFusion's query planner/optimizer, which should result in more efficient query plans.
**Is your feature request related to a problem? Please describe.** Implement an upstream testing workflow similar to the one we have for testing dask upstream changes and the impact it...
As part of our initiative to move away from Dask-ML, we should replace the `dask_ml.wrappers.Incremental` class with our own custom class. This implementation should be very similar to the `ParallelPostFit`...
**Is your feature request related to a problem? Please describe.** The PyPI release workflow was disabled in #777 since with the move to DataFusion we now have to release arch...
Following on from https://github.com/dask-contrib/dask-sql/pull/803, this PR demonstrates how we can use a macro to make some of the boilerplate code more concise. More generally, this PR allows us to discuss...