[DF] Add DISTRIBUTE BY to DataFusion Port

Open jdye64 opened this issue 2 years ago • 2 comments

Is your feature request related to a problem? Please describe.

Previously, with Calcite, we created custom syntax logic to support the DISTRIBUTE BY clause. We need to recreate that logic so that queries planned with DataFusion can also use DISTRIBUTE BY.

Describe the solution you'd like

The DISTRIBUTE BY clause should work in all SQL queries.

jdye64 avatar May 16 '22 18:05 jdye64

Here is Spark's documentation for this: https://spark.apache.org/docs/latest/sql-ref-syntax-qry-select-distribute-by.html

We need to figure out where to add this. DataFusion can already parse this SQL but ignores the clause during planning. We probably need to represent it with an extra operator in the plan, sitting above the projection:

- Distribute
  - Projection
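
To make the proposed plan shape concrete, here is a minimal Python sketch of a Distribute node wrapping a Projection. It is only an illustration of the idea, not DataFusion's or dask-sql's actual API: the `Projection` and `Distribute` classes, the `execute_*` helpers, the sample rows, and the CRC32-based hash are all assumptions made up for this example. The point it shows is that the extra operator only changes which partition each row lands in, so rows with equal key values end up together.

```python
import zlib
from dataclasses import dataclass
from typing import Dict, List

Row = Dict[str, object]


@dataclass
class Projection:
    """Toy stand-in for a projection node (hypothetical, not a real API)."""
    columns: List[str]
    rows: List[Row]  # input rows are inlined to keep the sketch short


@dataclass
class Distribute:
    """Hypothetical plan node for DISTRIBUTE BY, placed above its input."""
    keys: List[str]
    input: Projection


def execute_projection(node: Projection) -> List[Row]:
    # Keep only the projected columns of each row.
    return [{c: row[c] for c in node.columns} for row in node.rows]


def execute_distribute(node: Distribute, npartitions: int) -> List[List[Row]]:
    # Hash-partition the input rows on the distribute keys.
    # CRC32 is used so the assignment is stable across runs (assumption:
    # any deterministic hash would do for this illustration).
    partitions: List[List[Row]] = [[] for _ in range(npartitions)]
    for row in execute_projection(node.input):
        key = repr(tuple(row[k] for k in node.keys)).encode()
        partitions[zlib.crc32(key) % npartitions].append(row)
    return partitions


# Roughly: SELECT name, age FROM person DISTRIBUTE BY age
people = [
    {"name": "Zen Hui", "age": 25, "city": "A"},
    {"name": "Anil B", "age": 18, "city": "B"},
    {"name": "Shone S", "age": 16, "city": "C"},
    {"name": "Mike A", "age": 25, "city": "D"},
    {"name": "John A", "age": 18, "city": "E"},
]
plan = Distribute(
    keys=["age"],
    input=Projection(columns=["name", "age"], rows=people),
)
partitions = execute_distribute(plan, npartitions=3)
```

The node carries only the key expressions and its input, so a planner could drop it in above the projection without touching the rest of the plan. How the partitions are physically produced (for example, a shuffle on the Dask side) is left open here.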

andygrove avatar Aug 18 '22 18:08 andygrove

DataFusion PR: https://github.com/apache/arrow-datafusion/pull/3208

andygrove avatar Aug 19 '22 12:08 andygrove