
[EPIC] Add support for Substrait

Open • andygrove opened this issue 3 years ago • 1 comment

[EDIT: Updated this on 2/25/23]

Is your feature request related to a problem or challenge? Please describe what you are trying to do.

The Substrait standard is gaining adoption, and I would like to add support for it to Ballista. There are three areas where we could potentially support Substrait:

  • [ ] ExecuteQueryParams currently accepts either LogicalPlan or a SQL string. We could add Substrait here as well, represented as a byte array. This would allow clients such as Ibis to submit queries directly to Ballista's gRPC service.
  • [ ] The executor currently receives tasks containing DataFusion physical plans. These plans could be serialized to Substrait and passed to other execution engines, such as DuckDB, Polars, and cuDF, making Ballista a general-purpose distributed query scheduler.
  • [ ] We currently use a proprietary protobuf format for representing plans. We could adopt Substrait here as well, or maybe just add a wrapper for Substrait plans.
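
To make the first item concrete, here is a minimal sketch of a query payload with a third, Substrait variant alongside the two existing ones. The names `QueryPayload`, `PlanningStep`, and `planning_step` are hypothetical and chosen for illustration only; they are not Ballista's generated protobuf types, and no real decoding is performed. The sketch only shows how a scheduler could dispatch on the payload kind, assuming the Substrait plan arrives as serialized bytes.

```rust
/// Hypothetical stand-in for the query payload carried by `ExecuteQueryParams`.
/// Illustrative only: these are not Ballista's generated protobuf types.
#[derive(Debug, Clone, PartialEq)]
pub enum QueryPayload {
    /// A DataFusion logical plan in the existing protobuf encoding.
    LogicalPlan(Vec<u8>),
    /// SQL text to be parsed and planned by the scheduler.
    Sql(String),
    /// Proposed addition: a Substrait plan as serialized bytes.
    SubstraitPlan(Vec<u8>),
}

/// The first thing a scheduler would have to do with each payload kind
/// (assumed for this sketch).
#[derive(Debug, Clone, Copy, PartialEq)]
pub enum PlanningStep {
    DecodeDataFusionProto,
    ParseSql,
    DecodeSubstrait,
}

/// Chooses the planning step for a payload and rejects empty payloads early.
/// The actual decoding and planning are intentionally left out.
pub fn planning_step(payload: &QueryPayload) -> Result<PlanningStep, String> {
    match payload {
        QueryPayload::LogicalPlan(bytes) | QueryPayload::SubstraitPlan(bytes)
            if bytes.is_empty() =>
        {
            Err("plan payload is empty".to_string())
        }
        QueryPayload::Sql(sql) if sql.trim().is_empty() => {
            Err("SQL string is empty".to_string())
        }
        QueryPayload::LogicalPlan(_) => Ok(PlanningStep::DecodeDataFusionProto),
        QueryPayload::Sql(_) => Ok(PlanningStep::ParseSql),
        QueryPayload::SubstraitPlan(_) => Ok(PlanningStep::DecodeSubstrait),
    }
}
```

Modeling the payload as a closed set of variants means existing clients that send SQL or a DataFusion logical plan would be unaffected, while a client that produces Substrait could send its plan bytes without going through SQL.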

Original description:

Is your feature request related to a problem or challenge? Please describe what you are trying to do. Ballista (and DataFusion) has a proprietary protobuf-based format for serializing query plans. This ties Ballista to DataFusion and makes it hard to use other query engines or compute kernels.

Describe the solution you'd like There is now an emerging standard for query plan serialization at https://substrait.io/ and this is also protobuf-based. It would be good to move towards this over time.

Describe alternatives you've considered None

Additional context None

andygrove • May 21 '22 18:05

Substrait support is now in DataFusion, so I plan on working on this soon.

andygrove • Jan 18 '23 01:01