
Implement Python bindings for BallistaContext

Open andygrove opened this issue 3 years ago • 11 comments

Is your feature request related to a problem or challenge? Please describe what you are trying to do.

We have Python bindings for DataFusion's ExecutionContext. It would be good to also support Ballista's BallistaContext so that we can use Python to run distributed queries.

Describe the solution you'd like

Probably something like this?

import ballista

ctx = ballista.BallistaContext()
df = ctx.read_parquet(...)

Describe alternatives you've considered

Another approach might be to make Ballista an optional feature of DataFusion and enable new methods on the DataFusion ExecutionContext instead. That would probably pull in tons of additional dependencies and blur the lines between DataFusion and Ballista, and I think there is a strong case for DataFusion=lib/embedded and Ballista=distributed.
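
As a rough illustration of the DataFusion=lib/embedded versus Ballista=distributed split, here is a minimal Python sketch. Every class in it is a hypothetical stand-in written for this example, not the real datafusion or ballista bindings. The `host` and `port` parameters and their default values are assumptions and do not come from this issue. The sketch only shows two separate context types exposing the same `read_parquet` method while differing in where the work would run.

```python
# Hypothetical stand-ins only: these are NOT the real datafusion or ballista classes.
from dataclasses import dataclass


@dataclass(frozen=True)
class DataFrame:
    """Illustrative lazy plan: records where a scan would run, executes nothing."""

    source: str
    runs_on: str


class ExecutionContext:
    """Stand-in for the embedded (in-process) DataFusion context."""

    def read_parquet(self, path: str) -> DataFrame:
        return DataFrame(source=path, runs_on="in-process")


class BallistaContext:
    """Stand-in for a distributed context that targets a scheduler."""

    def __init__(self, host: str = "localhost", port: int = 50050) -> None:
        # host, port, and their defaults are assumptions made for this sketch.
        self.host = host
        self.port = port

    def read_parquet(self, path: str) -> DataFrame:
        return DataFrame(source=path, runs_on=f"{self.host}:{self.port}")


# Same call shape on both contexts; only the execution target differs.
local_df = ExecutionContext().read_parquet("data.parquet")
remote_df = BallistaContext().read_parquet("data.parquet")
```

Keeping the two types separate means the embedded context needs no knowledge of a scheduler address, which is the dependency boundary argued for above.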

Additional context

N/A

andygrove avatar Aug 11 '21 22:08 andygrove

@andygrove I'd like to work on this, if you don't have other plans.

kination avatar Aug 15 '21 11:08 kination

Thank you @djKooks, that would be great.

andygrove avatar Aug 15 '21 17:08 andygrove

@andygrove Would it be okay to put the bindings in the following location?

...
ballista/
   - rust/
   - ui/
   - python/     <- create binding here
datafusion/
datafusion-cli/
...

Or should I update the current DataFusion Python binding inside the existing python/ directory?

kination avatar Aug 16 '21 12:08 kination

Yes, I think that makes sense.

andygrove avatar Aug 16 '21 12:08 andygrove

I would be interested to hear what others think though. @alamb @Dandandan @jorgecarleitao @houqp do you have an opinion on this?

andygrove avatar Aug 16 '21 12:08 andygrove

I agree that this makes the most sense 👍

Out of curiosity, do the bindings come with the client and executors, or just the client?

jorgecarleitao avatar Aug 16 '21 13:08 jorgecarleitao

I agree that having a separate Python binding for Ballista in the location suggested by @djKooks in apache/arrow-ballista#15 makes sense.

alamb avatar Aug 16 '21 21:08 alamb

@alamb @jorgecarleitao @andygrove thanks for the suggestions 🙇

do the bindings come with the client and executors, or just the client?

I think covering only the client will be enough as a first step, but do you have any other suggestions?

kination avatar Aug 28 '21 00:08 kination

I do not have any more to add here -- since I don't use the python bindings myself I don't have a lot to offer with specifics

alamb avatar Aug 29 '21 09:08 alamb

@andygrove @alamb @jorgecarleitao thanks for the comments. I will start the implementation in the following pull request: https://github.com/apache/arrow-datafusion/pull/988 (will request a review when it is ready 🙇)

kination avatar Sep 11 '21 12:09 kination

Related to #58

nl5887 avatar Jun 06 '22 07:06 nl5887

This has now been implemented.

andygrove avatar Sep 11 '22 23:09 andygrove