datafusion-ballista
Implement Python bindings for BallistaContext
Is your feature request related to a problem or challenge? Please describe what you are trying to do. We have Python bindings for DataFusion's ExecutionContext. It would be good to also support Ballista's BallistaContext so that we can use Python to run distributed queries.
Describe the solution you'd like Probably something like this?
import ballista
ctx = ballista.BallistaContext(...)
df = ctx.read_parquet(...)
Describe alternatives you've considered Another approach might be to make Ballista an optional feature of DataFusion and enable new methods on the DataFusion ExecutionContext instead. However, that would probably pull in a lot of additional dependencies and blur the lines between DataFusion and Ballista. I think there is a strong case for DataFusion=lib/embedded and Ballista=distributed.
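To make the DataFusion=lib/embedded versus Ballista=distributed split concrete, here is a rough pure-Python sketch. Everything in it is hypothetical: `LazyFrame`, `LocalContext`, `RemoteContext`, the `describe` helper, and the host/port values are illustrative stand-ins, not the real DataFusion or Ballista bindings. It only shows two separate context types exposing the same `read_parquet` shape, with the remote one being client-only.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class LazyFrame:
    """Hypothetical stand-in for a DataFrame: it only records planned steps."""
    steps: List[str] = field(default_factory=list)


class LocalContext:
    """Hypothetical embedded context (the role DataFusion plays)."""

    def read_parquet(self, path: str) -> LazyFrame:
        # Nothing is read here; we only record the planned scan.
        return LazyFrame([f"scan parquet {path}"])

    def describe(self, df: LazyFrame) -> str:
        return "run in-process: " + " -> ".join(df.steps)


class RemoteContext:
    """Hypothetical client-only distributed context (the role Ballista plays)."""

    def __init__(self, host: str, port: int) -> None:
        # Assumed connection details for a scheduler; no connection is opened.
        self.host = host
        self.port = port

    def read_parquet(self, path: str) -> LazyFrame:
        # Same method shape as LocalContext, but a separate type.
        return LazyFrame([f"scan parquet {path}"])

    def describe(self, df: LazyFrame) -> str:
        target = f"{self.host}:{self.port}"
        return f"submit to scheduler at {target}: " + " -> ".join(df.steps)


if __name__ == "__main__":
    # Placeholder host, port, and path, chosen only for illustration.
    ctx = RemoteContext("localhost", 50050)
    df = ctx.read_parquet("data.parquet")
    print(ctx.describe(df))
```

Keeping the two contexts as separate types, rather than adding remote methods to the embedded one, mirrors the argument above: the embedded package would not need to carry any distributed-client dependencies.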
Additional context N/A
@andygrove I'd like to work on this, if you don't have other plans for it.
Thank you @djKooks that would be great
@andygrove Would it be okay to lay it out like the following?
...
ballista/
- rust/
- ui/
- python/ <- create binding here
datafusion/
datafusion-cli/
...
Or should I update the current DataFusion Python binding inside the existing python/ directory?
Yes, I think that makes sense.
I would be interested to hear what others think though. @alamb @Dandandan @jorgecarleitao @houqp do you have an opinion on this?
I agree that this makes the most sense 👍
Out of curiosity, do the bindings come with the client and executors, or just the client?
Having a separate Python binding for Ballista in the location suggested by @djKooks in apache/arrow-ballista#15 makes sense to me
@alamb @jorgecarleitao @andygrove thanks for the suggestions 🙇
do the bindings come with the client and executors, or just the client?
I think covering only the client will be enough as a first step, but do you have any other suggestions?
I do not have any more to add here -- since I don't use the python bindings myself I don't have a lot to offer with specifics
@andygrove @alamb @jorgecarleitao thanks for the comments. I will start the implementation in the following branch: https://github.com/apache/arrow-datafusion/pull/988 (I will request a review when it is ready 🙇)
Related to #58
This has now been implemented.