vega: Maybe use DataFusion and Apache Arrow as building blocks?

Open • constantOut opened this issue 4 years ago • 1 comment

There is a competing project, Ballista (https://github.com/ballista-compute/ballista), which uses DataFusion. I don't quite understand why the Ballista examples use such unusual query syntax. I understand that distributed SQL execution is more complex than just combining results from individual executors, but I think having a single-node SQL engine would be a great help. What do you think?

May 15 '20 15:05 constantOut

I plan to integrate with Python, and possibly other languages (JVM and Go), using Arrow. As for DataFusion, the underlying architecture of this framework closely follows that of Spark, and its job execution is quite different from DataFusion's. So, unfortunately, we can't use it. Andy Grove built Ballista, a distributed framework around DataFusion, which is an interesting project to look at.

May 30 '20 16:05 rajasekarv
