vega
Maybe use DataFusion and Apache Arrow as building blocks?
There is a competing project, Ballista (https://github.com/ballista-compute/ballista), which uses DataFusion. I don't quite understand why the Ballista examples use such unusual syntax for querying. I understand that distributed SQL execution is more complex than just combining results from individual executors, but I think having a single-node SQL engine would be a great help. What do you think?
I plan to integrate with Python and possibly other languages (JVM and Go) using Arrow. As for DataFusion, the underlying architecture of this framework closely follows Spark's, and its job execution differs quite a bit from DataFusion's, so unfortunately we can't use it. Andy Grove built Ballista, a distributed framework around DataFusion, which is an interesting project to look at.
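As a side note on the point raised above, that distributed SQL execution is more than combining results from individual executors, here is a minimal sketch using only the Rust standard library. It is not vega, Ballista, or DataFusion code; `PartialAvg`, `partial_avg`, `merge_avg`, and `naive_avg` are hypothetical names made up for illustration. It shows that an aggregate like `AVG` needs a mergeable partial state (sum and count) per partition, because averaging the per-partition averages gives the wrong answer when partitions differ in size.

```rust
// Hypothetical illustration only; none of these names come from vega,
// Ballista, or DataFusion.

/// Partial AVG state computed by one executor over its own partition.
#[derive(Debug, Clone, Copy, PartialEq)]
struct PartialAvg {
    sum: f64,
    count: u64,
}

/// Executor side: fold one partition into a mergeable partial state.
fn partial_avg(partition: &[f64]) -> PartialAvg {
    PartialAvg {
        sum: partition.iter().sum(),
        count: partition.len() as u64,
    }
}

/// Driver side: merge all partial states, then finalize the average.
/// Returns None when there are no rows at all.
fn merge_avg(partials: &[PartialAvg]) -> Option<f64> {
    let sum: f64 = partials.iter().map(|p| p.sum).sum();
    let count: u64 = partials.iter().map(|p| p.count).sum();
    if count == 0 {
        None
    } else {
        Some(sum / count as f64)
    }
}

/// Naive approach: average the per-partition averages.
/// This is only correct when every partition has the same number of rows.
fn naive_avg(partitions: &[Vec<f64>]) -> Option<f64> {
    let avgs: Vec<f64> = partitions
        .iter()
        .filter(|p| !p.is_empty())
        .map(|p| p.iter().sum::<f64>() / p.len() as f64)
        .collect();
    if avgs.is_empty() {
        None
    } else {
        Some(avgs.iter().sum::<f64>() / avgs.len() as f64)
    }
}

fn main() {
    // Two partitions of unequal size, as two executors might hold them.
    let partitions = vec![vec![1.0, 2.0, 3.0], vec![10.0]];
    let partials: Vec<PartialAvg> = partitions.iter().map(|p| partial_avg(p)).collect();

    println!("merged partial states: {:?}", merge_avg(&partials));
    println!("average of averages:   {:?}", naive_avg(&partitions));
}
```

With the sample data the merged result is 4.0 (16 / 4), while the naive average of averages is 6.0. Joins, sorts, and grouped aggregates need similar care and usually a shuffle between stages, which is part of why a single-node SQL engine does not drop directly into a distributed scheduler.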