velox
velox copied to clipboard
[Discuss] Flink on Velox supports
Hi, guys
I'm focusing on flink sql in flink community, now we are investigate the native vectorized execution solution of batch sql. We found that velox is an universal open vectorized execution plugin, it can support presto and spark now. Moreover, Flink is very similar to Spark, they are both rpc-based and JVM-based compute engine, also the overall architecture is similar, so we think flink on velox is workable. I want to ask whether velox community has interest to support flink? Maybe there are some effort we can do together.
Hi @lsyldliu, thanks for reaching out! As we work on expanding Velox's reach and growing our open soruce community, we are certainly interested in providing integrations with other widely adopted big data systems such as Flink.
Some of our close partners from Intel (cc: @davecohn) have expressed interest in the past in having an open source stream processing Velox integration available, so his team might be interested in helping. In addition, the Velox to Spark integration recently created by Intel (called Gluten - https://github.com/oap-project/gluten/) has a similar JVM/JNI integration mode, so maybe some of that work could be reused.
Let me know how we can help. The community usually communicate via Slack, so if you send me your work email me or @jijufb can send you an invite and we could pick up from there?
Hi @lsyldliu, please send your work email to [email protected] and I can send you invite to access Velox Slack channel.
In addition, the Velox to Spark integration recently created by Intel (called Gluten - oap-project/gluten) has a similar JVM/JNI integration mode, so maybe some of that work could be reused.
@lsyldliu Yes, we definitely can reuse the native part for Spark and Flink, maybe some JVM part as well. You may ping me (@Binwei Yang) in Slack channel and we can have a sync offline.
Hi @lsyldliu, thanks for reaching out! As we work on expanding Velox's reach and growing our open soruce community, we are certainly interested in providing integrations with other widely adopted big data systems such as Flink.
Some of our close partners from Intel (cc: @davecohn) have expressed interest in the past in having an open source stream processing Velox integration available, so his team might be interested in helping. In addition, the Velox to Spark integration recently created by Intel (called Gluten - https://github.com/oap-project/gluten/) has a similar JVM/JNI integration mode, so maybe some of that work could be reused.
Let me know how we can help. The community usually communicate via Slack, so if you send me your work email me or @jijufb can send you an invite and we could pick up from there?
Thanks for your attention, I'm very glad to hear that velox community is interested in integration with flink. Due to I'm not familiar with velox currently, so I want to get some help from community about how integrate velox with flink, it seems that Gluten is very good to me. I will study it first.
Hi @lsyldliu, please send your work email to [email protected] and I can send you invite to access Velox Slack channel.
@jijufb Thank you very much, I'm from alibaba, my work email is [email protected].
@FelixYBW Okay, it sounds like good news to me.
Hi @lsyldliu @FelixYBW, nice to see the discussion here. Are there any ongoing work around the Velox intergrated with Flink? We also try to explore the vector execution engine to accelerate the Flink batch/streaming sql
Nothing from my side. @lsyldliu Did you started the work? You may take a look of this PR in substriat: https://github.com/substrait-io/substrait-java/pull/90. It converts spark's logic plan into substrait plan. Currently Gluten used different way to create the substrait plan. We are checking if we can use the same way here.
The biggest issue is spark plan isn't 1:1 mapping to velox plan. In Gluten when we create substrait plan in Gluten, we customized the generated substrait plan to make sure it's 1:1 mapping between substrait plan and Velox's plan.
