[Discuss] Flink on Velox supports

Open lsyldliu opened this issue 3 years ago • 8 comments

Hi, guys

I work on Flink SQL in the Flink community, and we are now investigating a native vectorized execution solution for batch SQL. We found that Velox is a universal, open vectorized execution plugin that already supports Presto and Spark. Moreover, Flink is very similar to Spark: both are RPC-based, JVM-based compute engines, and their overall architecture is similar, so we think Flink on Velox is workable. Is the Velox community interested in supporting Flink? Maybe there is some work we can do together.

lsyldliu avatar Jun 02 '22 03:06 lsyldliu

Hi @lsyldliu, thanks for reaching out! As we work on expanding Velox's reach and growing our open source community, we are certainly interested in providing integrations with other widely adopted big data systems such as Flink.

Some of our close partners from Intel (cc: @davecohn) have expressed interest in the past in having an open source stream processing Velox integration available, so his team might be interested in helping. In addition, the Velox to Spark integration recently created by Intel (called Gluten - https://github.com/oap-project/gluten/) has a similar JVM/JNI integration mode, so maybe some of that work could be reused.

Let me know how we can help. The community usually communicates via Slack, so if you send me your work email, @jijufb or I can send you an invite and we can pick up from there.

pedroerp avatar Jun 03 '22 01:06 pedroerp

Hi @lsyldliu, please send your work email to [email protected] and I can send you an invite to access the Velox Slack channel.

jijufb avatar Jun 03 '22 02:06 jijufb

In addition, the Velox to Spark integration recently created by Intel (called Gluten - oap-project/gluten) has a similar JVM/JNI integration mode, so maybe some of that work could be reused.

@lsyldliu Yes, we can definitely reuse the native part for both Spark and Flink, and maybe some of the JVM part as well. You may ping me (@Binwei Yang) in the Slack channel and we can sync offline.

FelixYBW avatar Jun 03 '22 16:06 FelixYBW

Thanks for your attention. I'm very glad to hear that the Velox community is interested in integrating with Flink. Since I'm not familiar with Velox yet, I would like some help from the community on how to integrate Velox with Flink. Gluten looks very good to me, so I will study it first.

lsyldliu avatar Jun 04 '22 03:06 lsyldliu

@jijufb Thank you very much. I'm from Alibaba, and my work email is [email protected].

lsyldliu avatar Jun 04 '22 03:06 lsyldliu

@FelixYBW Okay, it sounds like good news to me.

lsyldliu avatar Jun 04 '22 03:06 lsyldliu

Hi @lsyldliu @FelixYBW, nice to see the discussion here. Is there any ongoing work on integrating Velox with Flink? We are also exploring a vectorized execution engine to accelerate Flink batch/streaming SQL.

Aitozi avatar Sep 10 '22 15:09 Aitozi

Nothing from my side. @lsyldliu Did you start the work? You may take a look at this PR in Substrait: https://github.com/substrait-io/substrait-java/pull/90. It converts Spark's logical plan into a Substrait plan. Currently Gluten uses a different way to create the Substrait plan. We are checking whether we can use the same way here.

The biggest issue is that a Spark plan doesn't map 1:1 to a Velox plan. When we create the Substrait plan in Gluten, we customize the generated plan to make sure the Substrait plan and Velox's plan map 1:1.
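
A toy illustration of the mapping problem described above, not Gluten's actual code: the operator names, the `REWRITES` table, and the `normalize` helper are all hypothetical. It assumes one fused engine operator that has no single native counterpart, and rewrites the plan so that every resulting node corresponds to exactly one native operator.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class PlanNode:
    op: str
    children: List["PlanNode"] = field(default_factory=list)


# Hypothetical rule table: engine operator -> chain of native operators,
# listed outermost first. "FilterProject" is an invented fused operator
# that needs two native nodes, so the mapping is not 1:1 without a rewrite.
REWRITES = {
    "Scan": ["TableScan"],
    "Filter": ["Filter"],
    "Project": ["Project"],
    "FilterProject": ["Project", "Filter"],
}


def normalize(node: PlanNode) -> PlanNode:
    """Rewrite an engine plan so each node maps to one native operator."""
    children = [normalize(child) for child in node.children]
    chain = REWRITES[node.op]
    # Build the chain from the innermost operator outwards.
    result = PlanNode(chain[-1], children)
    for op in reversed(chain[:-1]):
        result = PlanNode(op, [result])
    return result


def flatten(node: PlanNode) -> List[str]:
    """Return operator names in pre-order, for easy inspection."""
    ops = [node.op]
    for child in node.children:
        ops.extend(flatten(child))
    return ops


engine_plan = PlanNode("FilterProject", [PlanNode("Scan")])
native_plan = normalize(engine_plan)
print(flatten(native_plan))  # ['Project', 'Filter', 'TableScan']
```

After normalization, each node in the intermediate plan can be translated to a native node independently, which is the property the 1:1 mapping is meant to guarantee.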

FelixYBW avatar Oct 18 '22 18:10 FelixYBW