
[Feature] [Spark] Support Spark 4.0 (preview)

Open YannByron opened this issue 1 year ago • 7 comments

Search before asking

  • [X] I searched in the issues and found nothing similar.

Motivation

Support Spark 4.0 (preview1)

Solution

No response

Anything else?

No response

Are you willing to submit a PR?

  • [X] I'm willing to submit a PR!

YannByron avatar Aug 12 '24 08:08 YannByron

@ulysses-you we can discuss this here.

YannByron avatar Aug 15 '24 07:08 YannByron

Thank you @YannByron for the guidance.

I looked at Spark 4.0.0-preview; the main challenge is Scala 2.13. Other things like JDK 17 and interface changes are not big issues.

Regarding Scala 2.13, as far as I can see, the Spark community paid a huge cost to support it and drop Scala 2.12, and even now there are some performance regressions due to Scala 2.13, so I think it would affect Paimon a lot.

Personally, I prefer to copy paimon-spark-common into a new module paimon-spark-4.0, so that we do not need to touch the code for previous Spark versions. We can focus on supporting Spark 4.0.0 and higher versions (and may create paimon-spark-4-common if necessary).

cc @JingsongLi what do you think?

ulysses-you avatar Aug 15 '24 08:08 ulysses-you

You can use "com.thoughtworks.enableIf" to keep code for multiple Scala versions in a single codebase.
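For example, a minimal sketch of how this could be used, assuming the enableIf macro annotation keeps a definition only when its compile-time predicate is true (the ScalaCompat object and toSeq helper below are made up for illustration, and macro annotations need to be enabled, e.g. the macro-paradise compiler plugin on Scala 2.12 or -Ymacro-annotations on Scala 2.13):

```scala
import java.util.{List => JList}

import com.thoughtworks.enableIf

// Hypothetical compatibility shim: only one of the two definitions below
// survives macro expansion, depending on the Scala version used to compile.
object ScalaCompat {

  // Kept only on Scala 2.13: use the scala.jdk converters and force an
  // immutable Seq, since scala.Seq means immutable.Seq in 2.13.
  @enableIf(scala.util.Properties.versionNumberString.startsWith("2.13"))
  def toSeq[T](list: JList[T]): Seq[T] = {
    import scala.jdk.CollectionConverters._
    list.asScala.toSeq
  }

  // Kept only on Scala 2.12: fall back to the old JavaConverters API.
  @enableIf(scala.util.Properties.versionNumberString.startsWith("2.12"))
  def toSeq[T](list: JList[T]): Seq[T] = {
    import scala.collection.JavaConverters._
    list.asScala
  }
}
```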

awol2005ex avatar Aug 23 '24 03:08 awol2005ex

Hi @ulysses-you @YannByron , I would like to ask whether paimon-spark-4-common and paimon-spark-common can reuse most of the code. I believe Spark 3 has very long-term support, and we also need to support Spark 4. If we end up copying a lot of code in this process, it will result in maintaining two separate codebases, which can be very costly. Therefore, my concern is whether we can reuse a significant portion of the code.

JingsongLi avatar Sep 11 '24 03:09 JingsongLi

Maybe we can make paimon-spark-common honor both the scala.version and spark.version properties (Scala 2.12 and Spark 3.5.2 by default), which would make paimon-spark-common compatible with both Spark 3.5 and 4.x. Then provide a profile in the top-level pom to compile paimon-spark against the chosen version (a rough sketch follows below).

This approach doesn't allow compiling both Spark 3.x and Spark 4.x at the same time, and we would have to modify things like CI. But it avoids copying code and allows more reuse.

Meanwhile, paimon-spark3-common and paimon-spark4-common could easily be derived from paimon-spark-common if required.
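A rough sketch of what such a top-level profile could look like (the property names, default values, and version numbers below are illustrative, not taken from the actual Paimon build):

```xml
<!-- Default properties: Spark 3.5 / Scala 2.12 (illustrative values). -->
<properties>
  <spark.version>3.5.2</spark.version>
  <scala.version>2.12.18</scala.version>
  <scala.binary.version>2.12</scala.binary.version>
</properties>

<profiles>
  <!-- Activate with -Pspark4 to build paimon-spark against Spark 4.x / Scala 2.13. -->
  <profile>
    <id>spark4</id>
    <properties>
      <spark.version>4.0.0-preview1</spark.version>
      <scala.version>2.13.14</scala.version>
      <scala.binary.version>2.13</scala.binary.version>
    </properties>
  </profile>
</profiles>
```

With a setup like this, the same source tree would simply be compiled twice: once with the default Spark 3.5 / Scala 2.12 properties, and once with something like `mvn install -Pspark4` for Spark 4.x / Scala 2.13.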

@JingsongLi @ulysses-you WDYT~

YannByron avatar Sep 11 '24 06:09 YannByron

The main issue with reusing the module, for me, is that we need to compile the Spark module twice for the different Scala versions. But I'm +1 on @YannByron's proposal if you are fine with that.

ulysses-you avatar Sep 12 '24 09:09 ulysses-you

@YannByron This approach is just like how Flink handles two Scala versions. I am OK with it~

JingsongLi avatar Sep 13 '24 05:09 JingsongLi