[Feature] [Spark] Support Spark 4.0 (preview)
Search before asking
- [X] I searched in the issues and found nothing similar.
Motivation
Support Spark 4.0 (preview1)
Solution
No response
Anything else?
No response
Are you willing to submit a PR?
- [X] I'm willing to submit a PR!
@ulysses-you we can discuss this here.
Thank you @YannByron for the guide.
I looked at Spark 4.0.0-preview; the main challenge is Scala 2.13. Other things like JDK 17 and interface changes are not big issues.
For Scala 2.13, as far as I can see, the Spark community paid a huge cost to support it and drop Scala 2.12, and even now there are some performance regressions due to Scala 2.13, so I think it affects Paimon a lot.
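For concreteness (this example is not from the thread), one well-known source-level difference between Scala 2.12 and 2.13 that shared code would have to deal with:

```scala
// Illustrative only: a typical Scala 2.12 vs 2.13 difference in Java interop.
// scala.collection.JavaConverters still compiles on 2.13 but is deprecated there,
// while its replacement scala.jdk.CollectionConverters does not exist on 2.12,
// so a single shared module has to pick one spelling or abstract it away.
import scala.collection.JavaConverters._ // cross-compiles, deprecated on 2.13
// import scala.jdk.CollectionConverters._ // 2.13+ only

object JavaInterop {
  def toJavaList(names: Seq[String]): java.util.List[String] = names.asJava
}
```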
Personally, I prefer to copy paimon-spark-common to a new module paimon-spark-4.0, so that we do not need to touch the code for previous Spark versions. We can focus on supporting Spark 4.0.0 and higher versions (and may create paimon-spark-4-common if necessary).
cc @JingsongLi, what do you think?
You can use "com.thoughtworks.enableIf" to handle multiple Scala versions.
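A rough sketch of how this could gate version-specific members in a shared module (not Paimon code: CollectionCompat and asJavaList are hypothetical names, and it assumes @enableIf accepts a compile-time Boolean predicate as in the library's README examples):

```scala
import com.thoughtworks.enableIf

// Hypothetical compat helper: only the variant whose predicate matches the
// Scala version used for compilation is kept, so callers see one asJavaList.
object CollectionCompat {

  // Compiled only when building against Scala 2.13.x
  @enableIf(scala.util.Properties.versionNumberString.startsWith("2.13"))
  def asJavaList[A](xs: Seq[A]): java.util.List[A] = {
    import scala.jdk.CollectionConverters._
    xs.asJava
  }

  // Compiled only when building against Scala 2.12.x
  @enableIf(scala.util.Properties.versionNumberString.startsWith("2.12"))
  def asJavaList[A](xs: Seq[A]): java.util.List[A] = {
    import scala.collection.JavaConverters._
    xs.asJava
  }
}
```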
Hi @ulysses-you @YannByron, I would like to ask whether paimon-spark-4-common and paimon-spark-common can reuse most of the code. I believe Spark 3 has very long-term support, and we also need to support Spark 4. If we end up copying a lot of code in this process, it will result in maintaining two separate codebases, which can be very costly. Therefore, my concern is whether we can reuse a significant portion of the code.
Maybe we can allow paimon-spark-common to support both the scala.version and spark.version properties (Scala 2.12 and Spark 3.5.2 by default), which would make paimon-spark-common compatible with both Spark 3.5 and 4.x. Then we can provide a profile in the top-level pom to compile paimon-spark.
This approach doesn't allow compiling both Spark 3.x and Spark 4.x at the same time, and we would have to modify things like CI. But it avoids copying code and allows more reuse.
Meanwhile, paimon-spark3-common and paimon-spark4-common can be derived from paimon-spark-common easily if required.
@JingsongLi @ulysses-you WDYT~
The main issue with the reused module, to me, is that we need to compile the Spark module twice for the different Scala versions. But I'm +1 for @YannByron's approach if you are fine with it.
@YannByron This approach is just like Flink with two Scala versions. I am OK with it~