PipelineDP
Spark 2.X.X support?
Question
Are the 2.X.X versions of Apache Spark supported?
Further Information
I see a pyspark 3.2.0 dependency in pyproject.toml. But real enterprise and on-premise clusters typically run 2.X.X. Is any Spark version other than 3.2.0 supported?
System Information
- OS: RHEL
- OS Version: 8
- Language Version: 3.7
- Package Manager Version: PIP
Additional Context
It would be good to see a list of supported Spark/Beam versions, but I couldn't find one. Is there one? If so, could you please share a link? Thank you!
We haven't tested on 2.X yet, though I think it should be easy to add support for 2.X (or it might even work with 2.X out of the box). That's because PipelineDP needs only some basic RDD APIs, like map, reduceByKey, join, etc. (other Spark APIs such as DataFrames are not supported yet). You can see all the Spark APIs used in the SparkRDDBackend class. If you have any feedback on using Spark, please let me know. Also, if you test it with Spark 2.*, please let me know the results.
In the next release, we will remove the limitation to 3.2.0.
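For anyone who wants a quick check on a 2.X cluster before trying PipelineDP itself, here is a minimal sketch that exercises only the three RDD calls named above. The helper rdd_smoke_test and its sample data are hypothetical and not part of PipelineDP. It assumes sc is a pyspark.SparkContext, and it does not cover every call used in SparkRDDBackend.

```python
from operator import add


def rdd_smoke_test(sc):
    """Exercise map, reduceByKey and join on a tiny in-memory dataset.

    Hypothetical helper, not part of PipelineDP. `sc` is assumed to be
    a pyspark.SparkContext (or anything exposing the same RDD methods).
    """
    # (key, value) rows, the shape a per-key aggregation consumes.
    rows = sc.parallelize([("a", 1), ("b", 2), ("a", 3)])

    # map: transform each value while keeping its key.
    doubled = rows.map(lambda kv: (kv[0], kv[1] * 2))

    # reduceByKey: sum the values of each key.
    sums = doubled.reduceByKey(add)

    # join: inner join on the key with a second keyed dataset.
    labels = sc.parallelize([("a", "x"), ("b", "y")])
    joined = sums.join(labels)

    # Sort on the driver because Spark does not guarantee output order.
    return sorted(joined.collect())
```

With a local context, e.g. sc = SparkContext("local[1]", "smoke"), the call rdd_smoke_test(sc) should return [("a", (8, "x")), ("b", (4, "y"))]. If that holds on Spark 2.X, the basic RDD calls behave the same way there, though it says nothing about the rest of the backend.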
Thanks a lot for such a fast answer. I'll post a comment here about my tests on Spark 2.3.0.