[SPARK-48017] Add Spark application submission worker for operator
What changes were proposed in this pull request?
This is a breakdown PR of #2, adding a submission worker implementation for SparkApplication.
Why are the changes needed?
Spark Operator needs a submission worker to convert its abstraction (the SparkApplication API) into Kubernetes resource specs. This is a lightweight implementation based on the native Kubernetes integration.
As of now, it is based on Spark 4.0.0-preview1, but it is expected to serve all Spark LTS versions. This is feasible because the worker covers only spec generation; the Spark core jars are still brought in by the application images. E2E tests will later be set up with the operator to ensure that.
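For illustration only, here is a minimal sketch of what such a spec-generation step could look like, using the fabric8 Kubernetes client builders; the class and method names below are hypothetical, not the operator's actual API:

```java
import io.fabric8.kubernetes.api.model.Pod;
import io.fabric8.kubernetes.api.model.PodBuilder;

/**
 * Hypothetical sketch of a submission worker: it only converts a
 * SparkApplication abstraction into Kubernetes resource specs. The Spark
 * distribution itself ships inside the application image, which is why a
 * single worker can serve multiple Spark versions.
 */
public class SparkAppSubmissionWorker {

  /** Build a driver pod spec from (assumed) fields of a SparkApplication CR. */
  public Pod buildDriverPodSpec(String appName, String image, String mainClass) {
    return new PodBuilder()
        .withNewMetadata()
            .withName(appName + "-driver")
            .addToLabels("spark-role", "driver")
        .endMetadata()
        .withNewSpec()
            .addNewContainer()
                .withName("spark-kubernetes-driver")
                // The image brings its own Spark jars; the worker only
                // generates the spec.
                .withImage(image)
                .addToArgs("driver", "--class", mainClass)
            .endContainer()
        .endSpec()
        .build();
  }
}
```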
Per the SPIP doc, in future operator version(s) we may add more submission worker implementations based on different Spark versions to become 100% version agnostic, at the cost of keeping multiple workers on standby.
Does this PR introduce any user-facing change?
No
How was this patch tested?
Added unit test coverage.
Was this patch authored or co-authored using generative AI tooling?
No
It seems that we need a shim layer or multiple modules, like Iceberg.
Yes, that's one mid-term goal that we will target for operator v1.0, in order to become fully version agnostic.
This PR proposes a single submission worker based on the latest spark-kubernetes module. Considering its history, we have tested compatibility with Spark 3.2, 3.3, 3.4, and 3.5, and we can do the same for 4.0 to ensure no breaking change is introduced. This is the pattern adopted by most operator solutions, such as the Flink operator and the Google Spark operator. I'm not saying this is absolutely the right way to go for the longer term, but it could enable the first batch of evaluations on operator 0.1 while we work on the multi-submission-worker mode.
The challenges of a multi-version submission-worker mode include the following (see the sketch after this list):
- The operator image can be heavy (packaging multiple sets of Spark jars).
- Runtime resource consumption can be higher, because we need multiple containers (one per Spark version) to avoid jar conflicts on the classpath.
- Deployment (Helm chart) of the operator can become a bit more complex once users are more familiar with the operator: users might want to deploy it in single-submission-worker mode, with a selection of Spark versions, or with all known versions, depending on their needs.
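To make the trade-off concrete, a multi-version mode would essentially dispatch each SparkApplication to a version-matched worker container. A hypothetical sketch, assuming one standby worker service per Spark version line (none of these names exist in the operator today):

```java
import java.util.Map;

/** Hypothetical dispatcher for a multi-version submission-worker mode. */
public class VersionedWorkerDispatcher {

  /** One standby worker endpoint per supported Spark version line (assumed). */
  private final Map<String, String> workersByVersion = Map.of(
      "3.5", "spark-submission-worker-3-5:8080",
      "4.0", "spark-submission-worker-4-0:8080");

  /**
   * Pick the worker that packages the matching Spark jars. Keeping one
   * container per version avoids jar conflicts on a shared classpath,
   * at the cost of extra standby resources and a heavier deployment.
   */
  public String resolveWorker(String sparkVersion) {
    String worker = workersByVersion.get(sparkVersion);
    if (worker == null) {
      throw new IllegalArgumentException(
          "No submission worker deployed for Spark " + sparkVersion);
    }
    return worker;
  }
}
```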
Given this, can we start with this PR for v0.1?
Thanks. Let me consider this more.
Thank you for renaming the package and updating LICENSE.
Thank you for updating.
I wrote the current status summary here:
- https://github.com/apache/spark-kubernetes-operator/pull/2#issuecomment-2120918277