spark-kubernetes-operator
spark-kubernetes-operator copied to clipboard
[SPARK-48382]Add controller / reconciler module to operator
What changes were proposed in this pull request?
This PR adds the operator module for Spark operator, makes it able to operate and reconcile SparkApplications () in Kubernetes cluster, based on the resource spec built with submission worker.
The operator module includes the java main entry point to make it a k8s application, with built-in metrics support. It is built with the Java Operator SDK and uses the Native Kubernetes Integration for launching Spark applications and manage app lifecycle under the hood.
This PR also includes the Dockfile for packaging operator apps.
Why are the changes needed?
Combined with the API (CRD), this completes the Spark operator following k8s operator pattern. Operator acts as a control plane to manage the complete deployment lifecycle of Spark applications.
The lifecycle (state machine) for Spark application proposed in this PR can be described as:
Does this PR introduce any user-facing change?
No to existing Spark users.
How was this patch tested?
Tested via unit test coverage, as well as local integration test flow with minikube.
To build operator image, use
docker build --build-arg BASE_VERSION=0.1.0 -t spark-kubernetes-operator:0.1.0 .
Deploy operator involves creating a k8s deployment, CRD, config map and a few RBAC resources. The scope of which would be covered by SPARK-48398 .
Was this patch authored or co-authored using generative AI tooling?
No
With latest commit, the conf properties keys are prefix'ed woth spark.kubernetes.operator.. SparkOperatorConf includes below groups / namespaces:
| group / namespace | notes |
|---|---|
| spark.kubernetes.operator. | general operator conf, e.g. name, namespace, watched namespaces |
| spark.kubernetes.operator.reconciler. | reconciler behavior, e.g. parallelism, reconcile timeouts / interval, whether to enable history trimming . etc |
| spark.kubernetes.operator.dynamicConfig. | dynamic config monitor behavior |
| spark.kubernetes.operator.rateLimiter. | rate limiter behavior, interval, limits, multiplier . etc |
| spark.kubernetes.operator.retry. | retry behavior, including client retry config and additional retry behavior on k8s server error / status patching |
| spark.kubernetes.operator.metrics. | metrics behavior, e.g. whether to enable metrics sets for josdk / k8s client, metrics server port .etc |
| spark.kubernetes.operator.health. | health check / probe behavior |
| spark.kubernetes.operator.leaderElection. | leader election behavior for operator HA mode |
- I removed
Dockerfileanddocker-entrypoint.shand revised the PR description accordingly. - I copied configurations into the PR description.
I finished this round review because there are too many. Could you address some? In addition SparkOperator.java seems to be untested at all in this PR. I guess we need to have a unit test coverage before adding back an Docker integration test.
BTW, there exists unaddressed existing comments too like the following.
- https://github.com/apache/spark-kubernetes-operator/pull/12/files#r1662837069
Thank you. Did you finish the updates, @jiangzho ? It seems that there are some un-addressed comments.
- I reviewed the previous comments and resolved when it's addressed. So, please go through the remaining open comments. We need to address them.
- BTW, please re-consider to split this PR once more into architecturally smaller and independent ones.
BTW, any idea for this, https://github.com/apache/spark-kubernetes-operator/pull/12#issuecomment-2231493319 , @jiangzho ?
BTW, please re-consider to split this PR once more into architecturally smaller and independent ones.
please re-consider to split this PR once more into architecturally smaller and independent ones
Yep! As this is becoming larger browser might crash while iterating the comments. How about we start with one split PR with the util classes ?
Sounds good. ~Like utils, can we spin off metrics seperately, @jiangzho ?~
Never mind about metrics.
For utils PR, do we need to focus on the following dependent packages only ?
import org.apache.spark.k8s.operator.BaseResource;
import org.apache.spark.k8s.operator.config.SparkOperatorConf;
import org.apache.spark.k8s.operator.context.BaseContext;
import org.apache.spark.k8s.operator.listeners.BaseStatusListener;
import org.apache.spark.k8s.operator.status.BaseStatus;
BTW, for example, IMO, we can narrow down like the following.
$ git diff main --stat
spark-operator/build.gradle | 20 ++++++++
spark-operator/src/main/java/org/apache/spark/k8s/operator/client/KubernetesClientFactory.java | 52 ++++++++++++++++++++
spark-operator/src/main/java/org/apache/spark/k8s/operator/config/SparkOperatorConf.java | 461 ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
spark-operator/src/main/java/org/apache/spark/k8s/operator/listeners/BaseStatusListener.java | 29 ++++++++++++
spark-operator/src/main/java/org/apache/spark/k8s/operator/listeners/SparkAppStatusListener.java | 27 +++++++++++
spark-operator/src/main/java/org/apache/spark/k8s/operator/metrics/JVMMetricSet.java | 74 +++++++++++++++++++++++++++++
spark-operator/src/main/java/org/apache/spark/k8s/operator/metrics/MetricsService.java | 57 ++++++++++++++++++++++
spark-operator/src/main/java/org/apache/spark/k8s/operator/metrics/MetricsSystem.java | 146 ++++++++++++++++++++++++++++++++++++++++++++++++++++++++
spark-operator/src/main/java/org/apache/spark/k8s/operator/metrics/MetricsSystemFactory.java | 104 ++++++++++++++++++++++++++++++++++++++++
spark-operator/src/main/java/org/apache/spark/k8s/operator/metrics/PrometheusPullModelHandler.java | 78 ++++++++++++++++++++++++++++++
spark-operator/src/main/java/org/apache/spark/k8s/operator/metrics/source/KubernetesMetricsInterceptor.java | 172 ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
spark-operator/src/main/java/org/apache/spark/k8s/operator/metrics/source/OperatorJosdkMetrics.java | 312 +++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
spark-operator/src/main/java/org/apache/spark/k8s/operator/metrics/source/OperatorJvmSource.java | 39 +++++++++++++++
spark-operator/src/main/java/org/apache/spark/k8s/operator/utils/ClassLoadingUtils.java | 63 ++++++++++++++++++++++++
spark-operator/src/main/java/org/apache/spark/k8s/operator/utils/LoggingUtils.java | 64 +++++++++++++++++++++++++
spark-operator/src/main/java/org/apache/spark/k8s/operator/utils/PodPhase.java | 58 +++++++++++++++++++++++
spark-operator/src/main/java/org/apache/spark/k8s/operator/utils/PodUtils.java | 98 ++++++++++++++++++++++++++++++++++++++
spark-operator/src/main/java/org/apache/spark/k8s/operator/utils/ProbeUtil.java | 61 ++++++++++++++++++++++++
spark-operator/src/main/java/org/apache/spark/k8s/operator/utils/SparkAppStatusUtils.java | 73 ++++++++++++++++++++++++++++
spark-operator/src/main/java/org/apache/spark/k8s/operator/utils/SparkExceptionUtils.java | 38 +++++++++++++++
spark-operator/src/main/java/org/apache/spark/k8s/operator/utils/Utils.java | 42 ++++++++++++++++
spark-operator/src/main/resources/EcsLayout.json | 49 +++++++++++++++++++
spark-operator/src/test/java/org/apache/spark/k8s/operator/metrics/MetricsSystemFactoryTest.java | 52 ++++++++++++++++++++
spark-operator/src/test/java/org/apache/spark/k8s/operator/metrics/MetricsSystemTest.java | 77 ++++++++++++++++++++++++++++++
spark-operator/src/test/java/org/apache/spark/k8s/operator/metrics/sink/MockSink.java | 69 +++++++++++++++++++++++++++
spark-operator/src/test/java/org/apache/spark/k8s/operator/metrics/source/KubernetesMetricsInterceptorTest.java | 153 +++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
spark-operator/src/test/java/org/apache/spark/k8s/operator/utils/TestUtils.java | 63 ++++++++++++++++++++++++
spark-operator/src/test/resources/log4j2.properties | 52 ++++++++++++++++++++
28 files changed, 2583 insertions(+)
Please make a new PR for SparkOperator.java because it's worth .