spark-kubernetes-operator icon indicating copy to clipboard operation
spark-kubernetes-operator copied to clipboard

[SPARK-48382]Add controller / reconciler module to operator

Open jiangzho opened this issue 1 year ago • 1 comments

What changes were proposed in this pull request?

This PR adds the operator module for Spark operator, makes it able to operate and reconcile SparkApplications () in Kubernetes cluster, based on the resource spec built with submission worker.

The operator module includes the java main entry point to make it a k8s application, with built-in metrics support. It is built with the Java Operator SDK and uses the Native Kubernetes Integration for launching Spark applications and manage app lifecycle under the hood.

This PR also includes the Dockfile for packaging operator apps.

Why are the changes needed?

Combined with the API (CRD), this completes the Spark operator following k8s operator pattern. Operator acts as a control plane to manage the complete deployment lifecycle of Spark applications.

The lifecycle (state machine) for Spark application proposed in this PR can be described as:

flow

Does this PR introduce any user-facing change?

No to existing Spark users.

How was this patch tested?

Tested via unit test coverage, as well as local integration test flow with minikube.

To build operator image, use

docker build --build-arg BASE_VERSION=0.1.0 -t spark-kubernetes-operator:0.1.0 .

Deploy operator involves creating a k8s deployment, CRD, config map and a few RBAC resources. The scope of which would be covered by SPARK-48398 .

Was this patch authored or co-authored using generative AI tooling?

No

jiangzho avatar May 23 '24 06:05 jiangzho

With latest commit, the conf properties keys are prefix'ed woth spark.kubernetes.operator.. SparkOperatorConf includes below groups / namespaces:

group / namespace notes
spark.kubernetes.operator. general operator conf, e.g. name, namespace, watched namespaces
spark.kubernetes.operator.reconciler. reconciler behavior, e.g. parallelism, reconcile timeouts / interval, whether to enable history trimming . etc
spark.kubernetes.operator.dynamicConfig. dynamic config monitor behavior
spark.kubernetes.operator.rateLimiter. rate limiter behavior, interval, limits, multiplier . etc
spark.kubernetes.operator.retry. retry behavior, including client retry config and additional retry behavior on k8s server error / status patching
spark.kubernetes.operator.metrics. metrics behavior, e.g. whether to enable metrics sets for josdk / k8s client, metrics server port .etc
spark.kubernetes.operator.health. health check / probe behavior
spark.kubernetes.operator.leaderElection. leader election behavior for operator HA mode

jiangzho avatar May 23 '24 22:05 jiangzho

  • I removed Dockerfile and docker-entrypoint.sh and revised the PR description accordingly.
  • I copied configurations into the PR description.

dongjoon-hyun avatar Jun 12 '24 14:06 dongjoon-hyun

I finished this round review because there are too many. Could you address some? In addition SparkOperator.java seems to be untested at all in this PR. I guess we need to have a unit test coverage before adding back an Docker integration test.

BTW, there exists unaddressed existing comments too like the following.

  • https://github.com/apache/spark-kubernetes-operator/pull/12/files#r1662837069

dongjoon-hyun avatar Jul 15 '24 00:07 dongjoon-hyun

Thank you. Did you finish the updates, @jiangzho ? It seems that there are some un-addressed comments.

dongjoon-hyun avatar Jul 16 '24 17:07 dongjoon-hyun

  • I reviewed the previous comments and resolved when it's addressed. So, please go through the remaining open comments. We need to address them.
  • BTW, please re-consider to split this PR once more into architecturally smaller and independent ones.

dongjoon-hyun avatar Jul 16 '24 17:07 dongjoon-hyun

BTW, any idea for this, https://github.com/apache/spark-kubernetes-operator/pull/12#issuecomment-2231493319 , @jiangzho ?

BTW, please re-consider to split this PR once more into architecturally smaller and independent ones.

dongjoon-hyun avatar Jul 18 '24 15:07 dongjoon-hyun

please re-consider to split this PR once more into architecturally smaller and independent ones

Yep! As this is becoming larger browser might crash while iterating the comments. How about we start with one split PR with the util classes ?

jiangzho avatar Jul 19 '24 21:07 jiangzho

Sounds good. ~Like utils, can we spin off metrics seperately, @jiangzho ?~

dongjoon-hyun avatar Jul 19 '24 22:07 dongjoon-hyun

Never mind about metrics.

For utils PR, do we need to focus on the following dependent packages only ?

import org.apache.spark.k8s.operator.BaseResource;
import org.apache.spark.k8s.operator.config.SparkOperatorConf;
import org.apache.spark.k8s.operator.context.BaseContext;
import org.apache.spark.k8s.operator.listeners.BaseStatusListener;
import org.apache.spark.k8s.operator.status.BaseStatus;

dongjoon-hyun avatar Jul 19 '24 22:07 dongjoon-hyun

BTW, for example, IMO, we can narrow down like the following.

$ git diff main --stat
 spark-operator/build.gradle                                                                                     |  20 ++++++++
 spark-operator/src/main/java/org/apache/spark/k8s/operator/client/KubernetesClientFactory.java                  |  52 ++++++++++++++++++++
 spark-operator/src/main/java/org/apache/spark/k8s/operator/config/SparkOperatorConf.java                        | 461 ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
 spark-operator/src/main/java/org/apache/spark/k8s/operator/listeners/BaseStatusListener.java                    |  29 ++++++++++++
 spark-operator/src/main/java/org/apache/spark/k8s/operator/listeners/SparkAppStatusListener.java                |  27 +++++++++++
 spark-operator/src/main/java/org/apache/spark/k8s/operator/metrics/JVMMetricSet.java                            |  74 +++++++++++++++++++++++++++++
 spark-operator/src/main/java/org/apache/spark/k8s/operator/metrics/MetricsService.java                          |  57 ++++++++++++++++++++++
 spark-operator/src/main/java/org/apache/spark/k8s/operator/metrics/MetricsSystem.java                           | 146 ++++++++++++++++++++++++++++++++++++++++++++++++++++++++
 spark-operator/src/main/java/org/apache/spark/k8s/operator/metrics/MetricsSystemFactory.java                    | 104 ++++++++++++++++++++++++++++++++++++++++
 spark-operator/src/main/java/org/apache/spark/k8s/operator/metrics/PrometheusPullModelHandler.java              |  78 ++++++++++++++++++++++++++++++
 spark-operator/src/main/java/org/apache/spark/k8s/operator/metrics/source/KubernetesMetricsInterceptor.java     | 172 ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
 spark-operator/src/main/java/org/apache/spark/k8s/operator/metrics/source/OperatorJosdkMetrics.java             | 312 +++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
 spark-operator/src/main/java/org/apache/spark/k8s/operator/metrics/source/OperatorJvmSource.java                |  39 +++++++++++++++
 spark-operator/src/main/java/org/apache/spark/k8s/operator/utils/ClassLoadingUtils.java                         |  63 ++++++++++++++++++++++++
 spark-operator/src/main/java/org/apache/spark/k8s/operator/utils/LoggingUtils.java                              |  64 +++++++++++++++++++++++++
 spark-operator/src/main/java/org/apache/spark/k8s/operator/utils/PodPhase.java                                  |  58 +++++++++++++++++++++++
 spark-operator/src/main/java/org/apache/spark/k8s/operator/utils/PodUtils.java                                  |  98 ++++++++++++++++++++++++++++++++++++++
 spark-operator/src/main/java/org/apache/spark/k8s/operator/utils/ProbeUtil.java                                 |  61 ++++++++++++++++++++++++
 spark-operator/src/main/java/org/apache/spark/k8s/operator/utils/SparkAppStatusUtils.java                       |  73 ++++++++++++++++++++++++++++
 spark-operator/src/main/java/org/apache/spark/k8s/operator/utils/SparkExceptionUtils.java                       |  38 +++++++++++++++
 spark-operator/src/main/java/org/apache/spark/k8s/operator/utils/Utils.java                                     |  42 ++++++++++++++++
 spark-operator/src/main/resources/EcsLayout.json                                                                |  49 +++++++++++++++++++
 spark-operator/src/test/java/org/apache/spark/k8s/operator/metrics/MetricsSystemFactoryTest.java                |  52 ++++++++++++++++++++
 spark-operator/src/test/java/org/apache/spark/k8s/operator/metrics/MetricsSystemTest.java                       |  77 ++++++++++++++++++++++++++++++
 spark-operator/src/test/java/org/apache/spark/k8s/operator/metrics/sink/MockSink.java                           |  69 +++++++++++++++++++++++++++
 spark-operator/src/test/java/org/apache/spark/k8s/operator/metrics/source/KubernetesMetricsInterceptorTest.java | 153 +++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
 spark-operator/src/test/java/org/apache/spark/k8s/operator/utils/TestUtils.java                                 |  63 ++++++++++++++++++++++++
 spark-operator/src/test/resources/log4j2.properties                                                             |  52 ++++++++++++++++++++
 28 files changed, 2583 insertions(+)

dongjoon-hyun avatar Jul 19 '24 23:07 dongjoon-hyun

Please make a new PR for SparkOperator.java because it's worth .

dongjoon-hyun avatar Jul 26 '24 15:07 dongjoon-hyun