
Client mode with multi version support

Open michaelawilkins opened this issue 4 years ago • 5 comments

Spark operator currently supports two ways of running. One is via the master branch, which runs spark-submit within the operator itself. This method is limiting because it ties every application to a single Spark version and gives the operator direct access to the dependencies needed by the applications.

The other method uses a k8s Job to run spark-submit. This fixes the versioning limitation by allowing different versions to run, but adds the cost of an extra k8s Job for every submission. We decided to take the best of both worlds and run spark-submit in client mode in a separate pod, without a k8s Job.

We propose a change that creates a new pod on every application submission and runs the driver in client mode. It will still support multiple Spark versions and isolated dependency access, while getting rid of the spark-submit k8s Job overhead incurred in cluster mode.

Addition: Upon application submission, a new pod will be created with the driver running in client mode. The driver host will be added to the Spark conf so that executors can connect back to it.
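For context, the proposed client-mode flow roughly corresponds to running something like the following inside the newly created pod. This is only a sketch of the idea, not the operator's actual implementation: the image name is illustrative, and `POD_IP` is assumed to be injected via the Kubernetes downward API.

```shell
# Hypothetical entrypoint for the per-application submission pod.
# In client mode the driver runs in THIS pod, so executors need to be
# told how to reach it via spark.driver.host/port.
exec /opt/spark/bin/spark-submit \
  --master k8s://https://kubernetes.default.svc \
  --deploy-mode client \
  --conf spark.driver.host="${POD_IP}" \
  --conf spark.driver.port=7078 \
  --conf spark.kubernetes.driver.pod.name="${HOSTNAME}" \
  --conf spark.kubernetes.container.image=my-registry/spark:3.5.0 \
  local:///opt/spark/examples/jars/spark-examples.jar
```

Because each submission pod carries its own Spark distribution, different applications can use different Spark versions without the operator's own image constraining them.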

michaelawilkins avatar Jun 10 '21 17:06 michaelawilkins

Creating a link to related multi-version issue #610

jkleckner avatar Jun 12 '21 05:06 jkleckner

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.

github-actions[bot] avatar Sep 24 '24 12:09 github-actions[bot]

This issue has been automatically closed because it has not had recent activity. Please comment "/reopen" to reopen it.

github-actions[bot] avatar Oct 14 '24 14:10 github-actions[bot]

/reopen /lifecycle frozen

ChenYi015 avatar Feb 12 '25 11:02 ChenYi015

@ChenYi015: Reopened this issue.

In response to this:

/reopen /lifecycle frozen

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

google-oss-prow[bot] avatar Feb 12 '25 11:02 google-oss-prow[bot]

We decided to take the best of both worlds and run spark submit client mode in a separate pod without a k8s job.

A k8s Job just starts a pod, so it seems like you're only saving a logical k8s resource. Is there some other fundamental difference I'm missing?

matschaffer-roblox avatar Oct 22 '25 00:10 matschaffer-roblox

@ChenYi015 is there any updated status on this? It looks like https://github.com/kubeflow/spark-operator/issues/610 was tracking the https://github.com/kubeflow/spark-operator/tree/multi-version-support branch but work stopped in May 2021.

I'm curious about this now because I'm attempting to support Spark 4.0.1 & Spark 3.4 on a fleet of EKS clusters (IRSA) running spark-operator on a Spark 3.5 image.

The spark-submit signature hasn't changed, but the Hadoop credential classes have (Hadoop 3.3 -> 3.4 means AWS SDK v1 -> v2).

So now my spark-submit (3.5) and my Spark app (4.0.1) require slightly different credential provider settings. Concretely, com.amazonaws.auth.WebIdentityTokenCredentialsProvider becomes software.amazon.awssdk.auth.credentials.DefaultCredentialsProvider, because hadoop-aws doesn't put IRSA in the default provider chain.

For now it looks like I can patch the spark image entrypoint to override the setting, but I'm curious if there's any broader cross-version support plan for kubeflow's spark-operator.
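To make the mismatch described above concrete, here is a sketch of the per-version override, expressed as Spark conf properties. The key is the standard s3a setting passed through `spark.hadoop.*`; which provider class resolves depends on the hadoop-aws version baked into each image.

```properties
# Spark 3.5 image (hadoop-aws 3.3.x, AWS SDK v1): IRSA via the
# v1 web-identity provider, which is not in the default chain.
spark.hadoop.fs.s3a.aws.credentials.provider=com.amazonaws.auth.WebIdentityTokenCredentialsProvider

# Spark 4.0.1 image (hadoop-aws 3.4.x, AWS SDK v2): the v2 default
# chain picks up IRSA's web-identity token file on its own.
spark.hadoop.fs.s3a.aws.credentials.provider=software.amazon.awssdk.auth.credentials.DefaultCredentialsProvider
```

Since the operator builds one spark-submit invocation for all apps, these conflicting values are exactly the kind of per-application, per-version setting that the client-mode / multi-version proposal in this issue would make easier to manage.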

Thanks in advance for any info!

matschaffer-roblox avatar Oct 22 '25 00:10 matschaffer-roblox