[SPARK-48669][K8S] K8s resource name prefix follows `DNS Subdomain Names` rule

Open jshmchenxi opened this issue 1 year ago • 2 comments

What changes were proposed in this pull request?

This PR aims to support jobs with a long spark.app.name on K8s. The resource name prefix is truncated so that the resulting resource names follow the DNS Subdomain Names rule.

The resource suffixes currently in use are as follows:

  • -driver
  • -driver-podspec-conf-map
  • -driver-pvc-$i
  • -driver-svc
  • -exec-${executorId}
  • -exec-${executorId}-pvc-$i
  • -hadoop-config
  • -kubernetes-credentials
  • -delegation-tokens
  • -kerberos-keytab
  • -krb5-file

Among them, the longest fixed suffix is -driver-podspec-conf-map, at 24 characters. The maximum length of -exec-${executorId}-pvc-$i is also 24: the fixed parts -exec- and -pvc- take 11 characters, executorId takes at most 10 (the length of Integer.MAX_VALUE), and $i takes at most 3, since at most 128 PVC specs are allowed.
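The length arithmetic above can be checked with a short sketch. This is an illustration in Python, not the code in this PR: the names `longest_suffix_length` and `truncate_prefix` are hypothetical, and the truncation shown is a plain cut to the remaining budget, which may differ from what the patch actually does.

```python
# Illustrative sketch only; not Spark's implementation.
DNS_SUBDOMAIN_MAX_LENGTH = 253                 # limit quoted in the K8s error
MAX_EXECUTOR_ID_DIGITS = len(str(2**31 - 1))   # Integer.MAX_VALUE has 10 digits
MAX_PVC_INDEX_DIGITS = len(str(128))           # at most 128 PVC specs: 3 digits

FIXED_SUFFIXES = [
    "-driver",
    "-driver-podspec-conf-map",
    "-driver-svc",
    "-hadoop-config",
    "-kubernetes-credentials",
    "-delegation-tokens",
    "-kerberos-keytab",
    "-krb5-file",
]


def longest_suffix_length() -> int:
    """Return the worst-case length over all suffixes listed in the PR."""
    variable_lengths = [
        # -driver-pvc-$i
        len("-driver-pvc-") + MAX_PVC_INDEX_DIGITS,
        # -exec-${executorId}
        len("-exec-") + MAX_EXECUTOR_ID_DIGITS,
        # -exec-${executorId}-pvc-$i
        len("-exec-") + MAX_EXECUTOR_ID_DIGITS + len("-pvc-") + MAX_PVC_INDEX_DIGITS,
    ]
    return max([len(s) for s in FIXED_SUFFIXES] + variable_lengths)


def truncate_prefix(prefix: str) -> str:
    """Cut the prefix so that prefix + any suffix stays within the limit."""
    budget = DNS_SUBDOMAIN_MAX_LENGTH - longest_suffix_length()
    return prefix[:budget]
```

Under these assumptions the longest suffix is 24 characters, which leaves 229 characters for the prefix.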

Why are the changes needed?

Currently, when a job with a long spark.app.name is submitted, K8s rejects the creation of the driver pod because the pod name exceeds 253 characters.

Error example:

Caused by: io.fabric8.kubernetes.client.KubernetesClientException: Failure executing: POST at: https://kubernetes.default.svc.cluster.local:443/api/v1/namespaces/foo/pods. Message: Pod "some-super-long-spark-pod-name-exceeded-length-253-driver" is invalid: metadata.name: Invalid value: "some-super-long-spark-pod-name-exceeded-length-253-driver": must be no more than 253 characters. 
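For reference, the check behind this rejection can be approximated client-side. The sketch below is a Python illustration of the Kubernetes DNS subdomain name rule (at most 253 characters; lowercase alphanumerics, '-' or '.'; each label starts and ends with an alphanumeric). The function name `is_dns_subdomain_name` is hypothetical and is not part of Spark or of the fabric8 client.

```python
import re

# Illustrative sketch only; the API server remains the authority on validity.
DNS_SUBDOMAIN_MAX_LENGTH = 253
_LABEL = r"[a-z0-9]([-a-z0-9]*[a-z0-9])?"
_DNS_SUBDOMAIN_RE = re.compile(rf"{_LABEL}(\.{_LABEL})*")


def is_dns_subdomain_name(name: str) -> bool:
    """Return True if name satisfies the length and character rules."""
    if len(name) > DNS_SUBDOMAIN_MAX_LENGTH:
        return False
    return _DNS_SUBDOMAIN_RE.fullmatch(name) is not None
```

A pod name built as prefix plus -driver would pass this check only if the whole string, not just the prefix, stays within 253 characters.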

Does this PR introduce any user-facing change?

Yes, users can run jobs on K8s with a longer spark.app.name.

How was this patch tested?

Pass the CIs with the updated unit tests.

Was this patch authored or co-authored using generative AI tooling?

No.

jshmchenxi avatar Jun 20 '24 09:06 jshmchenxi

Kindly ping @dongjoon-hyun as this is a continuation of SPARK-39614

jshmchenxi avatar Jun 20 '24 14:06 jshmchenxi

cc @pan3793 @yaooqinn @LuciferYang Please take a look, thanks!

jshmchenxi avatar Jun 28 '24 06:06 jshmchenxi

cc @dongjoon-hyun FYI

LuciferYang avatar Jul 01 '24 05:07 LuciferYang

We're closing this PR because it hasn't been updated in a while. This isn't a judgement on the merit of the PR in any way. It's just a way of keeping the PR queue manageable. If you'd like to revive this PR, please reopen it and ask a committer to remove the Stale tag!

github-actions[bot] avatar Oct 10 '24 00:10 github-actions[bot]