spark-operator
Weird behavior: Volumes may not be attached to driver Pods after a while
Environment:
CoreOS latest stable (2345.3.0)
Kubernetes 1.17.0 installed with Kubespray.
Admission plugins: --enable-admission-plugins=NodeRestriction,MutatingAdmissionWebhook
Operator installed:
apiVersion: helm.fluxcd.io/v1
kind: HelmRelease
metadata:
  name: spark-operator
  namespace: experimental
spec:
  chart:
    repository: http://storage.googleapis.com/kubernetes-charts-incubator
    name: sparkoperator
    version: 0.6.9
  releaseName: spark-operator
  values:
    enableWebhook: true
    logLevel: 4
The following job has a volume attached to it (a sketch of the referenced ConfigMap follows the manifest):
apiVersion: "sparkoperator.k8s.io/v1beta2"
kind: SparkApplication
metadata:
name: spark-svd-5
namespace: experimental
spec:
type: Scala
mode: cluster
nodeSelector:
spark.sztaki.hu: allowed
image: "zzvara/spark:3.0.0"
imagePullPolicy: Always
mainClass: redacted
mainApplicationFile: redacted
sparkVersion: "3.0.0"
hadoopConfigMap: spark-default-configuration
sparkConf:
"spark.default.parallelism": "20"
"spark.executor.extraJavaOptions": "-XX:ParallelGCThreads=10 -XX:ConcGCThreads=10 -XX:+UseG1GC -XX:InitiatingHeapOccupancyPercent=60"
"spark.kubernetes.executor.deleteOnTermination": "true"
"spark.serializer": "org.apache.spark.serializer.KryoSerializer"
"spark.kryoserializer.buffer.mb": "1024"
"spark.driver.maxResultSize": "4g"
# Required since Spark Operator does not support Spark 3.0.0 as of yet.
"spark.kubernetes.executor.podTemplateFile": "/opt/spark/configuration/executor-template.yaml"
restartPolicy:
type: Never
volumes:
- name: executor-template
configMap:
name: spark-executor-template
driver:
cores: 4
memory: "10288m"
coreLimit: "6000m"
labels:
version: 3.0.0
serviceAccount: spark-operator-sparkoperator
volumeMounts:
- name: executor-template
mountPath: /opt/spark/configuration/executor-template.yaml
subPath: executor-template.yaml
executor:
cores: 10
coreLimit: "12000m"
instances: 4
memory: "50g"
labels:
version: 3.0.0
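The spark-executor-template ConfigMap referenced above is not included in this report. A minimal sketch of what it might look like, assuming the executor template only carries the same node selector used elsewhere (the template body is a guess; only the ConfigMap name, the namespace, and the executor-template.yaml key come from the manifest above):

apiVersion: v1
kind: ConfigMap
metadata:
  name: spark-executor-template
  namespace: experimental
data:
  # Hypothetical pod template content; the actual template used in this issue is not shown.
  executor-template.yaml: |
    apiVersion: v1
    kind: Pod
    spec:
      nodeSelector:
        spark.sztaki.hu: allowed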
Examining the driver Pod spec after the job starts (submitting the above YAML), the operator sometimes attaches the executor-template volume to the driver Pod and sometimes does not. The behavior follows a pattern: after the Spark Operator is restarted (its Pod deleted), it attaches the volume for roughly the next 5-10 submissions. After further restarts of the SparkApplication (kubectl delete sparkapp && kubectl apply -f ...), the volume is no longer attached. There are no error logs in the Spark Operator; the Spark driver simply fails because executor-template.yaml is missing.
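For reference, when the webhook mutates the driver Pod as expected, its spec should contain a fragment along these lines (an illustrative sketch of the intended result, not output captured from this cluster; the driver container name may differ):

spec:
  containers:
    - name: spark-kubernetes-driver   # default driver container name; may differ
      volumeMounts:
        - name: executor-template
          mountPath: /opt/spark/configuration/executor-template.yaml
          subPath: executor-template.yaml
  volumes:
    - name: executor-template
      configMap:
        name: spark-executor-template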
I tried what you have here, and the SparkApplication fails to submit because the file isn't present on the spark-operator Pod. Did you also have to mount the template file on the spark-operator Pod?