
Weird behavior: Volumes may not be attached to driver Pods after a while

zzvara opened this issue 5 years ago · 1 comment

Environment:

CoreOS latest stable (2345.3.0), Kubernetes 1.17.0 installed with Kubespray. Admission plugins: --enable-admission-plugins=NodeRestriction,MutatingAdmissionWebhook

Operator installed:

apiVersion: helm.fluxcd.io/v1
kind: HelmRelease
metadata:
  name: spark-operator
  namespace: experimental
spec:
  chart:
    repository: http://storage.googleapis.com/kubernetes-charts-incubator
    name: sparkoperator
    version: 0.6.9
  releaseName: spark-operator
  values:
    enableWebhook: true
    logLevel: 4
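[Editor's note, not from the original report:] the volume mount in question is injected by the operator's mutating webhook (`enableWebhook: true` above), so one thing worth checking when mutation silently stops happening is the webhook's `failurePolicy`. With the Kubernetes default of `Ignore`, an unreachable or timed-out webhook is skipped silently and the Pod is created unmutated, which matches the symptom described below. A diagnostic sketch (the webhook configuration name varies by chart release, and the inline JSON is a fabricated sample standing in for real `kubectl` output):

```shell
# On a live cluster you would run something like:
#
#   kubectl get mutatingwebhookconfigurations
#   kubectl get mutatingwebhookconfiguration <name> -o json > webhook.json
#
# (the configuration name varies by release). Here a fabricated sample
# document stands in for the kubectl output:
cat > webhook.json <<'EOF'
{"webhooks": [{"name": "webhook.sparkoperator.k8s.io", "failurePolicy": "Ignore"}]}
EOF

# Extract the failurePolicy; "Ignore" means webhook call failures are
# silently swallowed and Pods are created without the mutation.
grep -o '"failurePolicy": "[A-Za-z]*"' webhook.json
# → "failurePolicy": "Ignore"
```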

The following job has a volume attached to it:

apiVersion: "sparkoperator.k8s.io/v1beta2"
kind: SparkApplication
metadata:
  name: spark-svd-5
  namespace: experimental
spec:
  type: Scala
  mode: cluster
  nodeSelector:
    spark.sztaki.hu: allowed
  image: "zzvara/spark:3.0.0"
  imagePullPolicy: Always
  mainClass: redacted
  mainApplicationFile: redacted
  sparkVersion: "3.0.0"
  hadoopConfigMap: spark-default-configuration
  sparkConf:
    "spark.default.parallelism": "20"
    "spark.executor.extraJavaOptions": "-XX:ParallelGCThreads=10 -XX:ConcGCThreads=10 -XX:+UseG1GC -XX:InitiatingHeapOccupancyPercent=60"
    "spark.kubernetes.executor.deleteOnTermination": "true"
    "spark.serializer": "org.apache.spark.serializer.KryoSerializer"
    "spark.kryoserializer.buffer.mb": "1024"
    "spark.driver.maxResultSize": "4g"
    # Required since Spark Operator does not support Spark 3.0.0 as of yet.
    "spark.kubernetes.executor.podTemplateFile": "/opt/spark/configuration/executor-template.yaml"
  restartPolicy:
    type: Never
  volumes:
    - name: executor-template
      configMap:
        name: spark-executor-template
  driver:
    cores: 4
    memory: "10288m"
    coreLimit: "6000m"
    labels:
      version: 3.0.0
    serviceAccount: spark-operator-sparkoperator
    volumeMounts:
      - name: executor-template
        mountPath: /opt/spark/configuration/executor-template.yaml
        subPath: executor-template.yaml
  executor:
    cores: 10
    coreLimit: "12000m"
    instances: 4
    memory: "50g"
    labels:
      version: 3.0.0

Examining the driver Pod spec after job start (submitting the YAML above), the operator sometimes attaches the executor-template volume to the driver Pod and sometimes does not. The behavior follows a pattern: after the Spark Operator is restarted (its Pod deleted), the operator attaches the volume correctly for roughly the next 5-10 runs. After that, consecutive restarts of the SparkApplication (kubectl delete sparkapp && kubectl apply -f ...) no longer get the volume attached. There are no error logs in the Spark Operator; the only symptom is that the Spark Driver is missing executor-template.yaml.
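[Editor's note, not from the original report:] a quick way to spot-check whether a given driver Pod received the mutation is to list its volumes. A sketch, assuming the default `<app-name>-driver` Pod naming convention used by the operator:

```shell
# On a live cluster you would run something like:
#
#   kubectl -n experimental get pod spark-svd-5-driver \
#     -o jsonpath='{.spec.volumes[*].name}'
#
# If "executor-template" is missing from the list, the webhook did not
# mutate this Pod. Simulated here against a sample volume list:
volumes="spark-local-dir-1 spark-conf-volume"
case " $volumes " in
  *" executor-template "*) echo "volume attached" ;;
  *)                       echo "volume missing"  ;;
esac
# → volume missing
```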

zzvara · May 14 '20 06:05

I tried what you have here and the SparkApplication failed to submit because the file isn't present on the spark-operator pod. Did you also have to mount the template file on the spark-operator pod?

bscaleb · Mar 29 '24 21:03