spark-operator
add configurable livenessProbe & readinessProbe to helm chart
Signed-off-by: André Bauer [email protected]
- add configurable livenessProbe & readinessProbe to helm chart
- fixes: https://github.com/GoogleCloudPlatform/spark-on-k8s-operator/issues/1592, https://github.com/GoogleCloudPlatform/spark-on-k8s-operator/issues/969, https://github.com/GoogleCloudPlatform/spark-on-k8s-operator/issues/485
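For illustration, the chart values could end up looking something like this. The exact key names, endpoint, and port are assumptions on my part, not taken from the PR diff:

livenessProbe:
  httpGet:
    path: /healthz        # assumed health endpoint of the operator
    port: 8080            # assumed port
  initialDelaySeconds: 10
  periodSeconds: 10
readinessProbe:
  httpGet:
    path: /healthz        # assumed health endpoint of the operator
    port: 8080
  initialDelaySeconds: 5
  periodSeconds: 10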
Can I do anything to get this merged?
@monotek We tried these changes manually with spark operator v1.1.26, and after making them we could see the readinessProbe & livenessProbe in the spark operator YAML. We then wanted to use readinessProbe & livenessProbe in the deployment.yml of our SparkApplication, so we tried adding these values under spec. Our application runs fine, but the readinessProbe and livenessProbe values do not appear in the deployed application YAML; they are being ignored or skipped:
apiVersion: sparkoperator.k8s.io/v1beta2
kind: SparkApplication
spec:
  imagePullPolicy: Always
  sparkVersion: 3.4.0
  mode: cluster
  type: Scala
  ports:
    - name: http
      containerPort: 8080
      protocol: TCP
  readinessProbe:
    httpGet:
      path: /metrics
      port: 8080
      scheme: HTTP
    initialDelaySeconds: 120
    timeoutSeconds: 120
    periodSeconds: 30
    successThreshold: 1
    failureThreshold: 3
  livenessProbe:
    httpGet:
      path: /metrics
      port: 8080
      scheme: HTTP
    initialDelaySeconds: 120
    timeoutSeconds: 120
    periodSeconds: 30
    successThreshold: 1
    failureThreshold: 3
Can you help us define a deployment.yml that deploys a SparkApplication with readinessProbe & livenessProbe settings?
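For context, a plausible explanation (my assumption, not confirmed in this thread): the v1beta2 SparkApplication schema defines no probe fields under spec, driver, or executor, so the API server prunes those unknown fields when the manifest is applied, which would be why they vanish from the deployed object. The one place a probe is structurally valid in that schema is inside a sidecar entry, since sidecars are full Kubernetes Container specs (and require the operator's mutating webhook to be enabled). A minimal sketch, with hypothetical names:

apiVersion: sparkoperator.k8s.io/v1beta2
kind: SparkApplication
metadata:
  name: example-app            # hypothetical name
spec:
  type: Scala
  mode: cluster
  sparkVersion: 3.4.0
  driver:
    sidecars:
      # Sidecars are plain Kubernetes Container specs, so a probe here
      # survives validation; the driver's main container still has none,
      # and a failing probe only restarts the sidecar itself.
      - name: health-sidecar   # hypothetical container
        image: busybox:1.36
        command: ["sleep", "infinity"]
        livenessProbe:
          httpGet:
            path: /metrics     # containers share the pod network, so this
            port: 8080         # can reach the driver's metrics port
          initialDelaySeconds: 120
          periodSeconds: 30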
Almost a year, and this is yet to be merged :/ Is there any other way to add a liveness probe to the spark operator as well as the spark driver pod?
We just stopped using it and switched to spark submit...
Would you mind quickly describing how you are using spark submit? Client mode or cluster mode? How are you handling retries?
Nope, sorry, I can't go into details; I wasn't involved in it.
This is a very inflexible configuration for the liveness probe!
I'm asking because the driver and the executors may need different liveness probes. The metrics port may not be available on the executor, depending on the type of metrics configuration. And when running with a sidecar like fluent-bit, it may not be detected that the main executor container is dead, because the fluent-bit sidecar container keeps running even after the main container has terminated with a failure. This means fluent-bit itself needs a liveness probe. Would this configuration handle that?
Furthermore, a liveness check spanning the sidecar and the main container may have to communicate via reads and writes to a shared file, which requires different probes for the main container and the sidecar; a sketch of that pattern follows.
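As a concrete illustration of the file-based pattern (generic Kubernetes, not specific to this chart; names and thresholds are made up): the main container refreshes a heartbeat file on a shared emptyDir, and the sidecar's liveness probe fails once the file goes stale.

apiVersion: v1
kind: Pod
metadata:
  name: heartbeat-example        # hypothetical
spec:
  volumes:
    - name: heartbeat
      emptyDir: {}
  containers:
    - name: main
      image: busybox:1.36
      # Stand-in for the real workload: refresh the heartbeat every 10s.
      command: ["sh", "-c", "while true; do touch /heartbeat/alive; sleep 10; done"]
      volumeMounts:
        - name: heartbeat
          mountPath: /heartbeat
    - name: log-sidecar          # stand-in for fluent-bit
      image: busybox:1.36
      command: ["sleep", "infinity"]
      volumeMounts:
        - name: heartbeat
          mountPath: /heartbeat
      livenessProbe:
        exec:
          # Fail if the heartbeat file was not refreshed in the last 60s
          # (also fails if the file is missing entirely).
          command: ["sh", "-c", "test $(( $(date +%s) - $(stat -c %Y /heartbeat/alive) )) -lt 60"]
        periodSeconds: 30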
I don't think the use cases were well thought out or considered before this configuration was implemented.
Would a tcpSocket livenessProbe on the block manager port (default 7079), which is available on both the driver and the executors, be better than a livenessProbe on the metrics endpoint?
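Something along these lines, say (a sketch only; the values are illustrative, and it assumes spark.blockManager.port is pinned to a fixed value such as 7079 rather than left random):

livenessProbe:
  tcpSocket:
    port: 7079               # block manager port, if pinned via spark.blockManager.port
  initialDelaySeconds: 60
  periodSeconds: 30
  failureThreshold: 3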