spark-operator
add configurable livenessProbe & readinessProbe to helm chart
Signed-off-by: André Bauer [email protected]
- add configurable livenessProbe & readinessProbe to helm chart
- fixes: https://github.com/GoogleCloudPlatform/spark-on-k8s-operator/issues/1592, https://github.com/GoogleCloudPlatform/spark-on-k8s-operator/issues/969, https://github.com/GoogleCloudPlatform/spark-on-k8s-operator/issues/485
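For illustration, the chart values could end up looking something like this. The exact key names, endpoint, and port are assumptions on my part, not taken from the PR diff:

livenessProbe:
  httpGet:
    path: /healthz        # assumed health endpoint of the operator
    port: 8080            # assumed port
  initialDelaySeconds: 10
  periodSeconds: 10
readinessProbe:
  httpGet:
    path: /healthz        # assumed health endpoint of the operator
    port: 8080
  initialDelaySeconds: 5
  periodSeconds: 10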
Can I do anything to get this merged?
@monotek We tried these changes manually with spark operator v1.1.26, and after making them we could see the readinessProbe & livenessProbe in the spark operator YAML. We then wanted to use readinessProbe & livenessProbe in the deployment.yml of our SparkApplication, so we tried adding these values under spec. Our application runs fine, but the readinessProbe and livenessProbe values do not appear in the deployed application YAML; they are being ignored or skipped:
apiVersion: sparkoperator.k8s.io/v1beta2
kind: SparkApplication
spec:
  imagePullPolicy: Always
  sparkVersion: 3.4.0
  mode: cluster
  type: Scala
  ports:
    - name: http
      containerPort: 8080
      protocol: TCP
  readinessProbe:
    httpGet:
      path: /metrics
      port: 8080
      scheme: HTTP
    initialDelaySeconds: 120
    timeoutSeconds: 120
    periodSeconds: 30
    successThreshold: 1
    failureThreshold: 3
  livenessProbe:
    httpGet:
      path: /metrics
      port: 8080
      scheme: HTTP
    initialDelaySeconds: 120
    timeoutSeconds: 120
    periodSeconds: 30
    successThreshold: 1
    failureThreshold: 3
Can you help us define a deployment.yml that deploys a SparkApplication with readinessProbe & livenessProbe settings?
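For context, a plausible explanation (my assumption, not confirmed in this thread): the v1beta2 SparkApplication schema defines no probe fields under spec, driver, or executor, so the API server prunes those unknown fields when the manifest is applied, which would be why they vanish from the deployed object. The one place a probe is structurally valid in that schema is inside a sidecar entry, since sidecars are full Kubernetes Container specs (and require the operator's mutating webhook to be enabled). A minimal sketch, with hypothetical names:

apiVersion: sparkoperator.k8s.io/v1beta2
kind: SparkApplication
metadata:
  name: example-app            # hypothetical name
spec:
  type: Scala
  mode: cluster
  sparkVersion: 3.4.0
  driver:
    sidecars:
      # Sidecars are plain Kubernetes Container specs, so a probe here
      # survives validation; the driver's main container still has none,
      # and a failing probe only restarts the sidecar itself.
      - name: health-sidecar   # hypothetical container
        image: busybox:1.36
        command: ["sleep", "infinity"]
        livenessProbe:
          httpGet:
            path: /metrics     # containers share the pod network, so this
            port: 8080         # can reach the driver's metrics port
          initialDelaySeconds: 120
          periodSeconds: 30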
Almost a year, and this is yet to be merged :/ Is there any other way to add a liveness probe to the spark operator as well as the spark driver pod?
We just stopped using it and switched to spark submit...
Would you mind quickly describing how you are using spark submit? Client mode or cluster mode? How are you handling retries?
Nope, sorry, I can't go into details; I wasn't involved in it.
This is a very inflexible configuration for the liveness probe!
I'm asking because the driver and the executors may need different liveness probes. The metrics port may not be available on the executor, depending on the type of metrics configuration. And when running with a sidecar like fluent-bit, it may not be detected that the main executor container is dead, because the fluent-bit sidecar container keeps running even after the main container has terminated with a failure. This means fluent-bit itself needs a liveness probe. Would this configuration handle that?
Furthermore, a liveness check spanning the sidecar and the main container may have to communicate via reads and writes to a shared file, which requires different probes for the main container and the sidecar; a sketch of that pattern follows.
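As a concrete illustration of the file-based pattern (generic Kubernetes, not specific to this chart; names and thresholds are made up): the main container refreshes a heartbeat file on a shared emptyDir, and the sidecar's liveness probe fails once the file goes stale.

apiVersion: v1
kind: Pod
metadata:
  name: heartbeat-example        # hypothetical
spec:
  volumes:
    - name: heartbeat
      emptyDir: {}
  containers:
    - name: main
      image: busybox:1.36
      # Stand-in for the real workload: refresh the heartbeat every 10s.
      command: ["sh", "-c", "while true; do touch /heartbeat/alive; sleep 10; done"]
      volumeMounts:
        - name: heartbeat
          mountPath: /heartbeat
    - name: log-sidecar          # stand-in for fluent-bit
      image: busybox:1.36
      command: ["sleep", "infinity"]
      volumeMounts:
        - name: heartbeat
          mountPath: /heartbeat
      livenessProbe:
        exec:
          # Fail if the heartbeat file was not refreshed in the last 60s
          # (also fails if the file is missing entirely).
          command: ["sh", "-c", "test $(( $(date +%s) - $(stat -c %Y /heartbeat/alive) )) -lt 60"]
        periodSeconds: 30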
I don't think the use cases were well thought out or considered before this configuration was implemented.
Would a tcpSocket livenessProbe on the block manager port (default 7079), which is available on both the driver and the executors, be better than a livenessProbe on the metrics endpoint?
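Something along these lines, say (a sketch only; the values are illustrative, and it assumes spark.blockManager.port is pinned to a fixed value such as 7079 rather than left random):

livenessProbe:
  tcpSocket:
    port: 7079               # block manager port, if pinned via spark.blockManager.port
  initialDelaySeconds: 60
  periodSeconds: 30
  failureThreshold: 3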