
SeldonDeployment doesn't wait for container to be ready

Open cheskayang opened this issue 2 years ago • 0 comments

I have a SeldonDeployment that runs a couple of containers in the same pod. The configuration for one of them, a container of type COMBINER, looks similar to the following:

apiVersion: machinelearning.seldon.io/v1alpha2
kind: SeldonDeployment
spec:
  name: my-app
  predictors:
  - componentSpecs:
    - spec:
        containers:
        - image: my-app-image:version
          name: combiner
          livenessProbe:
            failureThreshold: 3
            initialDelaySeconds: 60
            periodSeconds: 5
            successThreshold: 1
            httpGet:
              path: /health/status
              port: http
              scheme: HTTP
            timeoutSeconds: 1
          readinessProbe:
            failureThreshold: 3
            initialDelaySeconds: 20
            periodSeconds: 5
            successThreshold: 1
            httpGet:
              path: /health/status
              port: http
              scheme: HTTP
            timeoutSeconds: 1

The controller manager has executor.fullHealthChecks set to true.

Describe the bug

The combiner above hits errors and fails to start, so it is stuck in CrashLoopBackOff. However, the SeldonDeployment is reported as healthy despite the crashing container (and maxUnavailable is 0 on the deployment).

Even when I deliberately implemented the health_status method to raise an exception, the deployment still succeeded despite the crashing container:

    def health_status(self):
        raise Exception("Health check failed due to some condition.")

Expected behaviour

During a new deployment, a failing container should cause the deployment to fail.
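
To make the expectation concrete, here is a rough sketch of the check I would expect to gate the rollout. It is only an illustration, not Seldon's actual controller logic. It takes a pod status dictionary shaped like the Kubernetes API's `status.containerStatuses` and flags any container that is not ready or is waiting in CrashLoopBackOff. The function names `failing_containers` and `rollout_should_fail`, the `model` container name, and the sample data are all hypothetical.

```python
from typing import Any, Dict, List

# Waiting reasons treated as fatal for a rollout (an assumption for this sketch).
FATAL_WAITING_REASONS = {"CrashLoopBackOff"}


def failing_containers(pod_status: Dict[str, Any]) -> List[str]:
    """Return the names of containers that are not ready or are crash-looping."""
    failing = []
    for cs in pod_status.get("containerStatuses", []):
        # "state" holds exactly one of "running", "waiting", or "terminated".
        waiting = (cs.get("state") or {}).get("waiting") or {}
        crash_looping = waiting.get("reason") in FATAL_WAITING_REASONS
        if crash_looping or not cs.get("ready", False):
            failing.append(cs["name"])
    return failing


def rollout_should_fail(pod_status: Dict[str, Any]) -> bool:
    """A rollout should fail if any container in the pod is failing."""
    return bool(failing_containers(pod_status))


# Hypothetical pod status resembling the situation in this report:
# one healthy container and a combiner stuck in CrashLoopBackOff.
sample_status = {
    "containerStatuses": [
        {"name": "model", "ready": True, "state": {"running": {}}},
        {
            "name": "combiner",
            "ready": False,
            "state": {"waiting": {"reason": "CrashLoopBackOff"}},
        },
    ]
}
```

With the sample above, `rollout_should_fail(sample_status)` is true because of the `combiner` container, which is the outcome I expected from the SeldonDeployment.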

Environment

kubectl server version: 1.26.7

seldon controller version: 1.17.0

cheskayang, Oct 11 '23 20:10