Roll back when an ImagePullBackOff error occurs

Open knanao opened this issue 2 years ago • 6 comments

What would you like to be added:

When applying k8s resources to the cluster, an ImagePullBackOff sometimes occurs, but PipeCD does not currently roll back in that case. It would be nice to detect the error and roll back automatically.

Why is this needed:

knanao avatar Jan 24 '23 08:01 knanao

First, I'll try to reproduce the error.

t-kikuc avatar Apr 05 '24 06:04 t-kikuc

I successfully reproduced the error. When I deployed a wrong container image, xxxxxnginx:1.17, with PipeCD, kubectl get pods showed the pod's STATUS as ImagePullBackOff, although PipeCD reported the deployment as SUCCESS.

t-kikuc avatar Apr 08 '24 07:04 t-kikuc

The cause:

  • PipeCD uses kubectl apply/replace/create to apply k8s manifests.
    • https://github.com/pipe-cd/pipecd/blob/ed8b54358a5a3cfd6c9d537048795b3b2371eb6f/pkg/app/piped/executor/kubernetes/kubernetes.go#L244-L258
    • apply: (replace, create as well) https://github.com/pipe-cd/pipecd/blob/ed8b54358a5a3cfd6c9d537048795b3b2371eb6f/pkg/app/piped/platformprovider/kubernetes/kubectl.go#L72
  • However, kubectl apply does not return an error even if the deployment fails: it returns once the API server accepts the manifest and does not wait for the pods to start.

t-kikuc avatar Apr 08 '24 08:04 t-kikuc

One solution would be like this: (If you have others, please suggest them!!)

  1. Check the status of the deployment after kubectl apply by:

    kubectl rollout status deployment/MY_DEPLOYMENT_NAME
    
  2. Judge success or failure from the output (by the leading "error:" string?)

    Success:

    deployment "MY_DEPLOYMENT_NAME" successfully rolled out

    Failure:

    error: deployment "MY_DEPLOYMENT_NAME" exceeded its progress deadline

  3. Roll back if needed
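
The three steps above could be sketched roughly as follows, in Python for brevity (piped itself is written in Go, so this is an illustration, not PipeCD code). The names `classify_rollout` and `check_rollout` are hypothetical, the string matching relies only on the two sample messages quoted above, and the 60s timeout is an arbitrary choice. The decision to roll back is left to the caller.

```python
import subprocess


def classify_rollout(returncode: int, output: str) -> str:
    """Map the result of `kubectl rollout status` to a coarse verdict.

    Returns "success", "deadline_exceeded", or "unknown".
    """
    if returncode == 0 and "successfully rolled out" in output:
        return "success"
    for line in output.splitlines():
        # The failure message may follow several "Waiting for ..." lines.
        if line.startswith("error:") and "exceeded its progress deadline" in line:
            return "deadline_exceeded"
    return "unknown"


def check_rollout(deployment: str, timeout: str = "60s") -> str:
    """Run `kubectl rollout status` for one Deployment and classify the result."""
    proc = subprocess.run(
        ["kubectl", "rollout", "status", f"deployment/{deployment}",
         f"--timeout={timeout}"],
        capture_output=True,
        text=True,
    )
    # Progress lines go to stdout and the error message to stderr.
    return classify_rollout(proc.returncode, proc.stdout + proc.stderr)
```

Keeping the parsing in a pure function makes it possible to unit-test the verdict logic without a cluster.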

cf.

  • Deployment https://kubernetes.io/docs/concepts/workloads/controllers/deployment/#complete-deployment
  • kubectl rollout status https://kubernetes.io/docs/reference/kubectl/generated/kubectl_rollout/kubectl_rollout_status/

t-kikuc avatar Apr 08 '24 08:04 t-kikuc

One solution would be like this:

OMG, that's not enough...

We need to distinguish ImagePullBackOff from provisioning.

While deploying, kubectl rollout status would say Waiting for deployment "MY_DEPLOYMENT_NAME" rollout to finish: 0 of 2 updated replicas are available..., even if an ImagePullBackOff has occurred.

Or we could just wait for the apply to time out. (Is the timeout not used at the moment?)
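
A hedged sketch of how the two ideas might combine: poll until the rollout is ready, stop early if a fatal state such as ImagePullBackOff is seen, and otherwise give up at a deadline. `wait_for_rollout` and the `poll` callback are hypothetical names; how `poll` obtains its answer is deliberately left open.

```python
import time
from typing import Callable


def wait_for_rollout(
    poll: Callable[[], str],
    timeout_s: float,
    interval_s: float = 5.0,
    now: Callable[[], float] = time.monotonic,
    sleep: Callable[[float], None] = time.sleep,
) -> str:
    """Poll until the rollout is "ready" or "failed", or the deadline passes.

    `poll` must return "ready", "failed", or "pending".
    Returns "ready", "failed", or "timeout".
    """
    deadline = now() + timeout_s
    while True:
        state = poll()
        if state in ("ready", "failed"):
            return state  # terminal state: no need to wait for the deadline
        if now() >= deadline:
            return "timeout"
        sleep(interval_s)
```

A pull error can be transient, so treating the first ImagePullBackOff observation as "failed" may be too eager; the timeout path remains the fallback either way.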

t-kikuc avatar Apr 08 '24 09:04 t-kikuc

kubectl describe deployment does not provide information about the pods' ImagePullBackOff. The only information we can use there is the Progressing condition.
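
For reference, reading the Progressing condition might look like the sketch below. It assumes the Deployment has been fetched as JSON (for example with `kubectl get deployment MY_DEPLOYMENT_NAME -o json`) and parsed into a dict; `progress_deadline_exceeded` is a hypothetical helper name. Per the Kubernetes Deployment documentation, a stalled rollout is reported as `Progressing` with status `False` and reason `ProgressDeadlineExceeded`, but only after `progressDeadlineSeconds` (600 seconds by default) has elapsed.

```python
def progress_deadline_exceeded(deployment: dict) -> bool:
    """Return True if the Deployment reports that its progress deadline passed."""
    conditions = deployment.get("status", {}).get("conditions") or []
    for cond in conditions:
        if cond.get("type") == "Progressing":
            return (
                cond.get("status") == "False"
                and cond.get("reason") == "ProgressDeadlineExceeded"
            )
    # No Progressing condition yet: nothing can be concluded.
    return False
```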

I think there are two possible strategies. One is to apply a timeout, as @t-kikuc said. The other is to retrieve the pods' conditions directly. The live-state view already has information about the pods controlled by each Deployment, so we can retrieve the pods' conditions from it.
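
The second strategy could be sketched as follows. It assumes the pods are available in the shape returned by `kubectl get pods -o json`; how PipeCD's live-state view exposes the same data is not shown here, and `image_pull_failures` is a hypothetical helper. In the Pod API, the pull failure appears as the `waiting.reason` of a container status rather than as a pod-level condition.

```python
from typing import List, Tuple

# Waiting reasons the kubelet reports when an image cannot be pulled.
IMAGE_PULL_REASONS = {"ImagePullBackOff", "ErrImagePull"}


def image_pull_failures(pod_list: dict) -> List[Tuple[str, str, str]]:
    """Return (pod name, container name, reason) for each container
    that is waiting because its image could not be pulled."""
    failures = []
    for pod in pod_list.get("items") or []:
        pod_name = pod.get("metadata", {}).get("name", "")
        status = pod.get("status", {})
        statuses = (status.get("initContainerStatuses") or []) + (
            status.get("containerStatuses") or []
        )
        for cs in statuses:
            waiting = cs.get("state", {}).get("waiting") or {}
            reason = waiting.get("reason")
            if reason in IMAGE_PULL_REASONS:
                failures.append((pod_name, cs.get("name", ""), reason))
    return failures
```

An empty result does not by itself mean the rollout is healthy; it only rules out this particular failure mode, which is what separates ImagePullBackOff from ordinary provisioning.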

Warashi avatar Jun 13 '24 08:06 Warashi