Roll back when an ImagePullBackOff error occurs
What would you like to be added:
When applying k8s resources to the cluster, ImagePullBackOff sometimes occurs, but PipeCD does not currently roll back. It would be nice to detect the error and roll back.
Why is this needed:
First, I'll try to reproduce the error.
I successfully reproduced the error.
When I deployed a wrong container image (xxxxxnginx:1.17) with PipeCD,
kubectl get pods reported the pod's STATUS as ImagePullBackOff,
although PipeCD reported the deployment as SUCCESS.
The cause:
- PipeCD uses kubectl apply/replace/create to apply k8s manifests.
  - https://github.com/pipe-cd/pipecd/blob/ed8b54358a5a3cfd6c9d537048795b3b2371eb6f/pkg/app/piped/executor/kubernetes/kubernetes.go#L244-L258
  - apply: (replace, create as well) https://github.com/pipe-cd/pipecd/blob/ed8b54358a5a3cfd6c9d537048795b3b2371eb6f/pkg/app/piped/platformprovider/kubernetes/kubectl.go#L72
- However, kubectl apply does not return an error even if the deployment fails.
One solution would be like this: (If you have others, please suggest them!!)
- Check the status of the deployment after kubectl apply by running: kubectl rollout status deployment/MY_DEPLOYMENT_NAME
- Judge by the output (by the leading error string?)
  Success:
  deployment "MY_DEPLOYMENT_NAME" successfully rolled out
  Failure:
  error: deployment "MY_DEPLOYMENT_NAME" exceeded its progress deadline
- Roll back if needed
cf.
- Deployment https://kubernetes.io/docs/concepts/workloads/controllers/deployment/#complete-deployment
- kubectl rollout status https://kubernetes.io/docs/reference/kubectl/generated/kubectl_rollout/kubectl_rollout_status/
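The check proposed above could be sketched roughly as follows. This is only an illustration, not PipeCD code: the names RolloutResult, classifyRolloutOutput, and rolloutStatus are hypothetical, and the classification relies solely on the two messages quoted above plus the assumption that kubectl exits non-zero on failure. The --timeout flag of kubectl rollout status is used to bound the wait.

```go
package main

import (
	"context"
	"fmt"
	"os/exec"
	"strings"
	"time"
)

// RolloutResult is a hypothetical classification of `kubectl rollout status` output.
type RolloutResult string

const (
	RolloutSucceeded  RolloutResult = "succeeded"
	RolloutFailed     RolloutResult = "failed"
	RolloutInProgress RolloutResult = "in-progress"
)

// classifyRolloutOutput judges the rollout by the text kubectl printed:
// a line starting with "error:" means failure, a line ending with
// "successfully rolled out" means success, anything else is still in progress.
func classifyRolloutOutput(out string) RolloutResult {
	for _, line := range strings.Split(out, "\n") {
		line = strings.TrimSpace(line)
		switch {
		case strings.HasPrefix(line, "error:"):
			return RolloutFailed
		case strings.HasSuffix(line, "successfully rolled out"):
			return RolloutSucceeded
		}
	}
	return RolloutInProgress
}

// rolloutStatus runs `kubectl rollout status` for one Deployment (assumes kubectl
// is on PATH and already points at the target cluster) and classifies the result.
func rolloutStatus(ctx context.Context, deployment string, timeout time.Duration) RolloutResult {
	cmd := exec.CommandContext(ctx, "kubectl", "rollout", "status",
		"deployment/"+deployment, "--timeout="+timeout.String())
	out, err := cmd.CombinedOutput() // failures are printed to stderr
	if err != nil {
		// Non-zero exit (deadline exceeded, timeout, kubectl missing, ...).
		return RolloutFailed
	}
	return classifyRolloutOutput(string(out))
}

func main() {
	// Classify the sample messages quoted in this issue without touching a cluster.
	samples := []string{
		`deployment "MY_DEPLOYMENT_NAME" successfully rolled out`,
		`error: deployment "MY_DEPLOYMENT_NAME" exceeded its progress deadline`,
	}
	for _, s := range samples {
		fmt.Printf("%s => %s\n", s, classifyRolloutOutput(s))
	}
}
```

A caller would invoke something like rolloutStatus(ctx, "MY_DEPLOYMENT_NAME", 2*time.Minute) after the apply step and trigger the rollback when the result is RolloutFailed.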
Re: "One solution would be like this:"
OMG, that's not enough...
We need to distinguish ImagePullBackOff from normal provisioning.
While deploying, kubectl rollout status prints Waiting for deployment "MY_DEPLOYMENT_NAME" rollout to finish: 0 of 2 updated replicas are available...,
even if ImagePullBackOff has occurred.
Or we could just wait for the apply to time out. (Is the timeout not used now??)
kubectl describe deployment does not provide information about the pods' ImagePullBackOff. The only information we can use is the Progressing condition.
I think there are two strategies we can take. One is to apply timeouts, as @t-kikuc said. The other is to retrieve the pods' conditions directly. The live-state view has information about the pods controlled by the deployments, so we can already retrieve the pods' conditions.
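As a rough sketch of the second strategy, the snippet below scans pod statuses for containers stuck waiting on an image pull. It does not use PipeCD's live-state API. As an assumption for illustration, it decodes JSON in the shape printed by kubectl get pods -o json and reads status.containerStatuses[].state.waiting.reason. The podList type and the imagePullFailures function are hypothetical names, and init containers are not covered.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// podList mirrors only the fields of a Kubernetes PodList that this sketch reads.
type podList struct {
	Items []struct {
		Metadata struct {
			Name string `json:"name"`
		} `json:"metadata"`
		Status struct {
			ContainerStatuses []struct {
				Name  string `json:"name"`
				State struct {
					Waiting *struct {
						Reason string `json:"reason"`
					} `json:"waiting"`
				} `json:"state"`
			} `json:"containerStatuses"`
		} `json:"status"`
	} `json:"items"`
}

// imagePullFailures returns one "pod/container: reason" entry for every
// container that is waiting with reason ImagePullBackOff or ErrImagePull.
func imagePullFailures(podListJSON []byte) ([]string, error) {
	var pods podList
	if err := json.Unmarshal(podListJSON, &pods); err != nil {
		return nil, fmt.Errorf("decode pod list: %w", err)
	}
	var failures []string
	for _, pod := range pods.Items {
		for _, cs := range pod.Status.ContainerStatuses {
			w := cs.State.Waiting
			if w == nil {
				continue // container is running or terminated
			}
			if w.Reason == "ImagePullBackOff" || w.Reason == "ErrImagePull" {
				failures = append(failures,
					fmt.Sprintf("%s/%s: %s", pod.Metadata.Name, cs.Name, w.Reason))
			}
		}
	}
	return failures, nil
}

func main() {
	// Hand-written sample input; a real caller would pass actual pod data.
	sample := []byte(`{"items":[{"metadata":{"name":"web-1"},"status":{"containerStatuses":[
		{"name":"nginx","state":{"waiting":{"reason":"ImagePullBackOff"}}}]}}]}`)
	failures, err := imagePullFailures(sample)
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println(failures)
}
```

A non-empty result lets the deployment fail fast with a specific reason, while the timeout strategy remains useful as a fallback for failures that never surface as a waiting reason.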