
Support tolerateFailuresUntilDeadline for Helm deployments

Open esabdull opened this issue 7 months ago • 0 comments

Could you consider adding support for the tolerateFailuresUntilDeadline field for Helm deployments in Skaffold?

Context

In GKE Autopilot clusters, Helm deployments sometimes fail in Skaffold due to delays caused by node autoscaling. For example, if a node is deleted during a deployment, the associated pod needs to be recreated on a new node. This process can take some time.

Even though Kubernetes eventually recreates the pod and the deployment completes successfully, Skaffold may already report the deployment as failed.

Why this is needed

Currently, there is no mechanism for Helm deployments in Skaffold to tolerate temporary scheduling issues. Supporting tolerateFailuresUntilDeadline for Helm, similar to what was introduced for Cloud Run in v2.16.0, would allow Skaffold to keep waiting until the status check deadline before marking the deployment as failed.

This would make deployments in Autopilot environments more resilient and improve reliability in GitHub Actions and other CI/CD setups.
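
To illustrate the request, here is a rough sketch of the configuration I would like to be able to use. The release name and chart path are hypothetical, and the apiVersion is an assumption (use whichever schema version your Skaffold release supports). The two deploy-level fields are the existing status check options; the ask is that the Helm deployer honors them.

```yaml
# Sketch only: release name and chart path are hypothetical.
apiVersion: skaffold/v4beta13  # assumption: adjust to your Skaffold schema version
kind: Config
deploy:
  # Keep checking until the deadline instead of failing on the first
  # transient error (for example, a pod rescheduled after node deletion).
  statusCheckDeadlineSeconds: 600
  tolerateFailuresUntilDeadline: true
  helm:
    releases:
      - name: my-app              # hypothetical release name
        chartPath: charts/my-app  # hypothetical chart path
```

With a setup like this, a pod that is briefly unschedulable while Autopilot provisions a node would not fail the deployment as long as it becomes healthy within the 600 seconds.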

esabdull, May 05 '25 12:05