kubectl-status

Add an explicit explanation for stuck StatefulSet (sts) rollouts

bergerx opened this issue 3 years ago

For a while now we have been hit by the problem that https://github.com/kubernetes/kubernetes/pull/78709 tries to address.

Once an existing StatefulSet is updated with a faulty configuration, there seems to be no way to make it recover without manually deleting the stuck pod. From the PR description:

There are a few cases where a StatefulSet can become bricked. Usually from something like setting an invalid image tag in the container. When this occurs, manual intervention is required in order to clear out the bad StatefulSet pods and allow k8s to spawn new ones. In this PR, I took a shot at detecting when a pod is stuck and we have reasonable confidence that replacing that pod will result in a better case than performing a no-op.
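Why manual intervention is needed can be illustrated with a toy model. The Kubernetes documentation describes the behaviour: under the default OrderedReady policy a RollingUpdate replaces pods from the largest ordinal to the smallest, waits for each updated pod to be Running and Ready, and reverting the template alone is not enough because the controller keeps waiting for the broken pod. The sketch below is a deliberately simplified model under those assumptions. `Pod`, `rollout` and `deletePod` are hypothetical names, not controller code.

```go
package main

import "fmt"

// Pod is a hypothetical, simplified stand-in for a StatefulSet pod; it is
// not the k8s.io/api/core/v1 Pod type. The slice index equals the ordinal.
type Pod struct {
	Ordinal  int
	Revision string // template revision the pod was created from
	Ready    bool   // Running and Ready
}

// rollout is a toy model of a RollingUpdate under the OrderedReady policy.
// healthy reports whether a pod created from the given revision ever becomes
// Ready. It returns the ordinal the rollout is waiting on, or -1 once every
// pod is at updateRevision and Ready.
func rollout(pods []Pod, updateRevision string, healthy func(rev string) bool) int {
	// No progress is made while any existing pod is not Ready, even when
	// that pod is the very one that would need to be replaced.
	for _, p := range pods {
		if !p.Ready {
			return p.Ordinal
		}
	}
	// Replace one pod at a time, from the largest ordinal to the smallest,
	// waiting for each replacement to become Ready before continuing.
	for i := len(pods) - 1; i >= 0; i-- {
		if pods[i].Revision == updateRevision {
			continue
		}
		pods[i].Revision = updateRevision
		pods[i].Ready = healthy(updateRevision)
		if !pods[i].Ready {
			return pods[i].Ordinal
		}
	}
	return -1
}

// deletePod models the manual fix: the pod is recreated from the current template.
func deletePod(pods []Pod, ordinal int, updateRevision string, healthy func(string) bool) {
	pods[ordinal].Revision = updateRevision
	pods[ordinal].Ready = healthy(updateRevision)
}

func main() {
	healthy := func(rev string) bool { return rev != "bad" }
	pods := []Pod{{0, "good", true}, {1, "good", true}, {2, "good", true}}

	fmt.Println(rollout(pods, "bad", healthy))  // prints 2: faulty template, waits on pod 2
	fmt.Println(rollout(pods, "good", healthy)) // prints 2: template reverted, still waiting
	deletePod(pods, 2, "good", healthy)         // manual intervention
	fmt.Println(rollout(pods, "good", healthy)) // prints -1: rollout completes
}
```

The second call is the situation described in the issue: the template is already good again, yet the model (like the documented controller behaviour) never gets past the pod that cannot become Ready.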

Here is the detection function proposed in the PR:

```go
// isStatefullyStuck returns true if a pod in a stateful set is stuck
// due to a previously bad roll-out. We can detect this by checking all of:
// 1) The pod is in a pending state
// 2) The pod is at a different revision than the update revision
// 3) The update strategy is rolling
// 4) The pod should be updated
func isStatefullyStuck(set *apps.StatefulSet, pod *v1.Pod, currentRevision, updateRevision *apps.ControllerRevision) bool {
	return isPending(pod) &&
		getPodRevision(pod) == currentRevision.Name &&
		currentRevision.Name != updateRevision.Name &&
		set.Spec.UpdateStrategy.Type == apps.RollingUpdateStatefulSetStrategyType &&
		set.Spec.UpdateStrategy.RollingUpdate.Partition != nil &&
		getOrdinal(pod) >= int(*set.Spec.UpdateStrategy.RollingUpdate.Partition)
}
```
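kubectl-status could run the same checks and print an explicit explanation next to the stuck pod. The Go sketch below is self-contained, so it uses hypothetical simplified `Pod` and `StatefulSet` types instead of the real k8s.io/api types. The field comments name the real fields these stand in for (the pod's `controller-revision-hash` label, `status.currentRevision`, `status.updateRevision`, `spec.updateStrategy.rollingUpdate.partition`). `explainStuck` and its message wording are only an illustration, not existing kubectl-status code.

```go
package main

import "fmt"

// Hypothetical, simplified mirrors of the objects isStatefullyStuck inspects.
// These are NOT the k8s.io/api types; field comments name the real fields.
type Pod struct {
	Name     string
	Ordinal  int
	Phase    string // status.phase, e.g. "Pending" or "Running"
	Revision string // the pod's controller-revision-hash label
}

type StatefulSet struct {
	Name            string
	StrategyType    string // spec.updateStrategy.type: "RollingUpdate" or "OnDelete"
	Partition       *int32 // spec.updateStrategy.rollingUpdate.partition
	CurrentRevision string // status.currentRevision
	UpdateRevision  string // status.updateRevision
}

// isStatefullyStuck applies the same checks as the function quoted from the PR.
func isStatefullyStuck(set StatefulSet, pod Pod) bool {
	return pod.Phase == "Pending" &&
		pod.Revision == set.CurrentRevision &&
		set.CurrentRevision != set.UpdateRevision &&
		set.StrategyType == "RollingUpdate" &&
		set.Partition != nil &&
		pod.Ordinal >= int(*set.Partition)
}

// explainStuck returns a hint for a stuck pod, or "" if the checks do not match.
// The wording is only an example of what kubectl-status could print.
func explainStuck(set StatefulSet, pod Pod) string {
	if !isStatefullyStuck(set, pod) {
		return ""
	}
	return fmt.Sprintf(
		"StatefulSet %s: rollout looks stuck on pending pod %s (revision %s, update revision %s); "+
			"the controller waits for it instead of replacing it, so it may need to be deleted manually",
		set.Name, pod.Name, pod.Revision, set.UpdateRevision)
}

func main() {
	partition := int32(0)
	set := StatefulSet{Name: "web", StrategyType: "RollingUpdate", Partition: &partition,
		CurrentRevision: "rev-a", UpdateRevision: "rev-b"}
	pod := Pod{Name: "web-2", Ordinal: 2, Phase: "Pending", Revision: "rev-a"}
	fmt.Println(explainStuck(set, pod))
}
```

Keeping the predicate separate from the message means the detection logic can track the upstream checks while the wording shown to users evolves independently.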

bergerx, Oct 01 '21 20:10