radondb-mysql-kubernetes
[bug] update too many pods at the same time.
Describe the problem
When the configuration is updated, two nodes of a 3-node cluster are deleted and restarted at the same time, which makes the cluster temporarily unavailable. Only one node should be updated at a time.
To Reproduce
The default PodDisruptionBudget is 50%, so a 3-node cluster must keep at least 2 nodes available. In practice, however, two nodes are updated at once when the configuration changes, which violates the PDB.
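For reference, the disruption controller rounds a percentage minAvailable up, which is how 50% of 3 replicas becomes 2. A minimal sketch of that arithmetic, assuming the default budget is expressed as minAvailable: 50%:

```go
package main

import (
	"fmt"

	"k8s.io/apimachinery/pkg/util/intstr"
)

func main() {
	// minAvailable: 50% on a 3-replica cluster (assumed default budget).
	minAvailable := intstr.FromString("50%")
	replicas := 3

	// The disruption controller rounds a percentage minAvailable up:
	// ceil(3 * 0.50) = 2 pods must stay available.
	minAvail, err := intstr.GetScaledValueFromIntOrPercent(&minAvailable, replicas, true)
	if err != nil {
		panic(err)
	}
	fmt.Printf("minAvailable=%d, allowed disruptions=%d\n", minAvail, replicas-minAvail)
	// Output: minAvailable=2, allowed disruptions=1
	// Deleting two pods at once therefore exceeds the budget.
}
```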
Root cause:
The StatefulSetUpdateStrategy is OnDelete, so the operator deletes pods itself with the following logic:
```go
if pod.ObjectMeta.Labels["controller-revision-hash"] == s.sfs.Status.UpdateRevision {
	log.Info("pod is already updated", "pod name", pod.Name)
} else {
	// ...
	if pod.DeletionTimestamp != nil {
		log.Info("pod is being deleted", "pod", pod.Name, "key", s.Unwrap())
	} else {
		if err := s.cli.Delete(ctx, pod); err != nil {
			return err
		}
	}
}
```
After a pod is deleted, the retry exits early because the healthy label on the pod being deleted is still yes. The correct logic is to wait until the deleted pod is ready again before updating the next pod:
if pod.ObjectMeta.Labels["healthy"] == "yes" &&
pod.ObjectMeta.Labels["controller-revision-hash"] != s.sfs.Status.UpdateRevision {
return false, fmt.Errorf("pod %s is ready, wait next schedule", pod.Name)
}
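For comparison, a minimal sketch of the intended one-pod-at-a-time flow, written against controller-runtime; the function shape and the waitPodReady helper are illustrative assumptions, not the operator's actual code:

```go
package syncer

import (
	"context"

	appsv1 "k8s.io/api/apps/v1"
	corev1 "k8s.io/api/core/v1"
	"sigs.k8s.io/controller-runtime/pkg/client"
)

// updatePodsOneByOne deletes one out-of-date pod at a time and waits for its
// replacement to become ready before touching the next pod.
func updatePodsOneByOne(
	ctx context.Context,
	cli client.Client,
	sfs *appsv1.StatefulSet,
	pods []corev1.Pod,
	waitPodReady func(ctx context.Context, name string) error, // hypothetical helper
) error {
	for i := range pods {
		pod := &pods[i]
		if pod.Labels["controller-revision-hash"] == sfs.Status.UpdateRevision {
			continue // already on the new revision
		}
		if pod.DeletionTimestamp == nil {
			if err := cli.Delete(ctx, pod); err != nil {
				return err
			}
		}
		// Block here until the replacement is ready on the new revision;
		// this is what keeps at most one node down at a time.
		if err := waitPodReady(ctx, pod.Name); err != nil {
			return err
		}
	}
	return nil
}
```

With this shape, the budget is never violated, because the loop cannot reach the second pod while the first is still recovering.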
Expected behavior
Only one pod is unavailable at a time: the next pod is deleted only after the previous one is ready again on the new revision.
Environment:
- RadonDB MySQL version:

The pod obtained in Retry() may not be the latest.
The DeletionTimestamp needs to be checked in addition.
If the pod is being deleted, healthy should be treated as no and the other checks skipped.
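A sketch of the corrected check along those lines: re-fetch the pod so its labels are not stale, and test DeletionTimestamp before the healthy label (the function and parameter names here are illustrative, not the actual patch):

```go
package syncer

import (
	"context"

	corev1 "k8s.io/api/core/v1"
	"k8s.io/apimachinery/pkg/types"
	"sigs.k8s.io/controller-runtime/pkg/client"
)

// podUpdated reports whether the pod is ready on the new revision. It is
// meant to be wrapped as the condition of a polling retry loop: returning
// (false, nil) keeps the loop waiting.
func podUpdated(ctx context.Context, cli client.Client, key types.NamespacedName, updateRevision string) (bool, error) {
	pod := &corev1.Pod{}
	if err := cli.Get(ctx, key, pod); err != nil {
		// NotFound means the old pod is gone; keep waiting for the replacement.
		return false, client.IgnoreNotFound(err)
	}
	// A pod that is being deleted must count as unhealthy, whatever its
	// (possibly stale) "healthy" label still says.
	if pod.DeletionTimestamp != nil {
		return false, nil
	}
	return pod.Labels["healthy"] == "yes" &&
		pod.Labels["controller-revision-hash"] == updateRevision, nil
}
```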
