cockroach-operator
Cluster stateful set stuck when failed scheduling occurs
If there exists a cluster and the resource requests are scaled up, resulting in failed scheduling, that cluster node will be stuck in a failed state, even if the resource requests are later reduced.
The flow looks like:
- Scale up the resource requests for the cluster nodes
- Last node in stateful set will be updated
- Node fails to schedule
- Reduce the resource requests for the CrdbCluster
- Stateful set is not updated to reflect new resource requests
The workaround is that the cluster admin, or whoever has access, must manually edit the stateful set to assign the updated requests.
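For illustration, here is a minimal sketch of that manual edit using client-go; the namespace ("default"), StatefulSet name ("cockroachdb"), container name ("db"), and the request values are placeholders, not values taken from this cluster:

```go
package main

import (
	"context"
	"fmt"

	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/apimachinery/pkg/types"
	"k8s.io/client-go/kubernetes"
	"k8s.io/client-go/tools/clientcmd"
)

func main() {
	// Load kubeconfig from the default location (~/.kube/config).
	cfg, err := clientcmd.BuildConfigFromFlags("", clientcmd.RecommendedHomeFile)
	if err != nil {
		panic(err)
	}
	client, err := kubernetes.NewForConfig(cfg)
	if err != nil {
		panic(err)
	}

	// Strategic merge patch that lowers the requests on the "db" container;
	// the stateful set controller then rolls the pods with the new values.
	patch := []byte(`{"spec":{"template":{"spec":{"containers":[` +
		`{"name":"db","resources":{"requests":{"cpu":"2","memory":"8Gi"}}}]}}}}`)

	_, err = client.AppsV1().StatefulSets("default").Patch(
		context.TODO(), "cockroachdb", types.StrategicMergePatchType, patch, metav1.PatchOptions{})
	if err != nil {
		panic(err)
	}
	fmt.Println("patched StatefulSet resource requests")
}
```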
Questions:
- Last node in stateful set will be updated - how? Does the operator update it? By node do you mean pod?
- Node fails to schedule - why? Can you do a describe?
- Reduce the resource requests for the CrdbCluster - are you editing the CRD?
- Stateful set is not updated to reflect new resource requests - this, I am guessing, is a separate problem.
To recap: you add a new k8s node, and somehow the last pod of the sts is updated and then does not schedule.
If you edit the resource requests, such as CPU or memory, the sts is not updated.
> Last node in stateful set will be updated - how? Does the operator update it? By node do you mean pod?
Yes - the Pod is also the CRDB node in the CRDB cluster.
> Node fails to schedule - why? Can you do a describe?
It failed to schedule because there was not enough CPU and memory, e.g.:
```
Events:
  Type     Reason             Age                From                Message
  ----     ------             ----               ----                -------
  Warning  FailedScheduling   63s (x2 over 63s)  default-scheduler   0/72 nodes are available: 44 Insufficient memory, 72 Insufficient cpu.
  Normal   NotTriggerScaleUp  61s                cluster-autoscaler  pod didn't trigger scale-up: 1 Insufficient cpu, 1 Insufficient memory, 4 max node group size reached
```
> Reduce the resource requests for the CrdbCluster - are you editing the CRD?
Yes
> Stateful set is not updated to reflect new resource requests - this I am guessing is a separate problem.
What else would update the stateful set if not the operator? It literally has the label: app.kubernetes.io/managed-by=cockroach-operator
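As a quick check, something like the following (namespace and names are placeholders) lists the StatefulSets the operator has marked as managed by it:

```go
package main

import (
	"context"
	"fmt"

	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/client-go/kubernetes"
	"k8s.io/client-go/tools/clientcmd"
)

func main() {
	cfg, err := clientcmd.BuildConfigFromFlags("", clientcmd.RecommendedHomeFile)
	if err != nil {
		panic(err)
	}
	client, err := kubernetes.NewForConfig(cfg)
	if err != nil {
		panic(err)
	}

	// List the StatefulSets the operator claims ownership of via its label.
	stsList, err := client.AppsV1().StatefulSets("default").List(context.TODO(), metav1.ListOptions{
		LabelSelector: "app.kubernetes.io/managed-by=cockroach-operator",
	})
	if err != nil {
		panic(err)
	}
	for _, sts := range stsList.Items {
		fmt.Println(sts.Name)
	}
}
```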
Sorry, I glossed over some things. As I understand it all....
The cockroach-operator acts as a controller for the CrdbCluster CRD.
There is a stateful set controller for StatefulSet resources that ultimately controls Pod spec changes.
When the stateful set controller rolls out changes, it updates the last pod in the stateful set first and does not move on to the others until that pod is running and ready.
When updating the CrdbCluster, the controller is updating the underlying StatefulSet. If that results in failed scheduling due to resources, the admin may adjust the resources on the CrdbCluster. The cockroach-operator is not picking up that second change to the CrdbCluster and applying it to the underlying StatefulSet. This may be a logic loop that is waiting for the prior change to complete successfully.
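Purely as a hypothesis (this is not the operator's actual code), the behavior is consistent with a reconcile guard along these lines: the earlier rollout can never finish because the pod is unschedulable, so a later spec change is never applied.

```go
package main

import (
	"fmt"

	appsv1 "k8s.io/api/apps/v1"
)

// applyUpdate stands in for pushing the desired spec to the API server.
func applyUpdate(desired *appsv1.StatefulSet) error {
	fmt.Println("applying new StatefulSet spec")
	return nil
}

// reconcile illustrates the suspected pattern: new changes to the desired
// spec are held back until every replica has picked up the previous update.
// If a pod from that previous update can never schedule, UpdatedReplicas
// never catches up and later fixes to the CrdbCluster are never applied.
func reconcile(desired, current *appsv1.StatefulSet) error {
	if current.Spec.Replicas != nil && current.Status.UpdatedReplicas != *current.Spec.Replicas {
		return fmt.Errorf("waiting for previous rollout to complete")
	}
	return applyUpdate(desired)
}

func main() {
	replicas := int32(3)
	current := &appsv1.StatefulSet{
		Spec: appsv1.StatefulSetSpec{Replicas: &replicas},
		// The unschedulable pod means the rollout never reports as finished.
		Status: appsv1.StatefulSetStatus{UpdatedReplicas: 2},
	}
	desired := current.DeepCopy() // pretend this carries the reduced resource requests
	if err := reconcile(desired, current); err != nil {
		fmt.Println("reconcile skipped:", err)
	}
}
```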
I gave it several cycles in the backoff retry to see if the operator would update the StatefulSet - it did not.
Thanks for all the details!!!!
So the problem was that the pod did not schedule due to lack of resources on the node. When you updated the sts resource settings, the sts did not update, or the pod was in turn not updated.
We will triage and see if we can recreate and fix this.
@davidwding you mind taking a look?
Thanks for looking into it. To be clear, I did the following:
- Updated the resources in the CrdbCluster CRD (see the sketch after this list) - this caused the sts to update and pods to fail scheduling
- Updated the resources in the CrdbCluster CRD again - this did not update the sts; pods continued to fail scheduling
- Updated the resources in the sts directly - pods successfully scheduled
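For reference, the CRD edits in the first two steps were of roughly this shape; the group/version crdb.cockroachlabs.com/v1alpha1, the spec.resources.requests field path, and the names and values here are my assumptions rather than values taken from this thread:

```go
package main

import (
	"context"
	"fmt"

	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/apimachinery/pkg/runtime/schema"
	"k8s.io/apimachinery/pkg/types"
	"k8s.io/client-go/dynamic"
	"k8s.io/client-go/tools/clientcmd"
)

func main() {
	cfg, err := clientcmd.BuildConfigFromFlags("", clientcmd.RecommendedHomeFile)
	if err != nil {
		panic(err)
	}
	client, err := dynamic.NewForConfig(cfg)
	if err != nil {
		panic(err)
	}

	// Assumed group/version/resource for the CrdbCluster custom resource.
	gvr := schema.GroupVersionResource{
		Group:    "crdb.cockroachlabs.com",
		Version:  "v1alpha1",
		Resource: "crdbclusters",
	}

	// Merge patch that changes the requested CPU/memory on the CrdbCluster;
	// the operator is expected to propagate this to the StatefulSet.
	patch := []byte(`{"spec":{"resources":{"requests":{"cpu":"2","memory":"8Gi"}}}}`)

	_, err = client.Resource(gvr).Namespace("default").Patch(
		context.TODO(), "cockroachdb", types.MergePatchType, patch, metav1.PatchOptions{})
	if err != nil {
		panic(err)
	}
	fmt.Println("patched CrdbCluster resource requests")
}
```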