Document Pod failures with Out Of Resources error
This is a Feature Request
What would you like to be added
Symptoms
A pod fails with an `OutOfCpu`, `OutOfMemory`, or `OutOfPods` error. The `kubectl describe` output for such pods looks like:
```
Name:       pod-566c6c585-jsjhn
Namespace:  namespace
...
Status:     Failed
Reason:     OutOfcpu
Message:    Pod Node didn't have enough resource: cpu, requested: 120, used: 3803, capacity: 3920

Status:     Failed
Reason:     OutOfMemory
Message:    Pod Node didn't have enough resource: memory, requested: 16000000000, used: 31946743808, capacity: 37634150400

Status:     Failed
Reason:     OutOfPods
Message:    Node didn't have enough resource: pods, requested: 1, used: 32, capacity: 32
```
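To spot affected pods across a cluster, you can filter on the `Failed` phase; the reason strings to match are the ones shown in the output above. A minimal sketch using plain `kubectl` (nothing here is specific to any particular cluster):

```shell
# List Failed pods cluster-wide, along with the reason the kubelet recorded:
kubectl get pods --all-namespaces --field-selector=status.phase=Failed \
  -o custom-columns=NAMESPACE:.metadata.namespace,NAME:.metadata.name,REASON:.status.reason

# Events can be filtered the same way; repeat per resource type
# (the casing of the reason string can vary between releases):
kubectl get events --all-namespaces --field-selector=reason=OutOfcpu
```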
Root cause
The `kube-scheduler` is responsible for scheduling pods, assigning them to nodes based on each node's available resources. However, the `kubelet` may also run static pods (like `kube-proxy`), whose presence and resource requests are not known to the `kube-scheduler` until the `kubelet` reports them.
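As an illustration, a static pod is nothing more than a manifest the kubelet reads from its local `staticPodPath`. The listing below is a sketch of a typical kubeadm-style node; the file names vary by cluster, and on many distributions kube-proxy runs as a DaemonSet rather than a static pod:

```shell
# The kubelet runs every manifest in its staticPodPath
# (/etc/kubernetes/manifests/ by default on kubeadm nodes) without
# the kube-scheduler ever seeing them first:
ls /etc/kubernetes/manifests/
# etcd.yaml  kube-apiserver.yaml  kube-controller-manager.yaml  kube-scheduler.yaml

# Their resource requests only become visible to the scheduler once the
# kubelet reports the corresponding mirror pods to the kube-apiserver:
grep -A 2 'requests:' /etc/kubernetes/manifests/*.yaml
```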
In rare cases, it is possible for the following sequence of events to occur during a node bootstrap process:
- The `kubelet` starts and registers its node with the `kube-apiserver`.
- The `kubelet` reports the available node resources.
- The `kubelet` discovers and runs static pods from the `/etc/kubernetes/manifests/` directory.
- The `kube-scheduler` assigns pods to the node; the total resource consumption is close to the node's available resources (the `kube-scheduler` doesn't know about the static pod(s) yet).
- The `kubelet` accepts and runs the pods that were scheduled by the `kube-scheduler`.
- The `kubelet` reports the updated available node resources, including the static pods' resource requests.
- If the total resource requests of the static pods and the pods scheduled by the `kube-scheduler` exceed the node's available resources, or if the total number of pods exceeds the node's maximum, one or more pods may fail with an `OutOfCpu`, `OutOfMemory`, or `OutOfPods` error.
- The daemonset-controller / replicaset-controller notices that the number of healthy pods is lower than expected and creates new replicas; the `kube-scheduler` assigns these new pods to a suitable node, and they should ultimately be running.
- Pods that failed with an `OutOfCpu`, `OutOfMemory`, or `OutOfPods` error are cleaned up after some time by the pod garbage collector in the `kube-controller-manager`, depending on how it is configured (see the sketch after this list).
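A minimal sketch of that cleanup side, assuming a kubeadm-style control plane: `--terminated-pod-gc-threshold` is the `kube-controller-manager` flag that bounds how many terminated pods are kept before the pod garbage collector starts deleting them, and the manual cleanup is optional:

```shell
# As it would appear in the kube-controller-manager invocation (or its
# static pod manifest): keep at most this many terminated pods before the
# pod garbage collector deletes the oldest ones (default: 12500).
kube-controller-manager --terminated-pod-gc-threshold=12500

# To remove the leftover Failed pods right away instead of waiting:
kubectl delete pods --all-namespaces --field-selector=status.phase=Failed
```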
Why is this needed
This is a known issue (kubernetes/kubernetes#115325) and generally happens on a node with static pods; kube-proxy is commonly run as a static pod too. Ultimately this is not a harmful issue and only delays the correct scheduling of the pods.
We have ideas to fix this. But in the meantime, it would be a good idea to document this issue to educate users and not spook them. We have had 38 bugs filed for such issues, and there is little to no documentation about these failures.
Comments
/cc @tallclair
/sig node
/triage accepted
Help is wanted, although there's not enough detail in the issue description to justify adding /help to make it a bigger callout.
This issue has not been updated in over 1 year, and should be re-triaged.
You can:
- Confirm that this issue is still relevant with `/triage accepted` (org members only)
- Close this issue with `/close`
For more details on the triage process, see https://www.kubernetes.dev/docs/guide/issue-triage/
/remove-triage accepted