Awesome-CloudOps-Automation
Kubernetes Runbook: Kubelet has too many pods
Is your feature request related to a problem? Please describe.
Kubelets have a configuration setting that limits how many Pods they can run. The default is 110 Pods per Kubelet, but it is configurable. It would be great to have a runbook that detects when a Kubelet is running more Pods than its desired capacity and mitigates the issue.
Describe the solution you'd like
Here's a Prometheus runbook for this: https://runbooks.prometheus-operator.dev/runbooks/kubernetes/kubelettoomanypods/
@tsari02, assigning this to you. Can you please leave a comment here?
Yeah, I would like to work on this. My approach would be to increase the pod capacity beyond 110; am I heading in the right direction? Also, I am completely new to Kubernetes. Could you point me to some resources that would serve as prerequisites for solving this issue, or is reading the repo sufficient?
Hey @tsari02, the pod limit is enforced and we can't change that. We will have to scale up the resources here. @abhishek-unskript, can we help @tsari02 with the appropriate resources here?
@jayasimha-raghavan-unskript can help here.
Hello @tsari02!
Thank you for helping us out.
Here are some documentation links:
Kubernetes Architecture
Kubernetes Nodes
Kubernetes Resource Limits
Best practices for Deploying Kubernetes cluster
Best practice for Resource requests and limit
Here is a brief digest that could help you get started.
Kubernetes has many components that get deployed as soon as you install a Kubernetes cluster. There are many ways to install a Kubernetes cluster locally, such as K3s, Minikube, MicroK8s, etc. Many cloud providers also have their own Kubernetes offerings, such as AWS's EKS and Google's GKE.
Let's consider a deployment scenario where the cluster has a single node, N1. Say there are 109 Pods on N1, so the node is being used almost to its capacity. Now suppose a new microservice (let's call it the API metrics module) needs to be installed in this cluster. If we try to deploy it with 2 replicas on node N1, the total would be 111 Pods, which exceeds the default limit of 110. The deployment cannot fully succeed: the maximum pod count has been reached, so the Pods cannot all be placed on node N1.
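The arithmetic in this scenario can be sketched in a few lines of Python. This is only an illustration: the function name `replicas_that_fit` and its parameters are made up for this example, and the limit of 110 is the default mentioned earlier in this thread.

```python
# Default kubelet pod limit mentioned earlier in this thread (configurable).
DEFAULT_MAX_PODS = 110


def replicas_that_fit(running_pods: int, requested_replicas: int,
                      max_pods: int = DEFAULT_MAX_PODS) -> int:
    """Return how many of the requested replicas still fit on the node.

    Hypothetical helper for illustration only; it does not talk to a cluster.
    """
    # Free slots can never be negative, even if the node is already over the limit.
    free_slots = max(max_pods - running_pods, 0)
    return min(free_slots, requested_replicas)


if __name__ == "__main__":
    fit = replicas_that_fit(running_pods=109, requested_replicas=2)
    print(f"{fit} of 2 replicas fit on N1")
```

With 109 Pods already running, only one free slot remains, so the second replica has nowhere to go on N1.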
This runbook, as @JBAhire explained earlier, is intended to be run proactively: it detects when a node is about to reach its maximum pod capacity (say, at 95%) and then informs the user to either add a new node to the cluster or consider increasing the pod count via configuration, if the host has enough resources to support the new Pods.
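As a rough sketch of the check described above (not the actual runbook implementation), the threshold logic could look like the following. The names `NodePodUsage`, `check_node`, and `nodes_near_capacity` are hypothetical, and the pod counts are passed in as plain numbers; in a real runbook they would presumably be read from the cluster, for example from each node's allocatable pod count and a listing of Pods per node.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class NodePodUsage:
    """Hypothetical container for one node's pod usage."""
    name: str
    running_pods: int
    max_pods: int = 110  # default limit mentioned earlier in this thread


def check_node(node: NodePodUsage, threshold: float = 0.95) -> Optional[str]:
    """Return a warning message if the node is at or above the threshold, else None."""
    if node.max_pods <= 0:
        raise ValueError(f"max_pods must be positive for node {node.name}")
    ratio = node.running_pods / node.max_pods
    if ratio < threshold:
        return None
    return (
        f"Node {node.name} is at {ratio:.0%} of its pod capacity "
        f"({node.running_pods}/{node.max_pods}): add a node to the cluster, "
        f"or raise the pod limit if the host has enough resources."
    )


def nodes_near_capacity(nodes: List[NodePodUsage],
                        threshold: float = 0.95) -> List[str]:
    """Collect warnings for every node at or above the threshold."""
    warnings = []
    for node in nodes:
        message = check_node(node, threshold)
        if message is not None:
            warnings.append(message)
    return warnings


if __name__ == "__main__":
    sample = [NodePodUsage("N1", 109), NodePodUsage("N2", 40)]
    for warning in nodes_near_capacity(sample):
        print(warning)
```

Keeping the threshold as a parameter lets the same check run with 95% or any other value the user prefers.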
Let me know if you have any questions.