Evicted Pods
What happened: Pods that are evicted due to node unavailability or memory pressure remain in the Evicted state, visible in the portal and in AKS. The garbage collector settings aren't exposed in AKS, so we are unable to change the default threshold of 12500 terminated pods before cleanup kicks in. Once the evicted-pod count grows large enough, the cluster becomes slow and nodes become unresponsive.
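For context, on a self-managed control plane this threshold is a kube-controller-manager flag (a sketch based on the upstream Kubernetes docs; AKS does not expose control-plane flags, which is the point of this issue):

    # Upstream default is 12500 terminated pods before the GC starts deleting them;
    # a lower value would clean up evicted pods much sooner.
    kube-controller-manager --terminated-pod-gc-threshold=1000 ...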
What you expected to happen: The ability to change the garbage collector threshold, rather than having to build something custom for cleanup that Kubernetes handles out of the box.
How to reproduce it (as minimally and precisely as possible): Run spot node pools on AKS, let the spot nodes get preempted, and watch the evicted pods accumulate until the cluster becomes slow.
Anything else we need to know?:
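As a stopgap, evicted pods can be cleared by hand: eviction leaves them in the Failed phase, so a field selector catches them (a rough example, not AKS-specific):

    # Delete every Failed pod (which includes Evicted pods) across all namespaces
    kubectl delete pods --all-namespaces --field-selector=status.phase=Failed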
Environment:
- Kubernetes version (use kubectl version):
- Size of cluster (how many worker nodes are in the cluster?)
- General description of workloads in the cluster (e.g. HTTP microservices, Java app, Ruby on Rails, machine learning, etc.)
- Others:
Hi Danlewis3, AKS bot here :wave: Thank you for posting on the AKS Repo, I'll do my best to get a kind human from the AKS team to assist you.
I might be just a bot, but I'm told my suggestions are normally quite good, as such:
- If this case is urgent, please open a Support Request so that our 24/7 support team may help you faster.
- Please abide by the AKS repo Guidelines and Code of Conduct.
- If you're having an issue, could it be described on the AKS Troubleshooting guides or AKS Diagnostics?
- Make sure you're subscribed to the AKS Release Notes to keep up to date with all that's new on AKS.
- Make sure there isn't a duplicate of this issue already reported. If there is, feel free to close this one and '+1' the existing issue.
- If you have a question, do take a look at our AKS FAQ. We place the most common ones there!
Triage required from @Azure/aks-pm
We also have evicted pods for no apparent reason. While the evicted pods are a nuisance, the main problem is that the application is unavailable for a short time...
@Danlewis3 you could look at using the Kubernetes Descheduler tool https://github.com/kubernetes-sigs/descheduler#removefailedpods to remove the evicted pods.
A sample config policy would include a section like:
deschedulerPolicy:
  strategies:
    RemoveFailedPods:
      enabled: true
      params:
        failedPods:
          includingInitContainers: true
          minPodLifetimeSeconds: 120
        namespaces:
          include:
          - "namespace1"
          - "namespace2"
Action required from @Azure/aks-pm
Issue needing attention of @Azure/aks-leads