Evicted Pods

Open Danlewis3 opened this issue 3 years ago • 41 comments

What happened: Pods that are evicted because of node availability or memory issues remain visible in the Evicted state in the portal and in AKS. The garbage collector settings aren't exposed in AKS, so we cannot change the default threshold of 12500 terminated pods before cleanup starts. Once the number of evicted pods reaches a certain level, all nodes become unresponsive and slow.

What you expected to happen: The ability to change the garbage collector threshold, rather than having to build something custom for what Kubernetes does out of the box.

How to reproduce it (as minimally and precisely as possible): Run spot nodes on AKS, let the spot nodes get evicted, and watch the number of evicted pods grow until the cluster eventually becomes slow.
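
One way to watch the evicted pods accumulate is to count them in the JSON that `kubectl get pods --all-namespaces -o json` prints. The sketch below is only an illustration and not part of AKS or Kubernetes tooling: the function name `count_evicted` and the sample data are hypothetical, and it assumes evicted pods show up with `status.phase` set to `Failed` and `status.reason` set to `Evicted`.

```python
import json
from collections import Counter


def count_evicted(pod_list_json: str) -> Counter:
    """Count evicted pods per namespace in a `kubectl get pods -o json` document."""
    pod_list = json.loads(pod_list_json)
    counts = Counter()
    for pod in pod_list.get("items", []):
        status = pod.get("status", {})
        # Assumption: evicted pods are reported as phase "Failed" with reason "Evicted".
        if status.get("phase") == "Failed" and status.get("reason") == "Evicted":
            counts[pod["metadata"]["namespace"]] += 1
    return counts


# Hypothetical sample standing in for real kubectl output.
SAMPLE = json.dumps({
    "items": [
        {"metadata": {"name": "web-1", "namespace": "namespace1"},
         "status": {"phase": "Failed", "reason": "Evicted"}},
        {"metadata": {"name": "web-2", "namespace": "namespace1"},
         "status": {"phase": "Running"}},
        {"metadata": {"name": "job-1", "namespace": "namespace2"},
         "status": {"phase": "Failed", "reason": "Evicted"}},
    ]
})

if __name__ == "__main__":
    for namespace, count in sorted(count_evicted(SAMPLE).items()):
        print(f"{namespace}: {count}")
```

Against a real cluster, the input would be the kubectl output instead of `SAMPLE`; running the count periodically should show whether the total keeps growing.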

Anything else we need to know?:

Environment:

  • Kubernetes version (use kubectl version):
  • Size of cluster (how many worker nodes are in the cluster?)
  • General description of workloads in the cluster (e.g. HTTP microservices, Java app, Ruby on Rails, machine learning, etc.)
  • Others:

Danlewis3 avatar Nov 04 '21 20:11 Danlewis3

Hi Danlewis3, AKS bot here :wave: Thank you for posting on the AKS Repo, I'll do my best to get a kind human from the AKS team to assist you.

I might be just a bot, but I'm told my suggestions are normally quite good, as such:

  1. If this case is urgent, please open a Support Request so that our 24/7 support team may help you faster.
  2. Please abide by the AKS repo Guidelines and Code of Conduct.
  3. If you're having an issue, could it be described on the AKS Troubleshooting guides or AKS Diagnostics?
  4. Make sure you're subscribed to the AKS Release Notes to keep up to date with all that's new on AKS.
  5. Make sure there isn't a duplicate of this issue already reported. If there is, feel free to close this one and '+1' the existing issue.
  6. If you have a question, do take a look at our AKS FAQ. We place the most common ones there!

ghost avatar Nov 04 '21 20:11 ghost

Triage required from @Azure/aks-pm

ghost avatar Nov 07 '21 00:11 ghost

We also have pods evicted for no apparent reason. While the evicted pods are a nuisance, the main problem is that the application is not available for a short time...

pcornelissen avatar Apr 26 '22 06:04 pcornelissen

@Danlewis3 you could look at using the Kubernetes Descheduler tool https://github.com/kubernetes-sigs/descheduler#removefailedpods to remove the evicted pods.

A sample config policy would include a section like:

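deschedulerPolicy:
  strategies:
    RemoveFailedPods:
      enabled: true
      params:
        failedPods:
          # Also consider pods whose init containers failed.
          includingInitContainers: true
          # Only remove failed pods that are at least 2 minutes old.
          minPodLifetimeSeconds: 120
        # Namespace filtering sits beside failedPods, not inside it.
        namespaces:
          include:
            - "namespace1"
            - "namespace2"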
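
To make the effect of these parameters concrete, here is a rough sketch of the selection rule in plain Python. It is an illustration only and not the descheduler's implementation: `should_remove` and its arguments are hypothetical names, and it assumes a pod qualifies when it is in phase `Failed`, lives in one of the included namespaces, and has existed for at least the configured minimum lifetime (120 seconds in the sample), measured from its creation time.

```python
from datetime import datetime, timedelta, timezone


def should_remove(phase, namespace, created_at, now,
                  included_namespaces, min_lifetime_seconds):
    """Return True if a pod matches the sketched RemoveFailedPods-style rule."""
    if phase != "Failed":
        return False  # only failed pods (which includes evicted ones) are candidates
    if namespace not in included_namespaces:
        return False  # namespace filter from the policy
    age_seconds = (now - created_at).total_seconds()
    return age_seconds >= min_lifetime_seconds  # skip pods that are too young


if __name__ == "__main__":
    now = datetime(2022, 4, 26, 12, 0, tzinfo=timezone.utc)
    included = {"namespace1", "namespace2"}
    old_failed = should_remove("Failed", "namespace1",
                               now - timedelta(minutes=10), now, included, 120)
    young_failed = should_remove("Failed", "namespace1",
                                 now - timedelta(seconds=30), now, included, 120)
    print(old_failed, young_failed)
```

The minimum lifetime acts as a grace period, so a pod that failed moments ago stays around long enough to be inspected before it is cleaned up.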

larryclaman avatar Apr 26 '22 12:04 larryclaman

Action required from @Azure/aks-pm

ghost avatar Oct 28 '22 19:10 ghost

Issue needing attention of @Azure/aks-leads

ghost avatar Nov 13 '22 00:11 ghost

Issue needing attention of @Azure/aks-leads

ghost avatar Nov 28 '22 06:11 ghost

Issue needing attention of @Azure/aks-leads

ghost avatar Dec 13 '22 12:12 ghost

Issue needing attention of @Azure/aks-leads

ghost avatar Dec 28 '22 18:12 ghost

Issue needing attention of @Azure/aks-leads

ghost avatar Jan 13 '23 00:01 ghost

Issue needing attention of @Azure/aks-leads

ghost avatar Jan 28 '23 06:01 ghost

Issue needing attention of @Azure/aks-leads

ghost avatar Feb 12 '23 12:02 ghost

Issue needing attention of @Azure/aks-leads

ghost avatar Feb 27 '23 18:02 ghost

Issue needing attention of @Azure/aks-leads

ghost avatar Mar 15 '23 00:03 ghost

Issue needing attention of @Azure/aks-leads

ghost avatar Mar 30 '23 00:03 ghost

Issue needing attention of @Azure/aks-leads

ghost avatar Apr 14 '23 06:04 ghost

Issue needing attention of @Azure/aks-leads

ghost avatar Apr 29 '23 12:04 ghost

Issue needing attention of @Azure/aks-leads

ghost avatar May 14 '23 18:05 ghost

Issue needing attention of @Azure/aks-leads

ghost avatar May 30 '23 00:05 ghost

Issue needing attention of @Azure/aks-leads

ghost avatar Jun 14 '23 06:06 ghost

Issue needing attention of @Azure/aks-leads

ghost avatar Jun 29 '23 12:06 ghost

Issue needing attention of @Azure/aks-leads

ghost avatar Jul 14 '23 18:07 ghost

Issue needing attention of @Azure/aks-leads

ghost avatar Jul 30 '23 00:07 ghost
