[Bug]: Cluster removal fails when addons are installed

Open • mbevc1 opened this issue 2 years ago • 19 comments

What were you trying to accomplish?

Deleting EKS cluster provisioned with addons.

What happened?

When deleting the cluster, the first step is cordoning and draining the nodes. The EBS addon runs a controller replica that cannot be evicted, which blocks draining of the nodes. We should probably remove the addons first, so that only regular workloads are running before the nodes are cordoned.

How to reproduce it?

Provision an EKS cluster with an addon (EBS). Then delete it using eksctl delete cluster -f config.yaml
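
For anyone trying to reproduce this, a minimal config might look like the sketch below. The issue does not include the original config.yaml, so everything in it is an assumption: the cluster name "mb" and region eu-west-1 are taken from the logs, the three-node managed nodegroup matches the three cordoned nodes and the "0 unmanaged nodegroup(s)" line, and the nodegroup name ng-1 and the addon name aws-ebs-csi-driver are guesses (the report only says "EBS"). The helper write_repro_config is hypothetical.

```shell
#!/usr/bin/env sh
set -eu

# Write a minimal eksctl ClusterConfig with one managed nodegroup and the
# EBS CSI addon. All values are assumptions reconstructed from the logs.
write_repro_config() {
  cat > "$1" <<'EOF'
apiVersion: eksctl.io/v1alpha5
kind: ClusterConfig

metadata:
  name: mb
  region: eu-west-1

managedNodeGroups:
  - name: ng-1
    desiredCapacity: 3

addons:
  - name: aws-ebs-csi-driver
EOF
}

write_repro_config config.yaml

# The two commands below create real (billable) AWS resources, so they are
# left commented out:
#   eksctl create cluster -f config.yaml
#   eksctl delete cluster -f config.yaml
```

The addon may need extra IAM permissions to become healthy, which this sketch leaves out.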

Logs

2023-02-13 14:36:56 [ℹ]  deleting EKS cluster "mb"
2023-02-13 14:36:57 [ℹ]  will drain 0 unmanaged nodegroup(s) in cluster "mb"
2023-02-13 14:36:57 [ℹ]  starting parallel draining, max in-flight of 1
2023-02-13 14:36:58 [ℹ]  cordon node "ip-192-168-29-32.eu-west-1.compute.internal"
2023-02-13 14:36:58 [ℹ]  cordon node "ip-192-168-46-245.eu-west-1.compute.internal"
2023-02-13 14:36:58 [ℹ]  cordon node "ip-192-168-92-221.eu-west-1.compute.internal"
2023-02-13 14:38:12 [!]  1 pods are unevictable from node ip-192-168-29-32.eu-west-1.compute.internal
2023-02-13 14:39:14 [!]  1 pods are unevictable from node ip-192-168-29-32.eu-west-1.compute.internal
2023-02-13 14:40:16 [!]  1 pods are unevictable from node ip-192-168-29-32.eu-west-1.compute.internal
2023-02-13 14:41:18 [!]  1 pods are unevictable from node ip-192-168-29-32.eu-west-1.compute.internal
2023-02-13 14:42:20 [!]  1 pods are unevictable from node ip-192-168-29-32.eu-west-1.compute.internal

Anything else we need to know?

Versions

$ eksctl info

eksctl version: 0.129.0
kubectl version: v1.26.1
OS: linux

mbevc1 • Feb 13 '23 14:02

Hi @mbevc1! As per our documentation on how to delete clusters here, Pod Disruption Budget (PDB) policies are preventing the EBS addon from being properly removed. You should run your command with the --disable-nodegroup-eviction flag, i.e.:

eksctl delete cluster -f cluster.yaml --disable-nodegroup-eviction

TiberiuGC • Feb 14 '23 06:02

Thanks @TiberiuGC for pointing out this doc note. I guess removing the addon first, or in parallel, could prevent this issue?

mbevc1 • Feb 14 '23 09:02

I think you're right, removing the addon first would probably work as well.
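
One way to read the addon-first idea is sketched below. Nobody in this thread has confirmed that this order avoids the stuck drain, so treat it as an experiment. It assumes the addon is named aws-ebs-csi-driver (the report only says "EBS") and reuses the cluster name "mb" and config.yaml from the report. The run helper, the DRY_RUN variable and delete_cluster_addons_first are hypothetical conveniences, not part of eksctl.

```shell
#!/usr/bin/env sh
# Sketch: delete EKS addons first, then delete the cluster.
# DRY_RUN=1 (the default here) only prints the commands; DRY_RUN=0 runs them.
set -eu

run() {
  if [ "${DRY_RUN:-1}" = "1" ]; then
    echo "+ $*"
  else
    "$@"
  fi
}

# Usage: delete_cluster_addons_first CLUSTER CONFIG ADDON...
delete_cluster_addons_first() {
  cluster="$1"
  config="$2"
  shift 2
  for addon in "$@"; do
    # Ask EKS to remove the addon so its pods (and any PDB it ships)
    # are hopefully gone before eksctl starts draining nodes.
    run eksctl delete addon --cluster "$cluster" --name "$addon"
  done
  run eksctl delete cluster -f "$config"
}

delete_cluster_addons_first mb config.yaml aws-ebs-csi-driver
```

Addon removal may not be finished when the first command returns, so in a real run it could be worth checking with eksctl get addon --cluster mb before deleting the cluster.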

TiberiuGC • Feb 15 '23 09:02

This issue is stale because it has been open 30 days with no activity. Remove stale label or comment or this will be closed in 5 days.

github-actions[bot] • Mar 18 '23 01:03

Any more decisions on this?

mbevc1 • Mar 18 '23 19:03

This issue is stale because it has been open 30 days with no activity. Remove stale label or comment or this will be closed in 5 days.

github-actions[bot] • Apr 19 '23 01:04

We should keep this alive ;)

mbevc1 • Apr 19 '23 09:04

This issue is stale because it has been open 30 days with no activity. Remove stale label or comment or this will be closed in 5 days.

github-actions[bot] • May 21 '23 01:05

keep-alive :)

mbevc1 • May 21 '23 11:05

> I think you're right, removing the addon first would probably work as well.

Just running into this issue as we speak, so I'd suggest adding a note about removing the add-on on this page: https://eksctl.io/usage/creating-and-managing-clusters/

dabcoder • Jun 13 '23 13:06

This issue is stale because it has been open 30 days with no activity. Remove stale label or comment or this will be closed in 5 days.

github-actions[bot] • Oct 28 '23 01:10

keep-alive again :)

mbevc1 • Oct 28 '23 07:10

@mbevc1 have you tried to run with --disable-nodegroup-eviction?

AmitBenAmi • Nov 28 '23 05:11

I haven't tried that one, but it looks like it could work. I reckon we still want graceful, built-in support for deprovisioning add-ons, though.

mbevc1 • Nov 28 '23 09:11

> Hi @mbevc1! As per our documentation on how to delete clusters here, Pod Disruption Budget (PDB) policies are preventing the EBS addon from being properly removed. You should run your command with the --disable-nodegroup-eviction flag, i.e.:
>
> eksctl delete cluster -f cluster.yaml --disable-nodegroup-eviction

This did not work for me. I'm still having the issue:

2023-12-17 04:59:01 [ℹ] deleting EKS cluster "jx-angular-clusterx"
2023-12-17 04:59:04 [ℹ] will drain 2 unmanaged nodegroup(s) in cluster "jx-angular-clusterx"
2023-12-17 04:59:04 [ℹ] starting parallel draining, max in-flight of 1
2023-12-17 04:59:08 [✔] drained all nodes: [ip-172-31-13-6.ec2.internal ip-172-31-38-140.ec2.internal]
2023-12-17 05:00:08 [!] 1 pods are unevictable from node ip-172-31-47-9.ec2.internal
2023-12-17 05:01:09 [!] 1 pods are unevictable from node ip-172-31-47-9.ec2.internal
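
In this log the node reporting unevictable pods (ip-172-31-47-9) is not one of the two nodes that drained successfully, so it may help to pull the affected node names out of the output and look at what is running there. The filter below is only a sketch: unevictable_nodes is a hypothetical helper, and it assumes the warning text keeps the exact wording shown in these logs.

```shell
#!/usr/bin/env sh
set -eu

# Read eksctl log text on stdin and print "<node> <warning count>" for
# every node that reported unevictable pods. grep -o also copes with
# several log entries pasted onto a single line.
unevictable_nodes() {
  grep -o 'pods are unevictable from node [^ ]*' \
    | awk '{ print $NF }' \
    | sort | uniq -c \
    | awk '{ print $2, $1 }'
}

unevictable_nodes <<'EOF'
2023-12-17 04:59:08 [✔] drained all nodes: [ip-172-31-13-6.ec2.internal ip-172-31-38-140.ec2.internal]
2023-12-17 05:00:08 [!] 1 pods are unevictable from node ip-172-31-47-9.ec2.internal
2023-12-17 05:01:09 [!] 1 pods are unevictable from node ip-172-31-47-9.ec2.internal
EOF
# prints: ip-172-31-47-9.ec2.internal 2

# Follow-up, needs access to the cluster (not run here):
#   kubectl get pods --all-namespaces --field-selector spec.nodeName=<node>
#   kubectl get pdb --all-namespaces
```

Comparing the pods on that node with the listed Pod Disruption Budgets should show which workload is holding up the drain.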

EmpeRoar • Dec 16 '23 21:12

This issue is stale because it has been open 30 days with no activity. Remove stale label or comment or this will be closed in 5 days.

github-actions[bot] • Jan 16 '24 01:01

keep-alive

mbevc1 • Jan 16 '24 09:01

It's probably worth calling out that some addons get added by default for new clusters. So I don't think there are any cases now where a plain delete will actually work. At least --disable-nodegroup-eviction unblocks it, though! :)

m0un10 • Jan 21 '24 10:01

> It's probably worth calling out that some addons get added by default for new clusters.

Indeed, the coredns addon now has a default Pod Disruption Budget (PDB), which prevents its pod from being evicted when the nodegroups are drained. As a workaround for now, we can use eksctl delete cluster --disable-nodegroup-eviction or eksctl delete nodegroup --disable-eviction.

yuxiang-zhang • Jan 22 '24 22:01