eks-anywhere
Improve resource cleanup
When a cluster creation fails, the process of cleaning up resources is manual, cumbersome, and prone to errors. We could automate most of this to improve the user experience when debugging. We already have a --force-cleanup flag; it just doesn't do a lot. Think about everything you need to do after a cluster creation fails, before running the CLI again: that's what we should try to add to this flow. Examples:
- Delete the bootstrap cluster. We do that today, but it's not super robust. Find when it doesn't work and fix it.
- If we don't support more than one kind cluster running, even if it's not an eks-a one, add a validation for this and give instructions to delete the extra ones.
- Clean up vSphere VMs if it's a vSphere cluster.
- Clean up Docker resources if it's a Docker cluster.
- Delete the <cluster-name> folder (a rough sketch of these manual steps follows this list).
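For reference, here is roughly what that manual cleanup looks like today on a vSphere or Docker cluster. This is not a supported flow, just a sketch: the cluster name mgmt, the use of the govc CLI, and the exact VM and container names are assumptions and will differ per environment.

# delete the leftover bootstrap (kind) cluster
kind delete cluster --name mgmt

# vSphere provider: locate and destroy the VMs the failed create left behind (assumes a configured govc CLI)
govc find / -type m -name 'mgmt-*'
govc vm.destroy '<vm path from the find output>'

# Docker provider: remove the node containers created for the cluster
docker ps -a | grep mgmt
docker rm -f <container-name>

# remove the generated <cluster-name> folder before re-running the CLI
rm -rf ./mgmt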
@g-gaston Is this a duplicate of https://github.com/aws/eks-anywhere/issues/163?
What are the manual cleanup steps for a failed cluster deployment? My administrative machine is stuck after a failed deployment to vSphere. I've already powered off and deleted the VMs manually from vSphere.
eksctl anywhere create cluster -f eksa-cluster.yaml
Error: failed to create cluster: error creating bootstrap cluster: error executing create cluster: ERROR: failed to create cluster: node(s) already exist for a cluster with the name "prod-eks-a-cluster", try rerunning with --force-cleanup to force delete previously created bootstrap cluster

eksctl anywhere create cluster -f eksa-cluster.yaml --force-cleanup
Error: failed to create cluster: error deleting bootstrap cluster: management cluster in bootstrap cluster
@jasonboche try:
kind delete cluster --name prod-eks-a-cluster
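If it's not obvious which kind cluster was left behind, it can be listed first, and the deletion confirmed by checking that no kind node containers remain on the admin machine. A quick sketch, assuming the kind and docker CLIs are installed:

kind get clusters
# the bootstrap cluster's node containers run the kindest/node image
docker ps -a | grep kindest/node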
Thank you kindly. I'll give that a try!
It didn't work for me :/
@ataince What output did you get?
I think it was about the memory. I increased it, but now it's stuck on this step.
⏳ Collecting support bundle from cluster, this can take a while {"cluster": "dev-cluster", "bundle": "dev-cluster/generated/dev-cluster-2022-03-25T12:40:36Z-bundle.yaml", "since": 1648208436523389598, "kubeconfig": "dev-cluster/dev-cluster-eks-a-cluster.kubeconfig"}
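While the CLI sits on that step, the cluster being bundled can be inspected directly with the kubeconfig printed in the log line above. A hedged sketch, assuming the create got far enough to write that kubeconfig; the path is taken from the log and will differ per cluster name:

kubectl --kubeconfig dev-cluster/dev-cluster-eks-a-cluster.kubeconfig get nodes
kubectl --kubeconfig dev-cluster/dev-cluster-eks-a-cluster.kubeconfig get pods -A
# the local bootstrap (kind) containers are also still visible on the admin machine
docker ps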
@ataince I'm sorry to hear that. I wrapped up this project before I was able to get to the bottom of this. I just learned the many traps to avoid so that I didn't get stuck and have to re-deploy all over again. I'll probably end up revisiting this project within the next year and my hope is by then AWS will have put in much better error trapping and clear cleanup steps that actually work. This isn't a total knock on AWS. I realize this was relatively new and uncharted territory and these types of issues go with the territory until things mature.
Jas
Bumping up the priority on this one, as it's a pretty important issue that has come up multiple times, especially for cleaning up the local bootstrap cluster and the <cluster-name> folder.
Adding my voice. This needs to be prioritized. While figuring out how to get everything working, you tear through quite a few clusters. A quick and thorough cleanup is a must.