[ENH] - Remove lingering resources after destroy

Open viniciusdc opened this issue 1 year ago • 0 comments

We've seen cases where some cloud resources, such as data disks backing volume mounts, are left behind after a destroy. PVs (PersistentVolumes) are one example: they are not wholly managed by Terraform (Kubernetes may create one for a specific service or pod, but it is not an explicit resource in the Nebari source code), so they don't show up in the cleanup graph when destroy is called and end up as leftovers.

On Azure, all resources are guaranteed to be cleaned up after removal because of the hierarchy that binds them to the main account. The other providers do not follow this structure.

Because of how AWS and GCP handle user data on disk mounts, for example, the data remains even after the cluster is removed, unless the project is entirely removed. To work around this, we wrote some bash scripts and Python executables that talk to the primary providers' SDKs to force the destruction. While this handles the problem, it was never a real fix; to follow best practices, we should be using Terraform to create and destroy everything as a whole.

This seems like a good opportunity to revise that logic and make complete removal on destroy the default behavior for all providers.

Right now, the main idea is to make sure all resources are actually removed after destruction; if the user wants to keep some of them, we could add an extra flag to the destroy command for that.
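To make the opt-out idea concrete, here is a rough sketch of how such a flag could behave. The flag name `--keep-volumes`, the `destroy-sketch` program name, and the `plan_cleanup` helper are all hypothetical and only illustrate the idea; none of them exist in the current CLI.

```python
import argparse
from typing import List, Tuple


def build_parser() -> argparse.ArgumentParser:
    # Hypothetical parser; "--keep-volumes" is an illustrative name only.
    parser = argparse.ArgumentParser(prog="destroy-sketch")
    parser.add_argument(
        "--keep-volumes",
        action="store_true",
        help="skip deletion of data volumes (default: delete everything)",
    )
    return parser


def plan_cleanup(
    resources: List[Tuple[str, str]], keep_volumes: bool
) -> List[Tuple[str, str]]:
    """Return the (kind, name) pairs that should be deleted."""
    # Default is full removal; volumes are spared only when explicitly requested.
    return [r for r in resources if not (keep_volumes and r[0] == "volume")]


# Hand-written sample resources, not taken from a real deployment.
resources = [("cluster", "main"), ("volume", "user-data"), ("volume", "shared")]

args = build_parser().parse_args(["--keep-volumes"])
print(plan_cleanup(resources, args.keep_volumes))
```

The point of the design is that full removal stays the default and keeping data is something the user has to ask for explicitly.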

Next steps?

List all orphan resources not currently handled by Terraform.
Explore whether the Terraform provider for each target cloud offers an option to destroy lingering resources on removal.
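As a starting point for the first step, a minimal sketch of how orphaned disks could be picked out on AWS. It assumes volume records shaped like the entries boto3's `describe_volumes` returns (`VolumeId`, `State`, `Tags`), and it assumes that Kubernetes-provisioned volumes carry a `kubernetes.io/cluster/<cluster-name>` tag; that tag key is an assumption and would need to be checked against what the deployed provisioner actually sets. The `find_orphan_volumes` helper and the sample data are illustrative only.

```python
from typing import Dict, Iterable, List


def find_orphan_volumes(volumes: Iterable[Dict], cluster_name: str) -> List[str]:
    """Return IDs of unattached volumes tagged as belonging to cluster_name."""
    # Assumption: volumes created on behalf of the cluster carry this tag key.
    owner_tag = f"kubernetes.io/cluster/{cluster_name}"
    orphans = []
    for volume in volumes:
        tags = {t["Key"]: t["Value"] for t in volume.get("Tags", [])}
        # "available" means the volume exists but is not attached to an instance.
        if volume.get("State") == "available" and owner_tag in tags:
            orphans.append(volume["VolumeId"])
    return orphans


# Hand-written sample records, not real output from any account.
# In practice the list could come from boto3.client("ec2").describe_volumes()["Volumes"].
sample_volumes = [
    {
        "VolumeId": "vol-aaa",
        "State": "available",
        "Tags": [{"Key": "kubernetes.io/cluster/nebari-dev", "Value": "owned"}],
    },
    {
        "VolumeId": "vol-bbb",
        "State": "in-use",
        "Tags": [{"Key": "kubernetes.io/cluster/nebari-dev", "Value": "owned"}],
    },
    {"VolumeId": "vol-ccc", "State": "available", "Tags": []},
]

print(find_orphan_volumes(sample_volumes, "nebari-dev"))
```

Keeping the filter separate from the SDK call makes it easy to test and to reuse the same logic for other providers once their disk records are mapped to a similar shape.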

Jun 24 '24 18:06 viniciusdc