terraform-provider-kubernetes
Terraform GCP fails to destroy kubernetes_namespace: Error context deadline exceeded
Terraform Version, Provider Version and Kubernetes Version
Terraform version:
Terraform v1.2.5
on darwin_amd64
+ provider registry.terraform.io/hashicorp/google v4.28.0
+ provider registry.terraform.io/hashicorp/google-beta v4.28.0
+ provider registry.terraform.io/hashicorp/kubernetes v2.12.1
Kubernetes version:
Client Version: version.Info{Major:"1", Minor:"22+", GitVersion:"v1.22.9-dispatcher-dirty", GitCommit:"2b63bf75320745c39af440c6717fccf55b93c046", GitTreeState:"dirty", BuildDate:"2022-05-10T19:55:44Z", GoVersion:"go1.16.15", Compiler:"gc", Platform:"darwin/amd64"}
Server Version: version.Info{Major:"1", Minor:"22", GitVersion:"v1.22.8-gke.202", GitCommit:"88deae00580af268497b9656f216cb092b630563", GitTreeState:"clean", BuildDate:"2022-06-03T03:27:52Z", GoVersion:"go1.16.14b7", Compiler:"gc", Platform:"linux/amd64"}
Kubernetes provider version: v2.12.1
GCloud Versions:
Google Cloud SDK 394.0.0
alpha 2022.07.19
beta 2022.07.19
bq 2.0.75
config-connector 1.89.0
core 2022.07.19
gsutil 5.11
kubectl 1.22.9
Affected Resource(s)
kubernetes_namespace deployed to Google GKE Cluster
Terraform Configuration Files
resource "kubernetes_namespace" "foo" { metadata { name = "foo" } }
Debug Output
See issues #335 and #1722.
Panic Output
Steps to Reproduce
1. terraform apply
2. terraform destroy
Expected Behavior
What should have happened?
Namespace should be destroyed.
Actual Behavior
What actually happened? The namespace stays in the "Terminating" status indefinitely. Unlike issue #1722, there is no way to change or reset it on the Google side.
Per issue #335, kubectl api-resources shows an error:
error: unable to retrieve the complete list of server APIs: metrics.k8s.io/v1beta1: the server is currently unable to handle the request
Important Factoids
AFAIK, the namespace has no active resources within it.
References
- GH-335
- GH-1722
Community Note
- Please vote on this issue by adding a 👍 reaction to the original issue to help the community and maintainers prioritize this request
- If you are interested in working on this issue or have submitted a pull request, please leave a comment
Hi @independentid,
The observed behavior seems to be normal:
- the Kubernetes provider sent a request to destroy a namespace
- the Kubernetes cluster accepted it and is trying to delete it (the terminating state confirms that)
- something prevents namespace deletion
I have a few questions that can help us better understand what is happening here:
- Do you have any resources in this namespace that are also stuck in the terminating state? If so, what is the reason for that?
- Do you see any cluster events that are related to this namespace and resources that belong to it?
Thank you.
Hi.
This is occurring during a terraform destroy. No other resources were created outside of Terraform.
Deleting the namespace via the admin console works, so there is no other blocking cluster resource.
Have not tested via the gcloud CLI.
Phil
Thank you. Let me rephrase my questions and give a bit more context.
When you see that the namespace is stuck in the terminating state, do you see a DeletionTimestamp and any finalizers attached to it? If you can see a DeletionTimestamp, then the provider's job is done at this point; it is then up to the Kubernetes cluster to handle the deletion request. The namespace object, like any other Kubernetes object, cannot be deleted while there is at least one finalizer attached to it. I suspect the finalizer there is kubernetes, most probably because not all resources within this namespace were deleted. That is why I would suggest using kubectl to double-check that all resources within the namespace were deleted, and additionally to check events to see if there is anything there that can give a clue as to why some objects are still there; a quick way to check is sketched below.
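For example, something along these lines should show the deletion timestamp, any finalizers, and recent events (assuming the namespace is named foo, as in the configuration above; the commands are illustrative, not from the original report):

# A non-empty deletionTimestamp means the delete request was accepted;
# any remaining finalizers are what block removal.
kubectl get namespace foo -o jsonpath='{.metadata.deletionTimestamp}{"\n"}'
kubectl get namespace foo -o jsonpath='{.spec.finalizers}{"\n"}'

# Look for events that may explain why objects in the namespace linger.
kubectl get events -n foo --sort-by=.lastTimestamp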
Again, this is not something we can do at the provider level if, for whatever reason, a Kubernetes object is stuck in the terminating state.
I hope that helps.
Maybe you could consider increasing the default timeout of the resource to something greater than its default of 5 minutes? I would suggest something excessive, like 60 minutes, for the sake of testing: https://registry.terraform.io/providers/hashicorp/kubernetes/latest/docs/resources/namespace#timeouts
Credit to @brenthc for uncovering this option during our investigation of a similar issue, if this works. A sketch of the timeouts block is shown below.
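For reference, roughly what that might look like (the 60m value is only for testing, per the docs linked above; foo is the example namespace from this issue):

resource "kubernetes_namespace" "foo" {
  metadata {
    name = "foo"
  }

  # Give the cluster more time to finish namespace finalization on destroy.
  timeouts {
    delete = "60m"
  }
}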
This might help to determine whether there are any resources left in that namespace:
kubectl api-resources --verbs=list --namespaced -o name \
| xargs -n 1 kubectl get --show-kind --ignore-not-found -l <label>=<value> -n <namespace>
I found a fix that could perhaps work for you: when tearing down the cluster, the destruction of the node pool and of the namespaces starts in parallel. As the metrics server is affected by that, the metrics API cannot be reached and the namespace cannot be deleted, for the reasons explained here. Adding the node pool Terraform resource as a dependency of your namespace should resolve your issue; see the sketch below.
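A minimal sketch of that dependency, assuming a google_container_node_pool resource named primary (that resource name is illustrative, not from the original report):

resource "kubernetes_namespace" "foo" {
  metadata {
    name = "foo"
  }

  # Terraform destroys dependents before their dependencies, so the namespace
  # is removed while the node pool (and the metrics API) is still available.
  depends_on = [google_container_node_pool.primary]
}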
Thanks. Sounds like a good workaround for now. However, long term, the solution makes use of Terraform more tedious and less portable between K8S environments. It seems like something better handled internally by GKE when detected. Even better if GCP fixed it.
Phil
Marking this issue as stale due to inactivity. If this issue receives no comments in the next 30 days it will automatically be closed. If this issue was automatically closed and you feel this issue should be reopened, we encourage creating a new issue linking back to this one for added context. This helps our maintainers find and focus on the active issues. Maintainers may also remove the stale label at their discretion. Thank you!