ingress-gce
I think NEG finalizers are making my namespaces take 10+ mins to delete
It always takes a very long time to delete namespaces.
First I delete all resources in a namespace and confirm it's empty:
kubectl get all -n derps
No resources found in derps namespace.
Then I try to delete the namespace:
kubectl delete namespace derps
This hangs for 10+ minutes but eventually removes the namespace.
While the namespace is stuck in the Terminating phase I see this:
apiVersion: v1
kind: Namespace
metadata:
  annotations:
    cnrm.cloud.google.com/project-id: derps
  creationTimestamp: "2022-05-24T19:43:34Z"
  deletionTimestamp: "2022-05-25T14:01:00Z"
  labels:
    kubernetes.io/metadata.name: derps
  name: derps
  resourceVersion: "235757709"
  uid: bd05b4d8-31da-4e76-9178-6dff37d52314
spec:
  finalizers:
  - kubernetes
status:
  conditions:
  - lastTransitionTime: "2022-05-25T14:01:07Z"
    message: All resources successfully discovered
    reason: ResourcesDiscovered
    status: "False"
    type: NamespaceDeletionDiscoveryFailure
  - lastTransitionTime: "2022-05-25T14:01:07Z"
    message: All legacy kube types successfully parsed
    reason: ParsedGroupVersions
    status: "False"
    type: NamespaceDeletionGroupVersionParsingFailure
  - lastTransitionTime: "2022-05-25T14:01:07Z"
    message: All content successfully deleted, may be waiting on finalization
    reason: ContentDeleted
    status: "False"
    type: NamespaceDeletionContentFailure
  - lastTransitionTime: "2022-05-25T14:01:07Z"
    message: 'Some resources are remaining: ingresses.extensions has 1 resource instances,
      ingresses.networking.k8s.io has 1 resource instances, servicenetworkendpointgroups.networking.gke.io
      has 1 resource instances'
    reason: SomeResourcesRemain
    status: "True"
    type: NamespaceContentRemaining
  - lastTransitionTime: "2022-05-25T14:01:07Z"
    message: 'Some content in the namespace has finalizers remaining: networking.gke.io/ingress-finalizer-V2
      in 2 resource instances, networking.gke.io/neg-finalizer in 1 resource instances'
    reason: SomeFinalizersRemain
    status: "True"
    type: NamespaceFinalizersRemaining
  phase: Terminating
It looks like the NEG finalizers are causing this. Is this normal? I see this on ALL deployments where I'm using GKE Ingresses. I want to know if this is expected behavior, because it's quite clunky.
/kind support
Just to follow up: GCP support told me this is expected behavior and that I can't do anything to speed it up.
Yes, this is expected behavior. The NEG finalizers exist to make sure that the ServiceNetworkEndpointGroup CRs (a status-only API) stay around until the respective NEGs in GCE have been deleted. The NEG GC loop runs every 2 minutes. NEG resources cannot be deleted while GCE resources such as BackendServices are still referencing them.
Most likely what is happening is that the Ingress resources are still being cleaned up: NEG deletion fails until all the Ingress-related resources are deleted first, and only then can the NEG resources be deleted.
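To check whether this is what is holding up your namespace, you can list the remaining SvcNeg and Ingress objects and the finalizers still attached to them (a sketch only; it assumes the `derps` namespace from the example above and requires access to the cluster):

```shell
# List any Ingress and ServiceNetworkEndpointGroup objects still present
# in the namespace; these are the objects carrying the
# networking.gke.io/ingress-finalizer-V2 and networking.gke.io/neg-finalizer
# finalizers shown in the namespace conditions.
kubectl get ingress,servicenetworkendpointgroups.networking.gke.io -n derps

# Show which finalizers remain on each SvcNeg CR.
kubectl get servicenetworkendpointgroups.networking.gke.io -n derps \
  -o custom-columns='NAME:.metadata.name,FINALIZERS:.metadata.finalizers'
```

Once the Ingress-related GCE resources are gone, the finalizers are removed by the controller and these lists should empty out on their own.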
The command kubectl get all -n derps
does not show all the resources in the namespace; it only selectively shows certain core K8s resources. In this case, Ingress and ServiceNetworkEndpointGroup (a CRD) objects are not shown. For a more accurate picture of what exists in the namespace, try kubectl api-resources
as described in https://github.com/kubernetes/kubectl/issues/151#issuecomment-402003022:
kubectl api-resources --verbs=list --namespaced -o name \
| xargs -n 1 kubectl get --show-kind --ignore-not-found -l <label>=<value> -n <namespace>
Namespace deletion will be faster if all Ingresses and NEGs are deleted before the namespace. If those resources are already gone, the namespace deletion should not hang; that said, deleting the namespace is still the easier way to systematically clean up everything in it.
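The suggested ordering can be sketched as a small script (a sketch under the assumption that `kubectl wait --for=delete` is available in your kubectl version; the timeouts are arbitrary and the namespace name is the `derps` example from above):

```shell
NS=derps

# 1. Delete Ingresses first so the controller can tear down the GCE
#    load-balancer resources (BackendServices etc.) that reference the NEGs.
kubectl delete ingress --all -n "$NS"
kubectl wait --for=delete ingress --all -n "$NS" --timeout=10m

# 2. Wait for the SvcNeg CRs to be garbage collected; the NEG GC loop
#    runs roughly every 2 minutes, so this can take a few cycles.
kubectl wait --for=delete servicenetworkendpointgroups.networking.gke.io \
  --all -n "$NS" --timeout=10m

# 3. Only now delete the namespace; it should no longer hang on finalizers.
kubectl delete namespace "$NS"
```

The point of the two `wait` steps is that the finalizers are removed asynchronously by the controller, so deleting the namespace immediately after `kubectl delete ingress` returns will still block.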
/assign
The Kubernetes project currently lacks enough contributors to adequately respond to all issues and PRs.
This bot triages issues and PRs according to the following rules:
- After 90d of inactivity, lifecycle/stale is applied
- After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
- After 30d of inactivity since lifecycle/rotten was applied, the issue is closed
You can:
- Mark this issue or PR as fresh with /remove-lifecycle stale
- Mark this issue or PR as rotten with /lifecycle rotten
- Close this issue or PR with /close
- Offer to help out with Issue Triage
Please send feedback to sig-contributor-experience at kubernetes/community.
/lifecycle stale
Just to follow up, I have not experienced this behavior, FYI: "Namespace deletion will be faster if all Ingresses and Negs are deleted before the namespace."
I deploy with Helm and run helm uninstall before removing the namespace. Even removing all resources first does not seem to speed up the namespace deletion. Maybe you need to wait a minute or two after deleting the Ingress before deleting the namespace?
@red8888, are you deleting the namespace right after all the NEG and Ingress resources are deleted or waiting for those resources to be completely deleted? A quick check is to run the following command to ensure that no SvcNeg CRs or Ingresses remain in the namespace before deleting the namespace.
kubectl api-resources --verbs=list --namespaced -o name \
| xargs -n 1 kubectl get --show-kind --ignore-not-found -n <namespace>
As mentioned in the comment above, the problem is that GC takes time. When you delete the namespace, the finalizers on the remaining resources block the namespace deletion. Those finalizers are required, though, to ensure that the controllers can clean up the GCE resources they created.
The Kubernetes project currently lacks enough active contributors to adequately respond to all issues and PRs.
This bot triages issues and PRs according to the following rules:
- After 90d of inactivity, lifecycle/stale is applied
- After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
- After 30d of inactivity since lifecycle/rotten was applied, the issue is closed
You can:
- Mark this issue or PR as fresh with /remove-lifecycle rotten
- Close this issue or PR with /close
Please send feedback to sig-contributor-experience at kubernetes/community.
/lifecycle rotten
The Kubernetes project currently lacks enough active contributors to adequately respond to all issues and PRs.
This bot triages issues according to the following rules:
- After 90d of inactivity, lifecycle/stale is applied
- After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
- After 30d of inactivity since lifecycle/rotten was applied, the issue is closed
You can:
- Reopen this issue with /reopen
- Mark this issue as fresh with /remove-lifecycle rotten
- Offer to help out with Issue Triage
Please send feedback to sig-contributor-experience at kubernetes/community.
/close not-planned
@k8s-triage-robot: Closing this issue, marking it as "Not Planned".
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.
This is still an issue, and resources linger for much longer than the "2 minute loop" mentioned above.
Same issue for me: when I try to delete a k8s namespace, it takes more than 8 minutes before it can be deleted, due to:
Some resources are remaining: servicenetworkendpointgroups.networking.gke.io has 1 resource instances
But all resources managed by Helm have been deleted successfully using the command:
helm uninstall
Please, would it be possible to make this deletion faster, e.g. by marking some k8s (GKE) resources as already deleted?
Is this closed as expected behaviour? I have been waiting 20 minutes for a namespace to delete (which is quite a long time for an accidental delete :see_no_evil: )