datadog-operator

Finalizers block deletion of namespace and agent

Open fifoome opened this issue 2 years ago • 3 comments

Describe what happened: When deleting the namespace, a race condition occurred between the agent finalizers and the namespace termination, blocking the deletion of both.

Namespace description:

k describe ns ...      
Name:         ...
Labels:       kubernetes.io/metadata.name=...
Annotations:  <none>
Status:       Terminating
Conditions:
  Type                                         Status  LastTransitionTime               Reason                Message
  ----                                         ------  ------------------               ------                -------
  NamespaceDeletionDiscoveryFailure            False   Wed, 30 Nov 2022 12:00:36 -0500  ResourcesDiscovered   All resources successfully discovered
  NamespaceDeletionGroupVersionParsingFailure  False   Wed, 30 Nov 2022 12:00:36 -0500  ParsedGroupVersions   All legacy kube types successfully parsed
  NamespaceDeletionContentFailure              False   Wed, 30 Nov 2022 12:01:22 -0500  ContentDeleted        All content successfully deleted, may be waiting on finalization
  NamespaceContentRemaining                    True    Wed, 30 Nov 2022 12:00:36 -0500  SomeResourcesRemain   Some resources are remaining: datadogagents.datadoghq.com has 1 resource instances
  NamespaceFinalizersRemaining                 True    Wed, 30 Nov 2022 12:00:36 -0500  SomeFinalizersRemain  Some content in the namespace has finalizers remaining: finalizer.agent.datadoghq.com in 1 resource instances
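
The conditions above only name the resource type that is still present. A minimal sketch for listing every namespaced object left in a terminating namespace, assuming a working `kubectl` context; the function name `list_remaining` and its namespace argument are illustrative and not part of the original report:

```shell
# Sketch only: enumerate every listable, namespaced resource type and
# print whatever instances still exist in the given namespace.
# list_remaining is a hypothetical helper name.
list_remaining() {
  target_ns="$1"
  kubectl api-resources --verbs=list --namespaced -o name |
    while read -r resource_type; do
      # -o name prints one "type/name" entry per remaining object.
      kubectl get "$resource_type" -n "$target_ns" --ignore-not-found -o name
    done
}
```

Calling `list_remaining <namespace>` should print one line per leftover object, which in the situation described here would be the single DatadogAgent instance.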

When doing a describe on the agent:

...
k describe DatadogAgent datadog
(...)
Conditions:
   (...)
    Last Transition Time:  2022-11-30T17:00:34Z
    Message:               services "datadog-cluster-agent" is forbidden: unable to create new content in namespace ... because it is being terminated
    Status:                True
    Type:                  ReconcileError
Events:                    <none>

Attempt to delete the agent by patching out the finalizer:

k patch datadogagents.datadoghq.com datadog  -p '{"metadata":{"finalizers":null}}' --type=merge 
The DatadogAgent "datadog" is invalid: status.conditions.reason: Required value

I was finally able to delete it by editing the agent CRD directly and removing the required constraint on the status.conditions.reason field.

fifoome avatar Dec 01 '22 14:12 fifoome

@fifoome, I could not find the status.conditions.reason field in k describe. Could you share how you deleted it?

The error I am getting:

k patch DatadogAgent datadog -n ddoperator -p '{"metadata":{"finalizers":null}}' --type=merge
The DatadogAgent "datadog" is invalid:
* status.conditions[0].reason: Required value
* status.conditions[1].reason: Required value
* status.conditions[2].message: Required value
* status.conditions[2].reason: Required value

ssrahul96 avatar Feb 07 '23 14:02 ssrahul96

@ssrahul96 I accomplished the same thing by deleting the operator, then editing the CRD and removing the message and reason attributes (lines 8426 and 8427 for me) from the required array.

kubectl delete deployment datadog-operator
kubectl edit crd datadogagents.datadoghq.com
kubectl patch datadogagent/datadog --type json --patch='[ { "op": "remove", "path": "/metadata/finalizers" } ]'
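
Before forcing the finalizer off as in the steps above, it may be worth confirming that the resource really is stuck, meaning it has a deletion timestamp and still lists a finalizer. A small sketch, assuming the resource is named `datadog`; the helper name `show_finalizer_state` is hypothetical:

```shell
# Sketch only: print the deletion timestamp and the finalizer list of a
# DatadogAgent. A non-empty timestamp together with a remaining finalizer
# means deletion was requested but has not completed.
# show_finalizer_state is a hypothetical helper name.
show_finalizer_state() {
  agent_ns="$1"
  agent_name="${2:-datadog}"   # assumed default, taken from the thread
  kubectl get datadogagent "$agent_name" -n "$agent_ns" \
    -o jsonpath='{.metadata.deletionTimestamp}{"\n"}{.metadata.finalizers}{"\n"}'
}
```

Usage would be `show_finalizer_state <namespace>`: the first output line is the deletion timestamp (empty if deletion was never requested) and the second is the list of finalizers.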

jaredtbates avatar Feb 10 '23 04:02 jaredtbates

One more solution, from Datadog support:

Deploy the DatadogAgent again with empty finalizers:

apiVersion: datadoghq.com/v1alpha1
kind: DatadogAgent
metadata:
  name: datadog
  finalizers: []
spec:
  credentials:
    apiKey: <API_KEY>
    appKey: <APP_KEY>
  agent:
    image:
      name: "gcr.io/datadoghq/agent:latest"

This method also works.

ssrahul96 avatar Feb 12 '23 10:02 ssrahul96

Hello, this isn't really an issue with the Operator, but rather with how finalizers work and the fact that cleanup should be done in a specific order; otherwise resources may get stuck in deletion. If the Operator pod/deployment is deleted before the DatadogAgent resource, the latter will be marked for deletion but will wait for finalizer completion indefinitely.
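
The ordering described here can be sketched as a small shell helper. This is an illustration under assumptions, not an official procedure: the function name `cleanup_datadog`, the 120s timeout, and the default that the operator deployment lives in the same namespace as the agent are all hypothetical, and the deployment name `datadog-operator` is taken from the earlier comment in this thread.

```shell
# Sketch only: remove the DatadogAgent while the operator is still running,
# and remove the operator deployment only afterwards.
# cleanup_datadog is a hypothetical helper name.
cleanup_datadog() {
  agent_ns="$1"
  agent_name="${2:-datadog}"        # assumed resource name
  operator_ns="${3:-$agent_ns}"     # assumed: operator shares the namespace

  # 1. Delete the custom resource first. kubectl delete waits for
  #    finalizers by default, so this returns once the operator has
  #    finished its cleanup, or fails after the timeout.
  kubectl delete datadogagent "$agent_name" -n "$agent_ns" --timeout=120s || return 1

  # 2. Only then remove the operator deployment.
  kubectl delete deployment datadog-operator -n "$operator_ns"
}
```

With this order, `cleanup_datadog <namespace>` stops at the first step if the agent cannot be deleted, so the operator is never removed while a finalizer is still pending.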

I added a clarification to the public documentation and am closing the issue.

levan-m avatar Aug 21 '24 18:08 levan-m