datadog-operator
datadog-operator copied to clipboard
Finalizers block deletion of namespace and agent
Describe what happened: When deleting the namespace, a race condition between the agent finalizers and the namespace termination happened blocking the deletion of both.
namespace description:
k describe ns ...
Name: ...
Labels: kubernetes.io/metadata.name=...
Annotations: <none>
Status: Terminating
Conditions:
Type Status LastTransitionTime Reason Message
---- ------ ------------------ ------ -------
NamespaceDeletionDiscoveryFailure False Wed, 30 Nov 2022 12:00:36 -0500 ResourcesDiscovered All resources successfully discovered
NamespaceDeletionGroupVersionParsingFailure False Wed, 30 Nov 2022 12:00:36 -0500 ParsedGroupVersions All legacy kube types successfully parsed
NamespaceDeletionContentFailure False Wed, 30 Nov 2022 12:01:22 -0500 ContentDeleted All content successfully deleted, may be waiting on finalization
NamespaceContentRemaining True Wed, 30 Nov 2022 12:00:36 -0500 SomeResourcesRemain Some resources are remaining: datadogagents.datadoghq.com has 1 resource instances
NamespaceFinalizersRemaining True Wed, 30 Nov 2022 12:00:36 -0500 SomeFinalizersRemain Some content in the namespace has finalizers remaining: finalizer.agent.datadoghq.com in 1 resource instances
When doing a describe on the agent:
...
k describe DatadogAgent datadog
(...)
Conditions:
(...)
Last Transition Time: 2022-11-30T17:00:34Z
Message: services "datadog-cluster-agent" is forbidden: unable to create new content in namespace ... because it is being terminated
Status: True
Type: ReconcileError
Events: <none>
tentative to delete the agent by patching the finalizer
k patch datadogagents.datadoghq.com datadog -p '{"metadata":{"finalizers":null}}' --type=merge
The DatadogAgent "datadog" is invalid: status.conditions.reason: Required value
I was able to finally delete it by editing the agent crd directly and removing the mandatory state on the status.conditions.reason
field from it
@fifoome , I could not find the status.conditions.reason
field, in k describe
. could you share how you deleted?
the error i am getting
k patch DatadogAgent datadog -n ddoperator -p '{"metadata":{"finalizers":null}}' --type=merge
The DatadogAgent "datadog" is invalid:
* status.conditions[0].reason: Required value
* status.conditions[1].reason: Required value
* status.conditions[2].message: Required value
* status.conditions[2].reason: Required value
@ssrahul96 I accomplished the same thing by deleting the operator, and then editing the CRD and removing the message
and reason
(line 8426 and 8427 for me) attributes from the required
array.
kubectl delete deployment datadog-operator
kubectl edit crd datadogagents.datadoghq.com
kubectl patch datadogagent/datadog --type json --patch='[ { "op": "remove", "path": "/metadata/finalizers" } ]'
one more solution from datadog support
Deploy DatadogAgent
again with empty
finalizers
apiVersion: datadoghq.com/v1alpha1
kind: DatadogAgent
metadata:
name: datadog
finalizers: []
spec:
credentials:
apiKey: <API_KEY>
appKey: <APP_KEY>
agent:
image:
name: "gcr.io/datadoghq/agent:latest"
This method also works
Hello, this isn't really an issue with Operator, rather how finalizer work and the fact that cleanup should be done in a specific order, otherwise resources may get stuck in deletion. If Operator pod/deployment is deleted before DatadogAgent resource, latter will be marked for deletion but will wait for finalizer completion indefinitely.
I added clarification in the public documentation and closing the issue.