datadog-operator
Monitors modified out-of-band are not reconciled
Describe what happened:
I've successfully created a Monitor with the following manifest:
apiVersion: datadoghq.com/v1alpha1
kind: DatadogMonitor
metadata:
  name: hello-world-web
  namespace: hello-world
spec:
  query: "avg(last_5m):avg:kubernetes_state.hpa.desired_replicas{hpa:hello-world-web,kube_namespace:hello-world} / avg:kubernetes_state.hpa.max_replicas{hpa:hello-world-web,kube_namespace:hello-world} >= 1"
  type: "query alert"
  name: "HPA TEST"
  message: "Number of replicas for HPA hello-world-web in namespace hello-world, has reached maximum threshold @target1"
  tags:
    - "service:hello-world"
I then used the Datadog console to edit that Monitor. I modified the message, changing @target1 to @target2.
Describe what you expected:
I expected the controller to reconcile the monitor, changing @target2 back to @target1. This never happened, even though it appears the resource is being continually synced successfully:
kubectl describe datadogmonitor hello-world-web -n hello-world
Name:         hello-world-web
Namespace:    hello-world
Labels:       <none>
Annotations:  <none>
API Version:  datadoghq.com/v1alpha1
Kind:         DatadogMonitor
Metadata:
  Creation Timestamp:  2021-12-15T12:53:41Z
  Finalizers:
    finalizer.monitor.datadoghq.com
  Generation:  3
  Managed Fields:
    API Version:  datadoghq.com/v1alpha1
    Fields Type:  FieldsV1
    fieldsV1:
      f:metadata:
        f:annotations:
          .:
          f:kubectl.kubernetes.io/last-applied-configuration:
      f:spec:
        .:
        f:message:
        f:name:
        f:query:
        f:type:
    Manager:      kubectl-client-side-apply
    Operation:    Update
    Time:         2021-12-15T12:53:41Z
    API Version:  datadoghq.com/v1alpha1
    Fields Type:  FieldsV1
    fieldsV1:
      f:metadata:
        f:finalizers:
          .:
          v:"finalizer.monitor.datadoghq.com":
      f:spec:
        f:options:
        f:tags:
      f:status:
        .:
        f:conditions:
          .:
          k:{"type":"Active"}:
            .:
            f:lastTransitionTime:
            f:lastUpdateTime:
            f:message:
            f:status:
            f:type:
          k:{"type":"Created"}:
            .:
            f:lastTransitionTime:
            f:lastUpdateTime:
            f:message:
            f:status:
            f:type:
        f:created:
        f:creator:
        f:currentHash:
        f:downtimeStatus:
        f:id:
        f:monitorState:
        f:monitorStateLastTransitionTime:
        f:monitorStateLastUpdateTime:
        f:primary:
        f:syncStatus:
    Manager:      manager
    Operation:    Update
    Time:         2021-12-15T12:53:41Z
  Resource Version:  91886916
  UID:               91c8a22f-818d-40ee-a014-dcf9f8c84a52
Spec:
  Message:  Number of replicas for HPA hello-world-web in namespace hello-world, has reached maximum threshold @slack-team-example
  Name:     [kubernetes] MARCTEST hello-world-web HPA reached max replicas
  Options:
  Query:    avg(last_5m):avg:kubernetes_state.hpa.desired_replicas{hpa:hello-world-web,kube_namespace:hello-world} / avg:kubernetes_state.hpa.max_replicas{hpa:hello-world-web,kube_namespace:hello-world} >= 1
  Tags:
    service:hello-world
    generated:kubernetes
  Type:  query alert
Status:
  Conditions:
    Last Transition Time:  2021-12-15T12:53:41Z
    Last Update Time:      2021-12-15T13:17:41Z
    Message:               DatadogMonitor ready
    Status:                True
    Type:                  Active
    Last Transition Time:  2021-12-15T12:53:41Z
    Last Update Time:      2021-12-15T12:53:41Z
    Message:               DatadogMonitor Created
    Status:                True
    Type:                  Created
  Created:        2021-12-15T12:53:41Z
  Creator:        redacted
  Current Hash:   a4ed04577b43c8b209ec6c3bb489b179
  Downtime Status:
  Id:             58175053
  Monitor State:  OK
  Monitor State Last Transition Time:  2021-12-15T12:55:41Z
  Monitor State Last Update Time:      2021-12-15T13:17:41Z
  Primary:        true
  Sync Status:    OK
Events:
  Type    Reason                  Age  From            Message
  ----    ------                  ---  ----            -------
  Normal  Create DatadogMonitor   24m  DatadogMonitor  hello-world/hello-world-web
A brief look at the code suggests that a hash of the spec is being compared, so I'm surprised the operator isn't picking this up?
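For illustration, here is a minimal Go sketch of how a spec-hash gate of this kind typically behaves; the type, field, and function names are made up for the example and are not taken from the operator's source. Because the hash is derived only from the Kubernetes spec, an edit made in the Datadog UI never changes it, so the gate never sees a reason to push an update:

// Illustrative sketch of a spec-hash gate; names are not from the operator's code.
package main

import (
	"crypto/md5"
	"encoding/json"
	"fmt"
)

type MonitorSpec struct {
	Query   string   `json:"query"`
	Type    string   `json:"type"`
	Name    string   `json:"name"`
	Message string   `json:"message"`
	Tags    []string `json:"tags,omitempty"`
}

// hashSpec hashes only the desired state stored in Kubernetes.
func hashSpec(spec MonitorSpec) string {
	b, _ := json.Marshal(spec)
	return fmt.Sprintf("%x", md5.Sum(b))
}

func main() {
	desired := MonitorSpec{
		Query:   "avg(last_5m):... >= 1",
		Type:    "query alert",
		Name:    "HPA TEST",
		Message: "... @target1",
	}

	// status.currentHash was recorded when the monitor was last pushed to Datadog.
	currentHash := hashSpec(desired)

	// Editing the monitor in the Datadog UI changes nothing on the Kubernetes side,
	// so the spec hash still matches and no update call is made.
	if hashSpec(desired) == currentHash {
		fmt.Println("spec unchanged; skipping monitor update")
	}
}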
Hi @mf-lit, apologies for the delay. I just tried reproducing the issue using your example DatadogMonitor and I wasn't able to; changing @target1 to @target2 and back several times worked as expected. If you're still experiencing the issue, can you check the controller logs for anything unusual? Thanks.
Hi @celenechang, sorry it's taken so long to get back to you; it seems GitHub has given up on sending me notifications.
I've just tried this again. I applied the exact manifest above, then went to the monitoring dashboard and edited the new monitor, changing @target1 to @target2. I then waited for the "Last Update Time" of the datadogmonitor resource to update, but the monitor remains on @target2 whilst the k8s resource continues to have @target1.
There's nothing remarkable in the operator log either, I'm afraid. Just:
{"level":"INFO","ts":"2022-02-04T14:33:01Z","logger":"controllers.DatadogMonitor","msg":"Reconciling DatadogMonitor","datadogmonitor":"hello-world/hello-world-web"}
{"level":"INFO","ts":"2022-02-04T14:33:01Z","logger":"controllers.DatadogMonitor","msg":"Reconciling DatadogMonitor","datadogmonitor":"hello-world/hello-world-web"}
{"level":"INFO","ts":"2022-02-04T14:33:06Z","logger":"controllers.DatadogMonitor","msg":"Reconciling DatadogMonitor","datadogmonitor":"hello-world/hello-world-web"}
It's not clear why there are three of these logs in such a short space of time?
AFAIK the update is only triggered if the hashed spec of the monitor differs from the hash stored in the status. Nowhere in the reconciliation loop do I see the monitors from DD being compared to the k8s state.
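To make that concrete, reverting out-of-band edits would need an extra step that compares the live monitor against the Kubernetes spec, roughly like the hypothetical Go sketch below; fetchRemoteMessage and updateRemoteMonitor are stand-ins for calls to the Datadog monitors API (GET/PUT /api/v1/monitor/{monitor_id}) and do not exist in the operator:

// Hypothetical drift check against the live monitor; the stubbed functions below
// stand in for real Datadog API calls and are not part of the operator.
package main

import "fmt"

// fetchRemoteMessage stands in for GET /api/v1/monitor/{monitor_id}.
func fetchRemoteMessage(monitorID int64) (string, error) {
	return "... @target2", nil // value as edited out-of-band in the UI
}

// updateRemoteMonitor stands in for PUT /api/v1/monitor/{monitor_id}.
func updateRemoteMonitor(monitorID int64, message string) error {
	fmt.Printf("reverting monitor %d message to %q\n", monitorID, message)
	return nil
}

// reconcileDrift pushes the Kubernetes-desired message back if the live monitor drifted.
func reconcileDrift(monitorID int64, desiredMessage string) error {
	remote, err := fetchRemoteMessage(monitorID)
	if err != nil {
		return err
	}
	if remote != desiredMessage {
		return updateRemoteMonitor(monitorID, desiredMessage)
	}
	return nil
}

func main() {
	_ = reconcileDrift(58175053, "... @target1")
}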
I'm using version:
app.kubernetes.io/version: 0.8.1
helm.sh/chart: datadog-operator-0.8.5
Seeing the same thing; I was going to open an issue asking whether this was the desired behavior or not.
This is what I did: I changed "Delay monitor evaluation by x seconds" in the UI. The operator never changed it back.
Even with debug logging, this is all I see:
{Monitor ID: 77955819, Monitor Name: test-dd-monitor-crd, Monitor Namespace: test, datadogmonitor: test/test-dd-monitor-crd, level: DEBUG, logger: controllers.DatadogMonitor, msg: Synced DatadogMonitor state, ts: 2022-07-25T15:07:32Z}
But more problematically, I tried reapplying the DatadogMonitor resource (to see whether that would force it to reapply the settings) and it still didn't change it! I had to modify a field of the resource config and reapply before it changed it back.
I also tried just modifying the resource's metadata.labels to see if that would trigger a sync of the config, but it did not. I even tried modifying spec.labels, but that also did nothing.
I needed to change the actual monitor config and re-apply for it to sync.
@celenechang are there any updates on this? This has prevented me from using the monitor CRD.
The monitor resource should be asserting the desired state during the reconciliation loop, right?
If users can modify it in the UI and the operator won't revert those changes, I can't use it. Even if I'm locking monitors, it just opens up a huge troubleshooting nightmare if I can't be 100% confident the desired state is being asserted continuously.
I'm also experiencing this. I was also assuming the operator would reapply the monitor as specified in kubernetes if a change was made in the UI.
This comment seems spot on; this behavior doesn't seem to be coded.
Same here: I created a monitor using the operator and then removed it from the UI console, but the reconcile never happened.
I also tried to force a recreate using kubectl get -o json datadogmonitor <NAME> | kubectl replace -f -, but nothing changed.
Is this the expected behavior?
Apologies for the delay on updating the issue. It should be fixed by https://github.com/DataDog/datadog-operator/pull/791