Add notify_by, groupby_simple_monitor, & renotify_occurrences config options to monitors crd

Open levipe01 opened this issue 11 months ago • 1 comment

What does this PR do?

Adds the notify_by, groupby_simple_monitor, and renotify_occurrences monitor options to the DatadogMonitor CRD.

Motivation

Closes: https://github.com/DataDog/datadog-operator/issues/833 https://github.com/DataDog/datadog-operator/issues/814

Additional Notes

groupby_simple_monitor only applies to log monitors. renotifyOccurrences requires renotifyInterval to be configured. The unit test in monitor_test.go only checks that the monitor can be built and passed to the API Go client. Validation is ultimately handled by the API Go client, which returns a 4xx error if the configuration is invalid.
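The two constraints in these notes can be sketched as a local pre-check. This is a minimal illustration only: the MonitorOptions struct and the checkOptions function are hypothetical names, not the operator's types, and the PR itself leaves validation to the API. Reporting groupbySimpleMonitor on a non-log monitor as a finding is likewise an assumption made for the sketch.

```go
package main

import "fmt"

// MonitorOptions is a hypothetical stand-in for the monitor options in this PR.
// Pointers distinguish "unset" from a zero value.
type MonitorOptions struct {
	RenotifyInterval     *int64
	RenotifyOccurrences  *int64
	GroupbySimpleMonitor *bool
}

// checkOptions returns human-readable findings for the two notes above.
// It is a local pre-check only; the Datadog API remains the real validator.
func checkOptions(monitorType string, o MonitorOptions) []string {
	var findings []string
	if o.RenotifyOccurrences != nil && o.RenotifyInterval == nil {
		findings = append(findings, "renotifyOccurrences requires renotifyInterval to be configured")
	}
	if o.GroupbySimpleMonitor != nil && monitorType != "log alert" {
		findings = append(findings, "groupbySimpleMonitor only affects log monitors")
	}
	return findings
}

func main() {
	occurrences := int64(3)
	simple := false
	// renotifyOccurrences without renotifyInterval, on a non-log monitor type.
	opts := MonitorOptions{RenotifyOccurrences: &occurrences, GroupbySimpleMonitor: &simple}
	for _, finding := range checkOptions("metric alert", opts) {
		fmt.Println(finding)
	}
}
```

A check like this would only give earlier feedback; an invalid monitor would still be rejected by the API with a 4xx.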

Minimum Agent Versions

N/A

Describe your test plan

  • Check out branch levipe01/monitors_crd_additional_options
  • Update config/manager/manager.yaml to disable the webhook, enable Datadog monitors and apply Datadog API/App keys:
    spec:
      containers:
      - command:
        - /manager
        args:
        - --enable-leader-election
        - --pprof
        - --webhookEnabled=false
        - --datadogMonitorEnabled=true
...
        env:
        - name: DD_API_KEY
          valueFrom:
            secretKeyRef:
              name: "datadog-secret"
              key: api-key
        - name: DD_APP_KEY
          valueFrom:
            secretKeyRef:
              name: "datadog-secret"
              key: app-key
  • Following the steps in how-to-contribute.md and using a kind cluster, run:

    • make build
    • make IMG=test/operator:test IMG_CHECK=test/operator-check:test docker-build
    • kind load docker-image test/operator:test
    • kind load docker-image test/operator-check:test
    • make IMG=test/operator:test IMG_CHECK=test/operator-check:test deploy
  • Apply a DatadogAgent with the below config to generate logs:

apiVersion: datadoghq.com/v2alpha1
kind: DatadogAgent
metadata:
  name: datadog
spec:
  global:
    kubelet:
      tlsVerify: false
    clusterName: kind
    credentials:
      apiSecret:
        keyName: api-key
        secretName: datadog-secret
  features:
    logCollection:
      enabled: true
      containerCollectAll: true
  • For each config option:
    • Apply a new log monitor using the config below (substituting in each new config option separately):
apiVersion: datadoghq.com/v1alpha1
kind: DatadogMonitor
metadata:
  name: datadog-log-alert-test
  namespace: system
spec:
  query: "logs(\"source:agent\").index(\"main\").rollup(\"count\").by(\"status,host\").last(\"1h\") > 5"
  type: "log alert"
  name: "Test log alert made from DatadogMonitor"
  message: "1-2-3 testing"
  tags:
    - "test:datadog"
  priority: 5
  options:
    renotifyInterval: 1
    renotifyOccurrences: 3
    groupbySimpleMonitor: false
    notifyBy: ["abc"]
  • For each config option, passing an invalid setting should result in a 4xx error visible in the operator logs. For example, the notifyBy config above should yield the following log:
{"level":"ERROR","ts":"2024-03-15T17:00:01Z","logger":"controllers.DatadogMonitor","msg":"error updating monitor","datadogmonitor":"system/datadog-log-alert-test","Monitor ID":141258523,"error":"error validating monitor: 400 Bad Request: {\"errors\":[\"Invalid notify_by value found: 'abc'; values must match the query's group keys: 'host,status'\"]}"}
  • Adding a valid config should update the corresponding fields in the UI, visible when editing the generated monitor (screenshot: 2024-03-15_13-05-40).

  • Changing the configuration with another valid setting should update the monitor successfully.

  • Deleting the corresponding DatadogMonitor object should delete the monitor from the UI.
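The notifyBy failure in the test plan above comes from the API's rule that notify_by values must match the query's group keys. A rough sketch of that rule, using the query from the test monitor, is below. The groupKeys and invalidNotifyBy helpers are hypothetical and rely on naive string matching of the .by("...") clause. They are not how the API or the operator parses queries, and they ignore any special values the API may accept.

```go
package main

import (
	"fmt"
	"strings"
)

// groupKeys extracts the comma-separated keys from the first .by("...") clause.
// Hypothetical helper: plain string search, not a real query parser.
func groupKeys(query string) []string {
	const marker = `.by("`
	start := strings.Index(query, marker)
	if start < 0 {
		return nil
	}
	rest := query[start+len(marker):]
	end := strings.Index(rest, `"`)
	if end < 0 {
		return nil
	}
	var keys []string
	for _, key := range strings.Split(rest[:end], ",") {
		if key = strings.TrimSpace(key); key != "" {
			keys = append(keys, key)
		}
	}
	return keys
}

// invalidNotifyBy returns the notifyBy values that are not group keys of the query.
func invalidNotifyBy(query string, notifyBy []string) []string {
	allowed := make(map[string]bool)
	for _, key := range groupKeys(query) {
		allowed[key] = true
	}
	var bad []string
	for _, value := range notifyBy {
		if !allowed[value] {
			bad = append(bad, value)
		}
	}
	return bad
}

func main() {
	query := `logs("source:agent").index("main").rollup("count").by("status,host").last("1h") > 5`
	fmt.Println("group keys:", groupKeys(query))
	fmt.Println("invalid notifyBy values:", invalidNotifyBy(query, []string{"abc"}))
}
```

With the test monitor's query, "abc" is flagged because the only group keys are status and host, which matches the 400 response shown in the operator log.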

Checklist

  • [x] PR has at least one valid label: bug, enhancement, refactoring, documentation, tooling, and/or dependencies
  • [x] PR has a milestone or the qa/skip-qa label

levipe01, Mar 15 '24 18:03

Codecov Report

All modified and coverable lines are covered by tests.

Project coverage is 58.97%. Comparing base (e0fa00d) to head (ae74c3e).

Additional details and impacted files

@@            Coverage Diff             @@
##             main    #1123      +/-   ##
==========================================
+ Coverage   58.95%   58.97%   +0.01%     
==========================================
  Files         174      174              
  Lines       21371    21380       +9     
==========================================
+ Hits        12600    12609       +9     
  Misses       8004     8004              
  Partials      767      767              
Flag        Coverage Δ
unittests   58.97% <100.00%> (+0.01%)

Flags with carried forward coverage won't be shown.

Files                                             Coverage Δ
apis/datadoghq/v1alpha1/datadogmonitor_types.go   100.00% <ø> (ø)
controllers/datadogmonitor/monitor.go             68.39% <100.00%> (+1.54%)

Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data. Last update e0fa00d...ae74c3e.

codecov-commenter, Mar 15 '24 18:03