datadog-operator
Duplicate SLO created for each `DatadogSLO`
Describe what happened:
I've confirmed I'm only running a single Datadog Operator in my K8s cluster, but each DatadogSLO seems to create multiple SLOs in Datadog. I'm running version 1.3.0 of the operator. Creating an example DatadogSLO:
apiVersion: datadoghq.com/v1alpha1
kind: DatadogSLO
metadata:
  name: test-xyz
  namespace: test
spec:
  description: Error SLO for test-xyz
  name: Error SLO for test-xyz
  query:
    denominator: sum:trace.pyramid.request.hits{service:test-xyz, env:test}.as_count()
    numerator: sum:trace.pyramid.request.hits{service:test-xyz, env:test}.as_count() - sum:trace.pyramid.request.errors{service:test-xyz, env:test}.as_count()
  tags:
    - integration:kubernetes
    - service:test-xyz
    - env:test
    - team:sre
    - generated:kubernetes
  targetThreshold: 99500m
  timeframe: 7d
  type: metric
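(A note on the threshold value: targetThreshold appears to be parsed as a Kubernetes resource quantity; that's an assumption based on the milli-style value accepted here, under which 99500m is simply milli-notation for 99.5:)

# Equivalent ways to express a 99.5% target, assuming resource.Quantity parsing:
targetThreshold: 99500m   # 99500 milli-units = 99.5
# targetThreshold: "99.5" # decimal form; the API machinery canonicalizes
#                         # quantities, which can produce spurious diffs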
Applying this manifest results in multiple SLOs being created in Datadog. Deleting the DatadogSLO results in one of the SLOs being orphaned in Datadog.
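This is consistent with the operator tracking a single SLO ID in the resource status and cleaning up only that one on delete. A sketch of what kubectl get datadogslo test-xyz -n test -o yaml might report after reconciliation (field names assumed by analogy with DatadogMonitor's status; the ID is hypothetical):

status:
  id: "abc123def456ghi789jkl012"   # hypothetical Datadog SLO ID; only this one
                                   # is deleted when the CR is removed
  syncStatus: OK                   # assumed field, mirroring DatadogMonitor
  created: "2024-01-01T00:00:00Z"  # assumed field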
Describe what you expected:
I expect a single DatadogSLO resource to result in a single SLO created in Datadog.
Steps to reproduce the issue:
Install the Datadog Operator via Helm (chart version 1.4.1) with the following values:
datadogCRDs:
  crds:
    datadogSLOs: true
apiKeyExistingSecret: datadog-secret
appKeyExistingSecret: datadog-secret
datadogMonitor:
  enabled: true
datadogSLO:
  enabled: true
site: datadoghq.com
watchNamespaces:
  - ""
Kubectl apply the example DatadogSLO above.
Additional environment details (Operating System, Cloud provider, etc):
Hi, thanks for reporting this. We'll look into this on our end to see why multiple SLOs are getting created.
I've also seen this issue using the 1.8.3 Helm chart with the 1.7.0 operator.
Additionally, I was using Kyverno with a generate policy for DatadogSLOs and synchronization turned on. My target threshold was set to "99.0", but the datadog-operator controller would change it to "99", so Kyverno and the datadog-operator fought back and forth, each rewriting the value. The result was around 40 duplicate SLOs, as described in this issue. I mention all this because the problem seems to be exacerbated by updates to the resource.
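If the churn really is quantity canonicalization, writing the threshold in an already-canonical form should keep the two controllers from rewriting each other. A sketch under that assumption (the spec fragment is hypothetical):

# Hypothetical fragment for the Kyverno-generated DatadogSLO spec: use a form
# the API machinery won't rewrite, so synchronization has nothing to fight over.
spec:
  targetThreshold: "99"     # "99.0" is canonicalized to "99", retriggering sync
  # targetThreshold: 99500m # milli-notation for a fractional target such as 99.5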
Thanks for reporting the issue @paulbrassard-figure!
As mentioned here, the fix addressed one specific case leading to duplication, namely concurrent reconciliation of the resource. Since the SLO Create API is not idempotent, we can't guarantee that duplication won't happen. So it would be great if you could share more details about your setup and how to reproduce the issue with Kyverno, and if possible without it.