integrations-core

Argo CD check failing to collect certain argocd.appset_controller metrics

Open brandon-berg opened this issue 1 year ago • 5 comments

Steps to reproduce the issue:

  1. Configure the Argo CD integration to collect metrics from the Argo CD ApplicationSet Controller.

Describe the results you received: These metrics are not collected:

  • argocd.appset_controller.reconcile.errors.total.*
  • argocd.appset_controller.runtime.reconcile.total.*

Describe the results you expected: The metrics listed above should be collected. These metrics are collected, as expected:

  • argocd.appset_controller.active.workers
  • argocd.appset_controller.max.concurrent.reconciles
  • argocd.appset_controller.reconcile.time_seconds.*

Additional information you deem important (e.g. issue happens only occasionally): This is caused by the erroneous inclusion of the counter suffix _total in the list of metrics to be collected from the Argo CD ApplicationSet Controller here. As discussed in the documentation, the "_total" suffix must be removed when specifying the name of counter metrics to be collected. As a result, these metrics cannot be collected.
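The mismatch described above can be illustrated with a minimal sketch. This is not the Agent's actual code: `family_name`, `collect`, and both mappings are hypothetical, and the sketch assumes only what the issue states, namely that counters are matched by their name without the `_total` suffix.

```python
# Hypothetical sketch, not the Datadog Agent's implementation.
# Assumption: a counter exposed as `<name>_total` is matched under `<name>`.

def family_name(exposed_name: str, metric_type: str) -> str:
    """Return the name a metric is matched under, stripping `_total` from counters."""
    if metric_type == "counter" and exposed_name.endswith("_total"):
        return exposed_name[: -len("_total")]
    return exposed_name


def collect(configured: dict, exposed: list) -> list:
    """Return the mapped names of exposed metrics that the configuration matches."""
    collected = []
    for name, metric_type in exposed:
        key = family_name(name, metric_type)
        if key in configured:
            collected.append(configured[key])
    return collected


# Metrics as they might appear on the controller's /metrics endpoint.
exposed = [
    ("controller_runtime_reconcile_errors_total", "counter"),
    ("controller_runtime_reconcile_total", "counter"),
    ("controller_runtime_active_workers", "gauge"),
]

# Assumed shape of the faulty mapping: counter keys keep `_total`.
broken = {
    "controller_runtime_reconcile_errors_total": "reconcile.errors",
    "controller_runtime_reconcile_total": "runtime.reconcile",
    "controller_runtime_active_workers": "active.workers",
}

# Mapping with the suffix removed, as in the workaround.
fixed = {
    "controller_runtime_reconcile_errors": "reconcile.errors",
    "controller_runtime_reconcile": "runtime.reconcile",
    "controller_runtime_active_workers": "active.workers",
}

print(collect(broken, exposed))  # the two counters never match
print(collect(fixed, exposed))   # all three metrics match
```

Under this assumption, the keys that keep `_total` never equal the stripped counter names, so only the gauge is collected.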

Workaround: In the Argo CD integration config, add the correct metric definitions as extra_metrics:

extra_metrics:
  - controller_runtime_reconcile_errors: "reconcile.errors"
  - controller_runtime_reconcile: "runtime.reconcile"

brandon-berg, Jul 02 '24 15:07

On our Argo CD instance, we were able to get the metrics by using the workaround this issue describes, which tells Datadog the correct metric names.

argo-cd:
  applicationSet:
    podAnnotations:
      ad.datadoghq.com/applicationset-controller.logs: '[{"service":"argocd","source":"argocd"}]'
      ad.datadoghq.com/applicationset-controller.checks: |
        {
          "argocd": {
            "init_config": {"service": "argocd"},
            "instances": [
              {
                "appset_controller_endpoint": "http://%%host%%:8080/metrics",
                "extra_metrics": [
                   {"controller_runtime_reconcile_errors": "reconcile.errors"},
                   {"controller_runtime_reconcile": "runtime.reconcile"}
                ]
              }
            ]
          }
        }

ericblackburn, Jul 02 '24 20:07

@brandon-berg, did you want to make a PR for this, or should I?

ericblackburn, Jul 02 '24 20:07

I'm not 100% sure what the actual intended behavior was, so I'd like to leave it up to Datadog, or at least hear from them about what they actually want before submitting a PR.

brandon-berg, Jul 03 '24 06:07

Note that this is related to https://github.com/DataDog/integrations-core/pull/15308. I left a comment on the original PR.

ericblackburn, Jul 03 '24 17:07

Hello 👋 Thanks for flagging! I'll put up a PR to fix this. In the meantime, although inconvenient, your proposed workarounds are what I would have recommended. Apologies there! 🙇

steveny91, Jul 10 '24 13:07

Closing this, as the fix was released with version 7.57.0 of the Agent.

steveny91, Sep 27 '24 21:09