Argo CD check failing to collect certain argocd.appset_controller metrics
Steps to reproduce the issue:
- Configure Argo CD integration to collect metrics from Argo CD Application Set Controller
Describe the results you received: These metrics are not collected:
- argocd.appset_controller.reconcile.errors.total.*
- argocd.appset_controller.runtime.reconcile.total.*
Describe the results you expected: All argocd.appset_controller metrics should be collected, including the ones listed above. These metrics are collected, as expected:
- argocd.appset_controller.active.workers
- argocd.appset_controller.max.concurrent.reconciles
- argocd.appset_controller.reconcile.time_seconds.*
Additional information you deem important (e.g. issue happens only occasionally):
This is caused by the erroneous inclusion of the counter suffix _total in the list of metrics to be collected from the Argo CD ApplicationSet Controller here. As discussed in the documentation, the "_total" suffix must be removed when specifying the name of counter metrics to be collected. As a result, these metrics cannot be collected.
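To illustrate the naming rule described above, here is a minimal Python sketch. It is not the Datadog Agent's actual implementation; the helper `strip_counter_suffix` is hypothetical, and the exposed counter names are inferred from the workaround in this issue (they are assumed to appear with the `_total` suffix on the controller's metrics endpoint).

```python
def strip_counter_suffix(name: str) -> str:
    """Return the metric name without a trailing `_total`, if present."""
    suffix = "_total"
    return name[: -len(suffix)] if name.endswith(suffix) else name


# Assumed raw counter names on the /metrics endpoint, mapped to the
# Datadog-side names used in the workaround from this issue.
exposed = {
    "controller_runtime_reconcile_errors_total": "reconcile.errors",
    "controller_runtime_reconcile_total": "runtime.reconcile",
}

# Build the `extra_metrics` list with the suffix removed from each key.
extra_metrics = [
    {strip_counter_suffix(raw): dd_name} for raw, dd_name in exposed.items()
]
print(extra_metrics)
```

The resulting list has the same shape as the `extra_metrics` entries used in the workaround: each raw name loses its `_total` suffix, and names without the suffix pass through unchanged.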
Workaround
In the Argo CD integration config, add the correct metric definitions as extra_metrics:
extra_metrics:
- controller_runtime_reconcile_errors: "reconcile.errors"
- controller_runtime_reconcile: "runtime.reconcile"
On our Argo CD instance, we were able to collect the metrics by using the workaround noted in this issue, which tells Datadog the correct metric names:
argo-cd:
  applicationSet:
    podAnnotations:
      ad.datadoghq.com/applicationset-controller.logs: '[{"service":"argocd","source":"argocd"}]'
      ad.datadoghq.com/applicationset-controller.checks: |
        {
          "argocd": {
            "init_config": {"service": "argocd"},
            "instances": [
              {
                "appset_controller_endpoint": "http://%%host%%:8080/metrics",
                "extra_metrics": [
                  {"controller_runtime_reconcile_errors": "reconcile.errors"},
                  {"controller_runtime_reconcile": "runtime.reconcile"}
                ]
              }
            ]
          }
        }
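Before deploying an annotation like this, it can help to confirm that the JSON parses and that no metric name still carries the `_total` suffix. The following is a rough sketch of such a check; the function `find_total_suffixed_names` is a hypothetical helper, not part of any Datadog tooling, and the annotation text is copied from the configuration in this thread.

```python
import json

# The checks annotation from this thread, as a JSON string.
checks_annotation = """
{
  "argocd": {
    "init_config": {"service": "argocd"},
    "instances": [
      {
        "appset_controller_endpoint": "http://%%host%%:8080/metrics",
        "extra_metrics": [
          {"controller_runtime_reconcile_errors": "reconcile.errors"},
          {"controller_runtime_reconcile": "runtime.reconcile"}
        ]
      }
    ]
  }
}
"""


def find_total_suffixed_names(annotation: str) -> list:
    """Return every extra_metrics name that still ends in `_total`."""
    config = json.loads(annotation)  # raises ValueError on malformed JSON
    offenders = []
    for check in config.values():
        for instance in check.get("instances", []):
            for entry in instance.get("extra_metrics", []):
                # An entry may be a plain name or a {raw_name: new_name} mapping.
                names = [entry] if isinstance(entry, str) else list(entry)
                offenders.extend(n for n in names if n.endswith("_total"))
    return offenders


print(find_total_suffixed_names(checks_annotation))
```

An empty list means no entry uses the suffixed form; any names returned are the ones to rename.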
@brandon-berg, were you planning to open a PR for this, or should I?
I'm not 100% sure what the actual intended behavior was, so I'd like to leave it up to Datadog, or at least hear from them about what they actually want before submitting a PR.
Note that this is related to https://github.com/DataDog/integrations-core/pull/15308. I left a comment on the original PR.
Hello 👋 Thanks for flagging! I'll put up a PR to fix this. In the meantime, although inconvenient, your proposed workaround is what I would have recommended. Apologies there! 🙇
Closing this as it was released with 7.57.0 of the agent.