datadog-agent

How to disable Infrastructure host metrics collection and enable only custom checks collection?

Status: Open. karthikeayan opened this issue 3 years ago • 5 comments

Describe what happened: Unable to deploy the Datadog container Agent as a pod that runs only custom checks.

I have deployed the Datadog Helm chart in my Kubernetes cluster. It creates a DaemonSet that runs an Agent pod on each node and collects metrics from that node. I also want to deploy a separate Datadog Agent as a pod that runs only custom checks such as mysql and postgres. This pod should not collect metrics for the host it runs on, because the DaemonSet already collects host metrics.

Instead, host metrics are reported under a new host identified by the Kubernetes pod name.

When I follow the logs-only setup guide (https://docs.datadoghq.com/logs/guide/how-to-set-up-only-logs/?tab=kubernetes), no metrics are sent to Datadog at all: neither the host metrics nor the custom check metrics.

Describe what you expected: The host should not appear in the Infrastructure List.

Steps to reproduce the issue: Deploy the Datadog Helm chart, then create a Deployment with the values below:

  • replicas: 1
  • image: public.ecr.aws/datadog/agent:7
  • volume mounts: the postgres and mysql custom check config files (postgres.d/conf.yaml, mysql.d/conf.yaml), mounted inside the container under /etc/datadog-agent/conf.d
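The reproduction steps roughly correspond to a Deployment like the one below. This is a minimal sketch, not the reporter's actual manifest: the Deployment and container names, the Secret holding the API key (`datadog-secret` / `api-key`), and the ConfigMap and its keys (`custom-check-configs`, `postgres.yaml`, `mysql.yaml`) are all hypothetical.

```yaml
# Sketch of the reproduction: a single Agent pod meant to run only the
# mounted postgres and mysql checks. Names marked "hypothetical" are placeholders.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: datadog-custom-checks          # hypothetical
spec:
  replicas: 1
  selector:
    matchLabels:
      app: datadog-custom-checks       # hypothetical
  template:
    metadata:
      labels:
        app: datadog-custom-checks
    spec:
      containers:
        - name: agent
          image: public.ecr.aws/datadog/agent:7
          env:
            - name: DD_API_KEY
              valueFrom:
                secretKeyRef:
                  name: datadog-secret   # hypothetical Secret
                  key: api-key           # hypothetical key
          volumeMounts:
            # Each check config is mounted as a single file (subPath), so the rest
            # of conf.d, including the bundled *.yaml.default configs, stays in place.
            - name: check-configs
              mountPath: /etc/datadog-agent/conf.d/postgres.d/conf.yaml
              subPath: postgres.yaml
            - name: check-configs
              mountPath: /etc/datadog-agent/conf.d/mysql.d/conf.yaml
              subPath: mysql.yaml
      volumes:
        - name: check-configs
          configMap:
            name: custom-check-configs   # hypothetical ConfigMap
```

Because only the two config files are overlaid, the default checks shipped in the image are still scheduled, which matches the later comments in this thread attributing the host metrics to those defaults.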

karthikeayan, Apr 11 '22

I'm having a similar problem, in a slightly different scenario. I want to capture only Postgres metrics, but the agent still collects system metrics even after I removed everything else from conf.d.

ohookins, May 26 '22

We are following the same approach: an agent running in EKS that only runs the RDS checks.

Agent status looks good so far:

kubectl exec -it <POD_NAME> -- agent status

===============
Agent (v7.37.1)
===============

=========
Collector
=========

  Running Checks
  ==============
    
    postgres (12.4.0)
    -----------------
      Instance ID: postgres:6cb55c36780909a7 [OK]
      Configuration Source: file:/etc/datadog-agent/conf.d/postgres.d/conf.yaml
      Total Runs: 10
      Metric Samples: Last Run: 305, Total: 2,881
      Events: Last Run: 0, Total: 0
      Database Monitoring Activity Samples: Last Run: 1, Total: 13
      Database Monitoring Query Metrics: Last Run: 2, Total: 14
      Database Monitoring Query Samples: Last Run: 3, Total: 237
      Service Checks: Last Run: 1, Total: 10
      Average Execution Time : 35ms
      Last Execution Date : 2022-07-07 09:19:08 UTC (1657185548000)
      Last Successful Execution Date : 2022-07-07 09:19:08 UTC (1657185548000)
      metadata:
        version.major: 12
        version.minor: 8
        version.patch: 0
        version.raw: 12.8
        version.scheme: semver      

We also remove the default checks by force, since there is no variable to toggle them:

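    lifecycle:
      postStart:
        exec:
          # Delete the bundled *.yaml.default check configs so only mounted checks remain.
          # The pattern is quoted so the shell passes it to find unexpanded.
          command: ["/bin/sh", "-c", "find /etc/datadog-agent/conf.d/ -iname '*.yaml.default' -delete"]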

So far so good.

I am now trying to fix the following issues from the agent's logs:

2022-07-07 09:16:49 UTC | CORE | WARN | (pkg/util/log/log.go:591 in func1) | Agent configuration relax permissions constraint on the secret backend cmd, Group can read and exec
WARNING: `--config` argument is deprecated and will be removed in a future version. Please use `--cfgpath` instead.
2022-07-07 09:16:49 UTC | PROCESS | WARN | (pkg/util/log/log.go:591 in func1) | Agent configuration relax permissions constraint on the secret backend cmd, Group can read and exec
2022-07-07 09:16:49 UTC | PROCESS | WARN | (pkg/util/log/log.go:591 in func1) | Agent configuration relax permissions constraint on the secret backend cmd, Group can read and exec
2022-07-07 09:16:49 UTC | SYS-PROBE | WARN | (pkg/util/log/log.go:591 in func1) | Error loading config: open /etc/datadog-agent/system-probe.yaml: no such file or directory
2022-07-07 09:16:49 UTC | SYS-PROBE | WARN | (pkg/util/log/log.go:591 in func1) | Agent configuration relax permissions constraint on the secret backend cmd, Group can read and exec
2022-07-07 09:16:49 UTC | SECURITY | WARN | (pkg/util/log/log.go:591 in func1) | Agent configuration relax permissions constraint on the secret backend cmd, Group can read and exec
2022-07-07 09:16:51 UTC | CORE | WARN | (pkg/serializer/serializer.go:144 in NewSerializer) | event payloads are disabled: all events will be dropped
2022-07-07 09:16:51 UTC | CORE | WARN | (pkg/serializer/serializer.go:147 in NewSerializer) | series payloads are disabled: all series will be dropped
2022-07-07 09:16:51 UTC | CORE | WARN | (pkg/serializer/serializer.go:150 in NewSerializer) | service_checks payloads are disabled: all service_checks will be dropped
2022-07-07 09:16:51 UTC | CORE | WARN | (pkg/serializer/serializer.go:153 in NewSerializer) | sketches payloads are disabled: all sketches will be dropped
2022-07-07 09:16:51 UTC | CORE | WARN | (pkg/secrets/secrets.go:50 in Init) | Agent configuration relax permissions constraint on the secret backend cmd, Group can read and exec
2022-07-07 09:16:52 UTC | CORE | WARN | (pkg/autodiscovery/providers/config_reader.go:156 in read) | Skipping, open /opt/datadog-agent/bin/agent/dist/conf.d: no such file or directory
2022-07-07 09:16:52 UTC | CORE | WARN | (pkg/autodiscovery/providers/config_reader.go:156 in read) | Skipping, open : no such file or directory
2022-07-07 09:16:52 UTC | CORE | ERROR | (pkg/collector/scheduler.go:76 in Schedule) | Unable to run Check postgres: a check with ID postgres:6cb55c36780909a7 is already running
2022-07-07 09:16:53 UTC | CORE | WARN | (pkg/util/cloudproviders/gce/gce_tags.go:50 in getCachedTags) | unable to get tags from gce and cache is empty: GCE metadata API error: status code 401 trying to GET http://169.254.169.254/computeMetadata/v1/?recursive=true
2022-07-07 09:16:53 UTC | TRACE | WARN | (pkg/util/log/log.go:591 in func1) | Agent configuration relax permissions constraint on the secret backend cmd, Group can read and exec
system-probe exited with code 0, disabling
trace-agent exited with code 0, disabling
2022-07-07 09:17:21 UTC | CORE | ERROR | (pkg/metrics/iterable_series.go:55 in Append) | Cannot append a serie in a closed buffered channel
2022-07-07 09:19:13 UTC | PROCESS | WARN | (pkg/util/cloudproviders/gce/gce_tags.go:50 in getCachedTags) | unable to get tags from gce and cache is empty: GCE metadata API error: status code 401 trying to GET http://169.254.169.254/computeMetadata/v1/?recursive=true
2022-07-07 09:19:36 UTC | CORE | ERROR | (pkg/metrics/iterable_series.go:55 in Append) | Cannot append a serie in a closed buffered channel

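UPDATE: after setting DD_ENABLE_PAYLOADS_SERIES to true (the startup warnings above show that series payloads were disabled, so every series was being dropped), these errors went away: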

2022-07-07 09:19:36 UTC | CORE | ERROR | (pkg/metrics/iterable_series.go:55 in Append) | Cannot append a serie in a closed buffered channel

Setting this env var

    - name: DD_CLOUD_PROVIDER_METADATA
      value: "aws"

gets rid of this warning:

2022-07-07 09:19:13 UTC | PROCESS | WARN | (pkg/util/cloudproviders/gce/gce_tags.go:50 in getCachedTags) | unable to get tags from gce and cache is empty: GCE metadata API error: status code 401 trying to GET 

The liveness probe got rid of this:

2022-07-07 10:35:04 UTC | CORE | ERROR | (pkg/collector/scheduler.go:76 in Schedule) | Unable to run Check postgres: a check with ID postgres:6cb55c36780909a7 is already running

Update: setting

    - name: "DD_SECRET_BACKEND_COMMAND_ALLOW_GROUP_EXEC_PERM"
      value: "false"

gets rid of these warnings:

2022-07-07 09:16:49 UTC | SYS-PROBE | WARN | (pkg/util/log/log.go:591 in func1) | Agent configuration relax permissions constraint on the secret backend cmd, Group can read and exec
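Collected in one place, the environment settings reported in this comment would look roughly like the container `env` fragment below. It is a sketch to merge into the agent container spec, not a complete manifest. The serializer warnings in the startup log show events, series, service_checks and sketches payloads all disabled, and only series was re-enabled here; the commented-out DD_ENABLE_PAYLOADS_SERVICE_CHECKS entry is an untested assumption for keeping the check's service checks as well.

```yaml
# Fragment: merge into the agent container spec (spec.template.spec.containers[].env).
env:
  # Series payloads were disabled, so the serializer dropped every metric series,
  # including the postgres check's metrics.
  - name: DD_ENABLE_PAYLOADS_SERIES
    value: "true"
  # Limit cloud metadata detection to AWS so the agent stops querying the
  # GCE metadata endpoint (the 401 warnings in the log).
  - name: DD_CLOUD_PROVIDER_METADATA
    value: "aws"
  # Silences the "relax permissions constraint on the secret backend cmd" warnings;
  # check this against your secret backend setup, if you use one.
  - name: DD_SECRET_BACKEND_COMMAND_ALLOW_GROUP_EXEC_PERM
    value: "false"
  # Assumption (not tested in this thread): service checks stay dropped while
  # service_checks payloads are disabled.
  # - name: DD_ENABLE_PAYLOADS_SERVICE_CHECKS
  #   value: "true"
```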

anden-dev, Jul 07 '22

This might be more of an implementation detail, but because the datadog-agent container uses s6, you can mount shell scripts into /etc/cont-init.d. s6 runs them before it starts the agent services, which gives a more reliable execution order than postStart. The Kubernetes documentation on container lifecycle hooks warns:

There is no guarantee, however, that the postStart handler is called before the Container's entrypoint is called

Our solution was to write a 99-delete-default-checks.sh script containing the same find command and mount it into /etc/cont-init.d.
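A minimal sketch of that setup, assuming the script is delivered through a ConfigMap. The ConfigMap, volume and container names are hypothetical, the script body reuses the find command from the postStart hook shown earlier, and the executable file mode is an assumption about what s6 needs to run the script. The second YAML document is a fragment to merge into the agent Deployment's pod template, not a standalone Kubernetes object.

```yaml
# Hypothetical ConfigMap holding the s6 cont-init.d hook.
apiVersion: v1
kind: ConfigMap
metadata:
  name: datadog-cont-init              # hypothetical name
data:
  99-delete-default-checks.sh: |
    #!/bin/sh
    # Same command as the postStart hook: drop the bundled *.yaml.default
    # check configs before the agent starts.
    find /etc/datadog-agent/conf.d/ -iname '*.yaml.default' -delete
---
# Fragment (not a standalone object): merge under spec.template.spec
# of the agent Deployment.
containers:
  - name: agent                        # hypothetical container name
    volumeMounts:
      # subPath mounts only this one file, so the image's other
      # /etc/cont-init.d scripts remain in place.
      - name: cont-init
        mountPath: /etc/cont-init.d/99-delete-default-checks.sh
        subPath: 99-delete-default-checks.sh
volumes:
  - name: cont-init
    configMap:
      name: datadog-cont-init
      defaultMode: 0755                # assumed: the script must be executable for s6 to run it
```

The "99-" prefix follows the usual cont-init.d convention of running scripts in lexical order, so this one runs after the image's own init scripts.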

Would it be useful to ship this as a built-in init script, exposed through an environment variable such as DD_DISABLE_DEFAULT_CHECKS (or similar)?

clatour, Sep 29 '22

@clatour could you please explain in detail what the 99-delete-default-checks.sh script contains? In my case, I'm using the Helm chart to install Datadog on Kubernetes only to scrape MySQL metrics, but I'm getting unwanted Kubernetes metrics that I need to turn off. I tried the following, but it didn't work:

  --set 'datadog.kubeStateMetricsCore.enabled=false' \
  --set 'kube-state-metrics.serviceAccount.create=false' \

mehdibenfeguir, Nov 09 '23

System stats are collected and sent by the built-in collectors (checks). Run datadog-agent status to see the active collectors; for each one, it shows the configuration file that enables it.


To disable all system collectors, find every file under /etc/datadog-agent/conf.d ending in .yaml or .yaml.default and delete it, keeping only the configs for the specific custom checks you need.

You will likely need to restart the agent for it to pick up the new configuration.
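In a container, one way to apply this advice automatically is a startup script mounted into /etc/cont-init.d, as described earlier in this thread. The ConfigMap below is a sketch under that assumption: its name, the script name, and the KEEP list (postgres.d and mysql.d, the checks discussed here) are hypothetical. Deleting every *.yaml also removes the bundled auto_conf.yaml Autodiscovery templates outside the kept directories. Because cont-init.d scripts run before the agent starts, this variant should not need a restart.

```yaml
# Hypothetical ConfigMap: keep only the listed check directories.
apiVersion: v1
kind: ConfigMap
metadata:
  name: datadog-keep-only-custom-checks   # hypothetical name
data:
  99-keep-only-custom-checks.sh: |
    #!/bin/sh
    # Check directories to keep (assumption: adjust to your own checks).
    KEEP="postgres.d mysql.d"
    cd /etc/datadog-agent/conf.d || exit 0
    for dir in */; do
      dir="${dir%/}"
      # Leave directories listed in KEEP untouched.
      case " $KEEP " in
        *" $dir "*) continue ;;
      esac
      # Delete both enabled (*.yaml) and default (*.yaml.default) configs.
      find "$dir" \( -name '*.yaml' -o -name '*.yaml.default' \) -delete
    done
```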

jay-w-opus, Oct 23 '24

This issue has been automatically marked as stale because it has not had activity in the past 15 days.

It will be closed in 30 days if no further activity occurs. If this issue is still relevant, adding a comment will keep it open. Also, you can always reopen the issue if you missed the window.

Thank you for your contributions!

dd-octo-sts[bot], Oct 20 '25