datadog-agent
[BUG] Unable to get disk metrics: [Errno 40] Too many levels of symbolic links
Agent Environment: Agent 7.47.1 - Commit: 24dcc70 - Serialization version: v5.0.90 - Go version: go1.20.6
Describe what happened: Every agent on every node in every EKS cluster I manage is spamming this message incessantly:
2023-10-24 22:25:22 UTC | CORE | WARN | (pkg/collector/python/datadog_agent.go:131 in LogMessage) | disk:67cc0574430a16ba | (disk.py:136) | Unable to get disk metrics for /host/var/run/containerd/io.containerd.runtime.v2.task/k8s.io/15cce377ba31fc0180b1b21c21b3c5b5f00ed411e3b4f7c385898d49a2687ce4/rootfs/host/proc/sys/fs/binfmt_misc: [Errno 40] Too many levels of symbolic links: '/host/var/run/containerd/io.containerd.runtime.v2.task/k8s.io/15cce377ba31fc0180b1b21c21b3c5b5f00ed411e3b4f7c385898d49a2687ce4/rootfs/host/proc/sys/fs/binfmt_misc'. You can exclude this mountpoint in the settings if it is invalid.
Describe what you expected:
- For disk metrics to be collected correctly without having to mess with the collector configuration
- For the Agent to eventually back off and stop logging so frequently about a target it cannot access
- For there to be some sensible way to add this pattern of directories to a denylist without excluding valuable information (a sketch of what that could look like follows this list)
- For the Agent's default behavior in this circumstance to be sensible and not noisy
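For reference, the exclusion that the warning message alludes to would look something like the disk check override below. This is a sketch, not verified guidance: mount_point_exclude is the option name I believe recent versions of the disk integration accept, and the regex scope (every per-container rootfs mounted under containerd's state directory) is an assumption about what is safe to drop.

```yaml
# conf.d/disk.d/conf.yaml -- hedged sketch of a disk check override
init_config:

instances:
  # Assumed scope: skip the per-container rootfs mounts reached through
  # /host, which is where the binfmt_misc symlink loop (ELOOP) lives.
  - mount_point_exclude:
      - /host/var/run/containerd/io\.containerd\.runtime\.v2\.task/.+/rootfs/.*
```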
Steps to reproduce the issue:
- Deploy the Datadog Operator Helm chart, version 1.2.0 or later, with a minimal configuration that enables the node agent's disk check, on Kubernetes 1.24+ in AWS EKS or an equivalent environment using the containerd runtime (a hedged workaround sketch for this setup follows these steps).
- Observe the node agent logs.
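If the agent is deployed through the Operator, as in the steps above, the override can presumably be wired through the DatadogAgent resource along these lines. The extraConfd.configDataMap field names are from the v2alpha1 CRD as I recall it; treat them as assumptions and check them against your CRD version.

```yaml
# Hedged sketch: ship the disk check override through the Operator's CRD.
apiVersion: datadoghq.com/v2alpha1
kind: DatadogAgent
metadata:
  name: datadog
spec:
  override:
    nodeAgent:
      extraConfd:              # assumed field name (v2alpha1)
        configDataMap:
          disk.yaml: |-
            init_config:
            instances:
              - mount_point_exclude:
                  - /host/var/run/containerd/io\.containerd\.runtime\.v2\.task/.+/rootfs/.*
```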
Additional environment details (Operating System, Cloud provider, etc): Ubuntu 22.04.3 LTS, AWS EKS, Kubernetes Server v1.28.1-eks-43840fb
Addenda: https://github.com/DataDog/datadog-agent/issues/16433 was closed without comment. Don't do that. This issue duplicates it to restore visibility for the numerous reporters who have added further information there since its closure.
Same issue here. I am using the plain Helm chart from the official Datadog repo. The chart is managed by ArgoCD, and I don't want to modify it just to exclude this mess from the logs. By the way, this useless log traffic probably costs us some amount of our budget, too.
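For the plain Helm chart, the same override could in principle live in the values file that ArgoCD already manages, so nothing has to be patched out of band. The datadog.confd value is the chart's hook for shipping check configuration to the node agent, though the exact shape below is a sketch rather than a tested manifest.

```yaml
# values.yaml for the official datadog Helm chart -- hedged sketch
datadog:
  confd:
    disk.yaml: |-
      init_config:
      instances:
        - mount_point_exclude:
            - /host/var/run/containerd/io\.containerd\.runtime\.v2\.task/.+/rootfs/.*
```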