datadog-agent

[BUG] Unable to get disk metrics: [Errno 40] Too many levels of symbolic links

Open • nethershaw opened this issue 1 year ago • 1 comment

Agent Environment: Agent 7.47.1 - Commit: 24dcc70 - Serialization version: v5.0.90 - Go version: go1.20.6

Describe what happened: Every Agent on every node in every EKS cluster I manage is spamming this message incessantly:

2023-10-24 22:25:22 UTC | CORE | WARN | (pkg/collector/python/datadog_agent.go:131 in LogMessage) | disk:67cc0574430a16ba | (disk.py:136) | Unable to get disk metrics for /host/var/run/containerd/io.containerd.runtime.v2.task/k8s.io/15cce377ba31fc0180b1b21c21b3c5b5f00ed411e3b4f7c385898d49a2687ce4/rootfs/host/proc/sys/fs/binfmt_misc: [Errno 40] Too many levels of symbolic links: '/host/var/run/containerd/io.containerd.runtime.v2.task/k8s.io/15cce377ba31fc0180b1b21c21b3c5b5f00ed411e3b4f7c385898d49a2687ce4/rootfs/host/proc/sys/fs/binfmt_misc'. You can exclude this mountpoint in the settings if it is invalid.
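
For context, errno 40 on Linux is ELOOP, which the kernel returns when path resolution follows too many symbolic links. The sketch below is only an illustration of that error class, not a claim about what causes it for the binfmt_misc mount in this issue. The helper names `make_symlink_loop` and `stat_errno` are hypothetical, and `os.stat` stands in for whatever filesystem call the disk check makes.

```python
import errno
import os
import tempfile


def make_symlink_loop(directory):
    """Create a symlink that points at itself and return its path."""
    link = os.path.join(directory, "loop")
    os.symlink(link, link)  # the link's target is the link itself
    return link


def stat_errno(path):
    """Return the errno raised by os.stat(path), or None on success."""
    try:
        os.stat(path)  # follows symlinks, so a loop cannot be resolved
    except OSError as exc:
        return exc.errno
    return None


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        code = stat_errno(make_symlink_loop(tmp))
        print(code == errno.ELOOP, os.strerror(code))
```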

Describe what you expected:

  • For disk metrics to be collected correctly without having to mess with the collector configuration
  • For the Agent to eventually give up logging so frequently about a target that it cannot access
  • For there to be some sensible way to add this pattern of directories to a denylist without excluding valuable information
  • For the Agent's default behavior in this circumstance to be sensible and not noisy
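
One way to reason about a narrow denylist is to test a candidate regular expression against the path from the log before putting it in any configuration. The sketch below assumes that mount point exclusions are regular expressions matched case-insensitively from the start of the path, which is how I understand the disk check's `mount_point_exclude` option to behave; verify that against the sample configuration for your Agent version. The pattern, the `EXCLUDE_PATTERNS` list, and the `is_excluded` helper are all hypothetical.

```python
import re

# Hypothetical pattern: anything mounted *below* a containerd task rootfs.
# The rootfs directory itself and all other host paths are left alone.
EXCLUDE_PATTERNS = [
    r"/host/var/run/containerd/io\.containerd\.runtime\.v2\.task"
    r"/k8s\.io/[0-9a-f]+/rootfs/.*",
]


def is_excluded(mount_point, patterns=EXCLUDE_PATTERNS):
    """Return True if any pattern matches from the start of mount_point."""
    return any(re.match(p, mount_point, re.IGNORECASE) for p in patterns)


if __name__ == "__main__":
    noisy = (
        "/host/var/run/containerd/io.containerd.runtime.v2.task/k8s.io/"
        "15cce377ba31fc0180b1b21c21b3c5b5f00ed411e3b4f7c385898d49a2687ce4"
        "/rootfs/host/proc/sys/fs/binfmt_misc"
    )
    for path in (noisy, "/host/var/lib/kubelet", "/"):
        print(path, is_excluded(path))
```

Because the pattern requires a path component after `/rootfs/`, it would skip the nested mounts that trigger the warning while leaving ordinary host mount points untouched. Whether the nested mounts carry metrics worth keeping is a judgment call for each cluster.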

Steps to reproduce the issue:

  1. Deploy the Datadog Operator Helm Chart version 1.2.0 or later with a minimal configuration that includes node agent disk checks on k8s 1.24+ in AWS EKS or some equivalent environment with the containerd runtime.
  2. Observe node agent logs.

Additional environment details (Operating System, Cloud provider, etc): Ubuntu 22.04.03 LTS, AWS EKS, Kubernetes Server v1.28.1-eks-43840fb

Addenda: https://github.com/DataDog/datadog-agent/issues/16433 was closed without comment. Don't do that. This issue duplicates it to grant visibility to the numerous reporters who have placed further information there since its closure.

nethershaw, Oct 24 '23 22:10

Same issue here. I am using the unmodified Helm chart from the official DataDog repo. The chart is managed by ArgoCD, and I don't want to modify it just to exclude this noise from the logs. BTW, this useless log traffic probably eats into our budget.

doctornkz-intelas, Feb 14 '24 12:02