
Constant "Error fetching info for pid" errors when run alongside docker containers

Open fotinakis opened this issue 9 years ago • 7 comments

When running dd-agent on a host machine that also runs docker containers, I've found that /var/log/datadog/collector.log ends up full of these logs:

2016-09-29 01:46:05 UTC | WARNING | dd.collector | checks.collector(collector.py:774) | GOHAI LOG | Error fetching info for pid 4021: user: unknown userid 9999
Error fetching info for pid 4028: user: unknown userid 9999
Error fetching info for pid 4035: user: unknown userid 9999
Error fetching info for pid 4042: user: unknown userid 9999
Error fetching info for pid 4049: user: unknown userid 9999
Error fetching info for pid 4056: user: unknown userid 9999
Error fetching info for pid 4063: user: unknown userid 9999
Error fetching info for pid 4070: user: unknown userid 9999
...

There is a user inside the docker containers with uid 9999, so I assume that this is erroring because the user does not exist on the host itself.
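
A quick way to check that theory (the container and user names below are only placeholders): the uid resolves inside the container but has no entry on the host, which is exactly the lookup gohai performs.

# on the host: uid 9999 has no passwd entry, so the lookup fails
$ getent passwd 9999; echo "exit status: $?"
exit status: 2

# inside the container the same uid resolves fine
$ docker exec some-container getent passwd 9999
appuser:x:9999:9999::/home/appuser:/bin/sh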

fotinakis avatar Sep 29 '16 01:09 fotinakis

Thanks for the feedback @fotinakis. We'll work on a fix.

remh avatar Nov 18 '16 20:11 remh

Is there any workaround for this, or any ETA? I know we could create a user with uid 9999, but that doesn't feel right.

szymonpk avatar Jan 25 '17 11:01 szymonpk

Also experiencing this issue.

creatorzim avatar Feb 22 '17 00:02 creatorzim

This issue may not be limited to docker. I'm seeing the same type of error with datadog+lxc containers.

Mar 18 06:05:15 127.0.0.1 dd.collector[180994]: WARNING (collector.py:774): GOHAI LOG | Error fetching info for pid 22462: user: unknown userid 998
Error fetching info for pid 24647: user: unknown userid 121
Error fetching info for pid 24649: user: unknown userid 121
Error fetching info for pid 24650: user: unknown userid 121
Error fetching info for pid 24651: user: unknown userid 121
Error fetching info for pid 24652: user: unknown userid 121
Error fetching info for pid 24653: user: unknown userid 121
Error fetching info for pid 24654: user: unknown userid 10000
Error fetching info for pid 24681: user: unknown userid 10000
Error fetching info for pid 24684: user: unknown userid 10000
Error fetching info for pid 24687: user: unknown userid 10000
Error fetching info for pid 24688: user: unknown userid 10000
Error fetching info for pid 24689: user: unknown userid 10000
Error fetching info for pid 24690: user: unknown userid 10000
Error fetching info for pid 24697: user: unknown userid 10000
Error fetching info for pid 24698: user: unknown userid 10000
Error fetching info for pid 24699: user: unknown userid 10000
Error fetching info for pid 24700: user: unknown userid 10000
Error fetching info for pid 24701: user: unknown userid 10000
...
# dpkg -l | egrep 'docker|lxc|datadog'
ii  datadog-agent                      1:5.9.1-1                       amd64        Datadog Monitoring Agent
ii  liblxc1                            2.0.7-0ubuntu1~16.04.2          amd64        Linux Containers userspace tools (library)
ii  lxc                                2.0.7-0ubuntu1~16.04.2          all          Transitional package for lxc1
ii  lxc-common                         2.0.7-0ubuntu1~16.04.2          amd64        Linux Containers userspace tools (common tools)
ii  lxc-templates                      2.0.7-0ubuntu1~16.04.2          amd64        Linux Containers userspace tools (templates)
ii  lxc1                               2.0.7-0ubuntu1~16.04.2          amd64        Linux Containers userspace tools
ii  python-lxc                         0.1-0ubuntu6                    amd64        Linux container userspace tools (Python 2.x bindings)
ii  python3-lxc                        2.0.7-0ubuntu1~16.04.2          amd64        Linux Containers userspace tools (Python 3.x bindings)

A workaround would be welcome.

nzlosh avatar Mar 18 '17 12:03 nzlosh

@nzlosh Creating a user with the specific uid helps. However, I would expect a fix from DD.
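
For anyone who needs that stopgap, it is just a placeholder system account on the host with the matching uid; the group and user names here are arbitrary.

# create a host-side placeholder account for the container uid seen in the warnings
$ sudo groupadd --gid 9999 container-uid-9999
$ sudo useradd --uid 9999 --gid 9999 --system --no-create-home --shell /usr/sbin/nologin container-uid-9999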

szymonpk avatar Mar 20 '17 05:03 szymonpk

This bug has apparently been carried over to version 1.6 of datadog-agent. There must be a workaround, but I would expect it to be posted as a comment on this open issue. Will continue googling.

parity3 avatar Aug 03 '18 22:08 parity3

Same issue on Kubernetes using the Datadog Helm chart.

values.yml

# Default values for datadog.
image:
  # This chart is compatible with different images, please choose one
  #repository: datadog/agent      # Agent6
  repository: datadog/dogstatsd   # Standalone DogStatsD6
  tag: 6.9.0                      # Use 6.9.0-jmx to enable jmx fetch collection
  pullPolicy: IfNotPresent
  ## It is possible to specify docker registry credentials
  ## See https://kubernetes.io/docs/concepts/containers/images/#specifying-imagepullsecrets-on-a-pod
  # pullSecrets:
  #   - name: regsecret

# NB! Normally you need to keep Datadog DaemonSet enabled!
# The exceptional case could be a situation when you need to run
# single DataDog pod per every namespace, but you do not need to
# re-create a DaemonSet for every non-default namespace install.
# Note, that StatsD and DogStatsD work over UDP, so you may not
# get guaranteed delivery of the metrics in Datadog-per-namespace setup!
daemonset:
  enabled: false

  ## Bind ports on the hostNetwork. Useful for CNI networking where hostPort might
  ## not be supported. The ports will need to be available on all hosts. Can be
  ## used for custom metrics instead of a service endpoint.
  ## WARNING: Make sure that hosts using this are properly firewalled otherwise
  ## metrics and traces will be accepted from any host able to connect to this host.
  # useHostNetwork: true

  ## Sets the hostPort to the same value of the container port. Needs to be used
  ## to receive traces in a standard APM set up. Can be used as for sending custom metrics.
  ## The ports will need to be available on all hosts.
  ## WARNING: Make sure that hosts using this are properly firewalled otherwise
  ## metrics and traces will be accepted from any host able to connect to this host.
  # useHostPort: true

  ## Run the agent in the host's PID namespace. This is required for Dogstatsd origin
  ## detection to work. See https://docs.datadoghq.com/developers/dogstatsd/unix_socket/
  useHostPID: true

  ## Annotations to add to the DaemonSet's Pods
  # podAnnotations:
  #   scheduler.alpha.kubernetes.io/tolerations: '[{"key": "example", "value": "foo"}]'

  ## Allow the DaemonSet to schedule on tainted nodes (requires Kubernetes >= 1.6)
  # tolerations: []

  ## Allow the DaemonSet to schedule on selected nodes
  # Ref: https://kubernetes.io/docs/user-guide/node-selection/
  # nodeSelector: {}

  ## Allow the DaemonSet to schedule ussing affinity rules
  # Ref: https://kubernetes.io/docs/concepts/configuration/assign-pod-node/#affinity-and-anti-affinity
  # affinity: {}

  ## Allow the DaemonSet to perform a rolling update on helm update
  ## ref: https://kubernetes.io/docs/tasks/manage-daemon/update-daemon-set/
  # updateStrategy: RollingUpdate

  ## Sets PriorityClassName if defined
  # priorityClassName:

# Apart from DaemonSet, deploy Datadog agent pods and related service for
# applications that want to send custom metrics. Provides DogStasD service.
#
# HINT: If you want to use datadog.collectEvents, keep deployment.replicas set to 1.
deployment:
  enabled: true
  replicas: 1
  # Affinity for pod assignment
  # Ref: https://kubernetes.io/docs/concepts/configuration/assign-pod-node/#affinity-and-anti-affinity
  affinity: {}
  # Tolerations for pod assignment
  # Ref: https://kubernetes.io/docs/concepts/configuration/taint-and-toleration/
  tolerations: []
  # If you're using a NodePort-type service and need a fixed port, set this parameter.
  # dogstatsdNodePort: 8125
  # traceNodePort: 8126
  service:
    type: ClusterIP
    annotations: {}
  ## Sets PriorityClassName if defined
  # priorityClassName:

## deploy the kube-state-metrics deployment
## ref: https://github.com/kubernetes/charts/tree/master/stable/kube-state-metrics
kubeStateMetrics:
  enabled: false
  rbac:
    create: false

# This is the new cluster agent implementation that handles cluster-wide
# metrics more cleanly, separates concerns for better rbac, and implements
# the external metrics API so you can autoscale HPAs based on datadog
# metrics
clusterAgent:
  containerName: cluster-agent
  image:
    repository: datadog/cluster-agent
    tag: 1.1.0
    pullPolicy: IfNotPresent
  enabled: false
  ## This needs to be at least 32 characters a-zA-z
  ## It is a preshared key between the node agents and the cluster agent
  token: ""
  replicas: 1
  ## Enable the metricsProvider to be able to scale based on metrics in Datadog
  metricsProvider:
    enabled: false
  resources:
    requests:
      cpu: 200m
      memory: 256Mi
    limits:
      cpu: 200m
      memory: 256Mi
  ## Override the agent's liveness probe logic from the default:
  ## In case of issues with the probe, you can disable it with the
  ## following values, to allow easier investigating:
  # livenessProbe:
  #   exec:
  #     command: ["/bin/true"]
  ## Override the cluster-agent's readiness probe logic from the default:
  # readinessProbe:

datadog:
  ## You'll need to set this to your Datadog API key before the agent will run.
  ## ref: https://app.datadoghq.com/account/settings#agent/kubernetes
  ##
  apiKey: xxxxxxxxxx

  ## You can modify the security context used to run the containers by
  ## modifying the label type below:
  # securityContext:
  #   seLinuxOptions:
  #     seLinuxLabel: "spc_t"

  ## Use existing Secret which stores API key instead of creating a new one
  # apiKeyExistingSecret:

  ## If you are using clusterAgent.metricsProvider.enabled = true, you'll need
  ## a datadog app key for read access to the metrics
  # appKey:

  ## Use existing Secret which stores APP key instead of creating a new one
  # appKeyExistingSecret:

  ## Daemonset/Deployment container name
  ## See clusterAgent.containerName if clusterAgent.enabled = true
  ##
  name: datadog

  # The site of the Datadog intake to send Agent data to.
  # Defaults to 'datadoghq.com', set to 'datadoghq.eu' to send data to the EU site.
  # site: datadoghq.com

  # The host of the Datadog intake server to send Agent data to, only set this option
  # if you need the Agent to send data to a custom URL.
  # Overrides the site setting defined in "site".
  # dd_url: https://app.datadoghq.com

  ## Set logging verbosity.
  ## ref: https://github.com/DataDog/docker-dd-agent#environment-variables
  ## Note: For Agent6 (image `datadog/agent`) the valid log levels are
  ## trace, debug, info, warn, error, critical, and off
  ##
  logLevel: INFO

  ## Un-comment this to make each node accept non-local statsd traffic.
  ## ref: https://github.com/DataDog/docker-dd-agent#environment-variables
  ##
  # nonLocalTraffic: true

  ## Enable container runtime socket volume mounting
  useCriSocketVolume: true

  ## Set host tags.
  ## ref: https://github.com/DataDog/docker-dd-agent#environment-variables
  ##
  # tags:

  ## Enables event collection from the kubernetes API
  ## ref: https://github.com/DataDog/docker-dd-agent#environment-variables
  ##
  collectEvents: false

  ## Enables log collection
  ## ref: https://docs.datadoghq.com/agent/basic_agent_usage/kubernetes/#log-collection-setup
  ##
  # logsEnabled: false
  # logsConfigContainerCollectAll: false

  ## Un-comment this to enable APM and tracing, on port 8126
  ## ref: https://github.com/DataDog/docker-dd-agent#tracing-from-the-host
  ##
  # apmEnabled: true

  ## Un-comment this to enable live process monitoring
  ## ref: https://docs.datadoghq.com/graphing/infrastructure/process/#kubernetes-daemonset
  ##
  # processAgentEnabled: true

  ## The dd-agent supports many environment variables
  ## ref: https://github.com/DataDog/datadog-agent/tree/master/Dockerfiles/agent#environment-variables
  ##
  # env:
  #   - name:
  #     value:

  ## The dd-agent supports detailed process and container monitoring and
  ## requires control over the volume and volumeMounts for the daemonset
  ## or deployment.
  ## ref: https://docs.datadoghq.com/guides/process/
  ##
  # volumes:
  #   - hostPath:
  #       path: /etc/passwd
  #     name: passwd
  # volumeMounts:
  #   - name: passwd
  #     mountPath: /etc/passwd
  #     readOnly: true

  ## Enable leader election mechanism for event collection
  ##
  # leaderElection: false

  ## Set the lease time for leader election
  ##
  # leaderLeaseDuration: 600

  ## Provide additional check configurations (static and Autodiscovery)
  ## Each key will become a file in /conf.d
  ## ref: https://github.com/DataDog/datadog-agent/tree/master/Dockerfiles/agent#optional-volumes
  ## ref: https://docs.datadoghq.com/agent/autodiscovery/
  ##
  # confd:
  #   redisdb.yaml: |-
  #     init_config:
  #     instances:
  #       - host: "name"
  #         port: "6379"
  #   kubernetes_state.yaml: |-
  #     ad_identifiers:
  #       - kube-state-metrics
  #     init_config:
  #     instances:
  #       - kube_state_url: http://%%host%%:8080/metrics

  ## Provide additional custom checks as python code
  ## Each key will become a file in /checks.d
  ## ref: https://github.com/DataDog/datadog-agent/tree/master/Dockerfiles/agent#optional-volumes
  ##
  # checksd:
  #   service.py: |-

  ## Path to the container runtime socket (if different from Docker)
  ## This is supported starting from agent 6.6.0
  # criSocketPath: /var/run/containerd/containerd.sock

  ## Provide a mapping of Kubernetes Labels to Datadog Tags
  # podLabelsAsTags:
  #   app: kube_app
  #   release: helm_release

  ## Provide a mapping of Kubernetes Annotations to Datadog Tags
  # podAnnotationsAsTags:
  #   iam.amazonaws.com/role: kube_iamrole

  ## Override the agent's liveness probe logic from the default:
  ## In case of issues with the probe, you can disable it with the
  ## following values, to allow easier investigating:
  # livenessProbe:
  #   exec:
  #     command: ["/bin/true"]

  ## datadog-agent resource requests and limits
  ## Make sure to keep requests and limits equal to keep the pods in the Guaranteed QoS class
  ## Ref: http://kubernetes.io/docs/user-guide/compute-resources/
  ##
  resources:
    requests:
      cpu: 200m
      memory: 256Mi
    limits:
      cpu: 200m
      memory: 256Mi

rbac:
  ## If true, create & use RBAC resources
  create: false
  ## Ignored if rbac.create is true
  serviceAccountName: default

tolerations: []

kube-state-metrics:
  rbac:
    create: false
    ## Ignored if rbac.create is true
    serviceAccountName: default
Error output:

2019-02-06 12:37:27 UTC | INFO | (log.go:473 in func1) | Config will be read from env variables
2019-02-06 12:37:27 UTC | INFO | (forwarder.go:154 in Start) | Forwarder started, sending to 1 endpoint(s) with 1 worker(s) each: "https://6-9-0-app.agent.datadoghq.com" (1 api key(s))
2019-02-06 12:37:28 UTC | INFO | (processes.go:16 in getProcesses) | Error fetching info for pid 1: user: lookup userid 0: no such file or directory
2019-02-06 12:37:28 UTC | ERROR | (gohai.go:40 in getGohaiInfo) | Failed to retrieve filesystem metadata: df failed to collect filesystem data: exit status 1
2019-02-06 12:37:28 UTC | INFO | (serializer.go:246 in SendMetadata) | Sent host metadata payload, size: 1857 bytes.
2019-02-06 12:37:28 UTC | INFO | (udp.go:76 in Listen) | dogstatsd-udp: starting to listen on :8125
2019-02-06 12:37:28 UTC | INFO | (transaction.go:193 in Process) | Successfully posted payload to "https://6-9-0-app.agent.datadoghq.com/intake/?api_key=*************************xxxxx", the agent will only log transaction success every 20 transactions
2019-02-06 12:38:01 UTC | INFO | (main.go:219 in start) | See ya!

And the pod has the status CrashLoopBackOff.
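
One mitigation that may reduce the uid-lookup noise (only a sketch, and it does not address the df failure or the crash loop) is the commented volumes/volumeMounts example from the chart's own values.yaml above, i.e. mounting the host's /etc/passwd read-only into the agent container:

datadog:
  # mount the host user database so uid lookups can resolve (per the chart's commented example)
  volumes:
    - hostPath:
        path: /etc/passwd
      name: passwd
  volumeMounts:
    - name: passwd
      mountPath: /etc/passwd
      readOnly: true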

eraac avatar Feb 06 '19 12:02 eraac

This causes a lot of noise in the Datadog OTel exporter; see https://github.com/open-telemetry/opentelemetry-collector-contrib/issues/14186.

ringerc avatar Oct 26 '23 22:10 ringerc

Hello, this repository has been archived; the gohai library now lives in https://github.com/DataDog/datadog-agent/pkg/gohai. Feel free to re-open your issue in https://github.com/DataDog/datadog-agent if it is still relevant.

pgimalac avatar Nov 14 '23 18:11 pgimalac