datadog-agent
Spotty Kubernetes event collection
Output of the info page (if this is a bug)
» k exec -it datadog-5lnhk agent status
Getting the status from the agent.
===============
Agent (v6.10.0)
===============
Status date: 2019-03-25 13:53:10.729646 UTC
Pid: 380
Python Version: 2.7.15
Logs:
Check Runners: 4
Log Level: WARNING
Paths
=====
Config File: /etc/datadog-agent/datadog.yaml
conf.d: /etc/datadog-agent/conf.d
checks.d: /etc/datadog-agent/checks.d
Clocks
======
NTP offset: -71µs
System UTC time: 2019-03-25 13:53:10.729646 UTC
Host Info
=========
bootTime: 2019-03-20 10:55:24.000000 UTC
kernelVersion: 4.19.25-coreos
os: linux
platform: debian
platformFamily: debian
platformVersion: buster/sid
procs: 73
uptime: 58s
virtualizationRole: guest
virtualizationSystem: kvm
Hostnames
=========
host_aliases: [redacted]
hostname: redacted
socket-fqdn: datadog-5lnhk
socket-hostname: datadog-5lnhk
hostname provider: container
unused hostname providers:
aws: not retrieving hostname from AWS: the host is not an ECS instance, and other providers already retrieve non-default hostnames
configuration/environment: hostname is empty
gce: unable to retrieve hostname from GCE: Get http://169.254.169.254/computeMetadata/v1/instance/hostname: net/http: request canceled while waiting for connection (Client.Timeout exceeded while awaiting headers)
=========
Collector
=========
Running Checks
==============
cpu
---
Instance ID: cpu [OK]
Total Runs: 29,508
Metric Samples: Last Run: 6, Total: 177,042
Events: Last Run: 0, Total: 0
Service Checks: Last Run: 0, Total: 0
Average Execution Time : 0s
disk (2.1.0)
------------
Instance ID: disk:e5dffb8bef24336f [OK]
Total Runs: 29,507
Metric Samples: Last Run: 244, Total: 1 M
Events: Last Run: 0, Total: 0
Service Checks: Last Run: 0, Total: 0
Average Execution Time : 123ms
docker
------
Instance ID: docker [OK]
Total Runs: 29,507
Metric Samples: Last Run: 317, Total: 1 M
Events: Last Run: 0, Total: 1,916
Service Checks: Last Run: 1, Total: 29,507
Average Execution Time : 48ms
file_handle
-----------
Instance ID: file_handle [OK]
Total Runs: 29,507
Metric Samples: Last Run: 5, Total: 147,535
Events: Last Run: 0, Total: 0
Service Checks: Last Run: 0, Total: 0
Average Execution Time : 0s
io
--
Instance ID: io [OK]
Total Runs: 29,507
Metric Samples: Last Run: 130, Total: 1 M
Events: Last Run: 0, Total: 0
Service Checks: Last Run: 0, Total: 0
Average Execution Time : 0s
kubelet (2.4.0)
---------------
Instance ID: kubelet:d884b5186b651429 [OK]
Total Runs: 29,507
Metric Samples: Last Run: 437, Total: 1 M
Events: Last Run: 0, Total: 0
Service Checks: Last Run: 4, Total: 118,025
Average Execution Time : 427ms
kubernetes_apiserver
--------------------
Instance ID: kubernetes_apiserver [OK]
Total Runs: 29,507
Metric Samples: Last Run: 0, Total: 0
Events: Last Run: 0, Total: 377
Service Checks: Last Run: 5, Total: 225
Average Execution Time : 100ms
load
----
Instance ID: load [OK]
Total Runs: 29,507
Metric Samples: Last Run: 6, Total: 177,042
Events: Last Run: 0, Total: 0
Service Checks: Last Run: 0, Total: 0
Average Execution Time : 0s
memory
------
Instance ID: memory [OK]
Total Runs: 29,507
Metric Samples: Last Run: 17, Total: 501,619
Events: Last Run: 0, Total: 0
Service Checks: Last Run: 0, Total: 0
Average Execution Time : 0s
network (1.9.0)
---------------
Instance ID: network:2a218184ebe03606 [OK]
Total Runs: 29,508
Metric Samples: Last Run: 105, Total: 1 M
Events: Last Run: 0, Total: 0
Service Checks: Last Run: 0, Total: 0
Average Execution Time : 345ms
ntp
---
Instance ID: ntp:b4579e02d1981c12 [OK]
Total Runs: 29,507
Metric Samples: Last Run: 1, Total: 29,507
Events: Last Run: 0, Total: 0
Service Checks: Last Run: 1, Total: 29,507
Average Execution Time : 0s
uptime
------
Instance ID: uptime [OK]
Total Runs: 29,508
Metric Samples: Last Run: 1, Total: 29,508
Events: Last Run: 0, Total: 0
Service Checks: Last Run: 0, Total: 0
Average Execution Time : 0s
========
JMXFetch
========
Initialized checks
==================
no checks
Failed checks
=============
no checks
=========
Forwarder
=========
Transactions
============
CheckRunsV1: 29,508
Dropped: 0
DroppedOnInput: 0
Events: 0
HostMetadata: 0
IntakeV1: 3,718
Metadata: 0
Requeued: 0
Retried: 0
RetryQueueSize: 0
Series: 0
ServiceChecks: 0
SketchSeries: 0
Success: 62,734
TimeseriesV1: 29,508
API Keys status
===============
API key ending with e5d88: API Key valid
==========
Endpoints
==========
https://app.datadoghq.com - API Key ending with:
- redacted
==========
Logs Agent
==========
docker
------
Type: docker
Status: OK
Inputs: 86865f94527467d22142d81c3fd535d4ba3c824aad2df3f7d1c12ede6b131cb5
=========
Aggregator
=========
Checks Metric Sample: 39.2 M
Dogstatsd Metric Sample: 73,768
Event: 2,294
Events Flushed: 2,294
Number Of Flushes: 29,508
Series Flushed: 33.7 M
Service Check: 502,328
Service Checks Flushed: 531,835
=========
DogStatsD
=========
Event Packets: 0
Event Parse Errors: 0
Metric Packets: 73,768
Metric Parse Errors: 0
Service Check Packets: 0
Service Check Parse Errors: 0
Udp Packet Reading Errors: 0
Udp Packets: 73,769
Uds Origin Detection Errors: 0
Uds Packet Reading Errors: 0
Uds Packets: 0
Describe what happened: Kubernetes event collection breaks shortly after an agent acquires the leader lock. If I kill the agent that holds the leader lock, another one acquires it and collects events for a short while (minutes), but then it also stops reporting k8s events. I can seemingly repeat this any number of times.
Weirdly enough, we have another cluster where event collection seems to work fine.
You can see in the agent output above that it collected 377 events and then nothing on later runs.
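For reference, which agent currently holds the lock can be checked from the leader election ConfigMap. This is a rough diagnostic assuming the agent's default ConfigMap name (datadog-leader-election) and the standard leader annotation; adjust the names if your setup differs:

# Inspect the leader election ConfigMap; holderIdentity in the leader annotation is the pod that should be collecting events
kubectl get configmap datadog-leader-election -o yaml | grep holderIdentity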
Describe what you expected: K8s event collection to work reliably
Steps to reproduce the issue: Datadog deployed with the official Helm chart (values used: here) on k8s 1.12.
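In case it matters, event collection and leader election are turned on through the chart values. A minimal sketch of how we set them, assuming the stable/datadog chart's datadog.collectEvents and datadog.leaderElection value names (treat the exact keys as an assumption):

# Enable k8s event collection and leader election on the node agents
helm upgrade --install datadog stable/datadog \
  --set datadog.apiKey=<redacted> \
  --set datadog.collectEvents=true \
  --set datadog.leaderElection=true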
Additional environment details (Operating System, Cloud provider, etc): CoreOS Container Linux (latest stable), on-premises
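When collection stops, the check can also be run by hand inside the leader pod to see whether it reports any errors; something like:

# Run the apiserver check once and print its result (pod name taken from the status output above)
kubectl exec -it datadog-5lnhk -- agent check kubernetes_apiserver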
Thanks a lot! I'll check them out!
Thanks for your contributions, we'll be closing this issue as it has gone stale. Feel free to reopen if you'd like to continue the discussion.