datadog-agent
UDS socket folder is removed upon restart of the system
Version: Agent 6.14.1 - Commit: fa227f0 - Serialization version: 4.12.0 - Go version: go1.12.9
=========
Transactions
============
CheckRunsV1: 39
Dropped: 0
DroppedOnInput: 0
Events: 0
HostMetadata: 0
IntakeV1: 3
Metadata: 0
Requeued: 0
Retried: 0
RetryQueueSize: 0
Series: 0
ServiceChecks: 0
SketchSeries: 0
Success: 81
TimeseriesV1: 39
API Keys status
===============
API key ending with b6c4b: API Key valid
==========
Endpoints
==========
https://app.datadoghq.com - API Key ending with:
- b6c4b
==========
Logs Agent
==========
Logs Agent is not running
=========
Aggregator
=========
Checks Metric Sample: 10,997
Dogstatsd Metric Sample: 3,735
Event: 1
Events Flushed: 1
Number Of Flushes: 39
Series Flushed: 10,973
Service Check: 359
Service Checks Flushed: 390
=========
DogStatsD
=========
Event Packets: 0
Event Parse Errors: 0
Metric Packets: 3,734
Metric Parse Errors: 0
Service Check Packets: 0
Service Check Parse Errors: 0
Udp Bytes: 178,288
Udp Packet Reading Errors: 0
Udp Packets: 2,765
Uds Bytes: 124,736
Uds Origin Detection Errors: 0
Uds Packet Reading Errors: 0
Uds Packets: 813
Describe what happened:
We switched to collecting metrics over a Unix domain socket (UDS) on an EC2 Linux instance. After a system restart, the folder containing the socket file "dsd.socket" had been deleted and metrics were lost.
Describe what you expected:
We expected the restart to change nothing. Instead, the folder containing the socket was missing and the Agent did not recreate it, so metrics from our application were lost.
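To make the metric loss visible on the application side, a client could check whether the datagram was actually delivered. The following is a minimal sketch, not part of any Datadog client library: `send_metric` is a hypothetical helper, and the payload only assumes the plain DogStatsD `name:value|type` datagram format.

```python
import socket


def send_metric(socket_path: str, payload: str) -> bool:
    """Send one DogStatsD datagram over UDS.

    Returns False when the socket file (or its folder) is missing or
    nothing is listening on it, instead of dropping the metric silently.
    """
    sock = socket.socket(socket.AF_UNIX, socket.SOCK_DGRAM)
    try:
        sock.sendto(payload.encode("utf-8"), socket_path)
        return True
    except (FileNotFoundError, ConnectionRefusedError):
        # ENOENT: the path does not exist (e.g. the folder was wiped on reboot).
        # ECONNREFUSED: the socket file exists but no process is bound to it.
        return False
    finally:
        sock.close()
```

A caller could log or alert when `send_metric("/var/run/datadog/dsd.socket", "app.requests:1|c")` returns `False`, which would have surfaced the missing folder right after the reboot.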
Steps to reproduce the issue:
We configured the Agent to collect metrics via UDS following this guide. After restarting the instance, the /var/run/datadog folder was missing.
We had set the owner and group of the folder to dd-agent, the user that runs the datadog-agent.
Additional environment details (Operating System, Cloud provider, etc):
cat /etc/lsb-release
DISTRIB_ID=Ubuntu
DISTRIB_RELEASE=18.04
DISTRIB_CODENAME=bionic
DISTRIB_DESCRIPTION="Ubuntu 18.04.1 LTS"
I will note that this happened to us as well. It is not wise, in my opinion, to have the configuration template default to /var/run/datadog if the Agent will not be able to recreate the socket there.
The issue is not so much the ephemerality of /var/run, but rather that the Agent does not recreate the file at that path when it finds it missing. If the Agent will not recreate the file in an ephemeral directory, then the configuration template should at least recommend choosing a permanent path that won't be wiped on reboots or upgrades.
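The behaviour requested here, recreating the path when it is missing, can be illustrated with a short sketch. This is not the Agent's code (the Agent is written in Go), and `bind_uds_datagram` is a hypothetical name. It also assumes the process has write permission on the parent directory; a service running as dd-agent would typically not be allowed to create a folder directly under /var/run.

```python
import os
import socket
import stat


def bind_uds_datagram(socket_path: str, dir_mode: int = 0o755) -> socket.socket:
    """Bind a Unix datagram socket, creating its parent folder if needed."""
    parent = os.path.dirname(socket_path)
    if parent:
        # Recreate the folder if a reboot wiped it (e.g. a tmpfs-backed /var/run).
        os.makedirs(parent, mode=dir_mode, exist_ok=True)
    try:
        # Remove a stale socket left by a previous run, but never a regular file.
        if stat.S_ISSOCK(os.lstat(socket_path).st_mode):
            os.unlink(socket_path)
    except FileNotFoundError:
        pass
    sock = socket.socket(socket.AF_UNIX, socket.SOCK_DGRAM)
    sock.bind(socket_path)
    return sock
```

Calling the function twice with the same path works because the stale socket file from the first bind is removed before the second one, which is the situation after an unclean restart.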