
UDS socket folder is removed upon restart of the system

Open juan88 opened this issue 4 years ago • 1 comments

Version: Agent 6.14.1 - Commit: fa227f0 - Serialization version: 4.12.0 - Go version: go1.12.9
=========
  Transactions
  ============
    CheckRunsV1: 39
    Dropped: 0
    DroppedOnInput: 0
    Events: 0
    HostMetadata: 0
    IntakeV1: 3
    Metadata: 0
    Requeued: 0
    Retried: 0
    RetryQueueSize: 0
    Series: 0
    ServiceChecks: 0
    SketchSeries: 0
    Success: 81
    TimeseriesV1: 39

  API Keys status
  ===============
    API key ending with b6c4b: API Key valid

==========
Endpoints
==========
  https://app.datadoghq.com - API Key ending with:
      - b6c4b

==========
Logs Agent
==========

  Logs Agent is not running

=========
Aggregator
=========
  Checks Metric Sample: 10,997
  Dogstatsd Metric Sample: 3,735
  Event: 1
  Events Flushed: 1
  Number Of Flushes: 39
  Series Flushed: 10,973
  Service Check: 359
  Service Checks Flushed: 390

=========
DogStatsD
=========
  Event Packets: 0
  Event Parse Errors: 0
  Metric Packets: 3,734
  Metric Parse Errors: 0
  Service Check Packets: 0
  Service Check Parse Errors: 0
  Udp Bytes: 178,288
  Udp Packet Reading Errors: 0
  Udp Packets: 2,765
  Uds Bytes: 124,736
  Uds Origin Detection Errors: 0
  Uds Packet Reading Errors: 0
  Uds Packets: 813

Describe what happened:

We switched to collecting metrics over a Unix domain socket (UDS) on an EC2 Linux instance. After a system restart, the folder containing the socket file "dsd.socket" had been deleted and metrics were lost.

Describe what you expected:

We expected nothing to change upon restart. Instead, the folder containing the socket was missing and the agent did not recreate it. As a result, metrics from our application were lost.

Steps to reproduce the issue:

We set our agent to collect metrics via UDS following this guide. After restarting the instance, the /var/run/datadog folder was missing.

We had set the owner/group of the folder to dd-agent, which is the user running the datadog-agent.

Additional environment details (Operating System, Cloud provider, etc):

cat /etc/lsb-release 
DISTRIB_ID=Ubuntu
DISTRIB_RELEASE=18.04
DISTRIB_CODENAME=bionic
DISTRIB_DESCRIPTION="Ubuntu 18.04.1 LTS"

juan88 avatar Jun 20 '20 17:06 juan88

I will note that this happened to us as well. It is not wise, in my opinion, to have the configuration template default to /var/run/datadog if the Agent will not be able to recreate the socket there.

The issue is not so much the ephemerality of /var/run, but rather the fact that the Agent does not recreate the file at that path when it is found not to exist. If the Agent will not recreate the file in an ephemeral directory, then maybe the configuration template should mention choosing a path that is permanent and won't be wiped on reboots/upgrades.

sha1sum avatar Mar 12 '24 22:03 sha1sum
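
For anyone hitting the same problem, here is a minimal sketch of the "recreate the folder if it is missing" step discussed above, written as a small Python helper that could run before the Agent starts. This is not the Agent's own code or an official workaround. The function name `ensure_socket_dir` and its parameters are hypothetical, and the demo uses a temporary directory rather than /var/run/datadog so it can run without root.

```python
import os
import shutil
import tempfile


def ensure_socket_dir(socket_path, mode=0o755, owner=None):
    """Create the parent folder of socket_path if it is missing.

    Returns True if the folder had to be created, False if it already existed.
    `owner` is optional; changing ownership normally requires root.
    """
    directory = os.path.dirname(socket_path)
    if os.path.isdir(directory):
        return False
    os.makedirs(directory, exist_ok=True)
    # makedirs is subject to the process umask, so set the mode explicitly.
    os.chmod(directory, mode)
    if owner is not None:
        # Assumption: the same name is used for user and group (e.g. dd-agent).
        shutil.chown(directory, user=owner, group=owner)
    return True


if __name__ == "__main__":
    # Demo against a throwaway directory instead of /var/run/datadog.
    with tempfile.TemporaryDirectory() as root:
        path = os.path.join(root, "datadog", "dsd.socket")
        print(ensure_socket_dir(path))  # folder was missing, gets created
        print(ensure_socket_dir(path))  # folder already exists
```

On a real host this would be called with the configured socket path and `owner="dd-agent"`, as root, before the Agent service starts. It only restores the folder; the Agent still has to create the socket file itself on startup.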