datadog-agent Agent does not start with read-only file system

Our security team asked me to make the root file system of all containers read-only. But I found that the Datadog agent dies on startup and cannot run on a read-only file system.

Log output

2023-01-18T09:21:53.820+01:00 | [s6-init] making user provided files available at /var/run/s6/etc...exited 0.
2023-01-18T09:21:53.906+01:00 | [s6-init] ensuring user provided files have correct perms...exited 0.
2023-01-18T09:21:53.945+01:00 | [fix-attrs.d] applying ownership & permissions fixes...
2023-01-18T09:21:53.959+01:00 | [fix-attrs.d] done.
2023-01-18T09:21:53.959+01:00 | [cont-init.d] executing container initialization scripts...
2023-01-18T09:21:53.959+01:00 | [cont-init.d] 01-check-apikey.sh: executing...
2023-01-18T09:21:53.960+01:00 | [cont-init.d] 01-check-apikey.sh: exited 0.
2023-01-18T09:21:53.962+01:00 | [cont-init.d] 50-ci.sh: executing...
2023-01-18T09:21:53.972+01:00 | ln: failed to create symbolic link '/etc/datadog-agent/datadog.yaml': Read-only file system
2023-01-18T09:21:53.972+01:00 | [cont-init.d] 50-ci.sh: exited 0.
2023-01-18T09:21:53.972+01:00 | [cont-init.d] 50-ecs.sh: executing...
2023-01-18T09:21:53.990+01:00 | rm: cannot remove '/etc/datadog-agent/conf.d/cpu.d/conf.yaml.default': Read-only file system
2023-01-18T09:21:53.990+01:00 | rm: cannot remove '/etc/datadog-agent/conf.d/network.d/conf.yaml.default': Read-only file system
2023-01-18T09:21:53.990+01:00 | rm: cannot remove '/etc/datadog-agent/conf.d/io.d/conf.yaml.default': Read-only file system
2023-01-18T09:21:53.990+01:00 | rm: cannot remove '/etc/datadog-agent/conf.d/ntp.d/conf.yaml.default': Read-only file system
2023-01-18T09:21:53.990+01:00 | rm: cannot remove '/etc/datadog-agent/conf.d/uptime.d/conf.yaml.default': Read-only file system
2023-01-18T09:21:53.990+01:00 | rm: cannot remove '/etc/datadog-agent/conf.d/disk.d/conf.yaml.default': Read-only file system
2023-01-18T09:21:53.990+01:00 | rm: cannot remove '/etc/datadog-agent/conf.d/load.d/conf.yaml.default': Read-only file system
2023-01-18T09:21:53.990+01:00 | rm: cannot remove '/etc/datadog-agent/conf.d/file_handle.d/conf.yaml.default': Read-only file system
2023-01-18T09:21:53.990+01:00 | rm: cannot remove '/etc/datadog-agent/conf.d/memory.d/conf.yaml.default': Read-only file system
2023-01-18T09:21:53.993+01:00 | [cont-init.d] 50-ecs.sh: exited 123.
2023-01-18T09:21:54.020+01:00 | [cont-finish.d] executing container finish scripts...
2023-01-18T09:21:54.022+01:00 | [cont-finish.d] done.
2023-01-18T09:21:54.023+01:00 | [s6-finish] waiting for services.
2023-01-18T09:21:54.227+01:00 | [s6-finish] sending all processes the TERM signal.
2023-01-18T09:21:57.262+01:00 | [s6-finish] sending all processes the KILL signal and exiting.

Agent Environment

I am pulling the agent from public.ecr.aws/datadog/agent:latest. I do not see a version number in the log. I included it as a sidecar in my AWS ECS task definition.

Describe what happened: After setting "readonlyRootFilesystem": true in the task definition, the Datadog agent isn't able to start.

Describe what you expected: Datadog agent should run as normal.

Steps to reproduce the issue: Run the agent as a sidecar in AWS ECS. Set "readonlyRootFilesystem": true in your container task definition.

Additional environment details (Operating System, Cloud provider, etc): AWS ECS

Jan 18 '23 08:01 kayman-mk

Funny, I'm just now checking this off on my InfoSec checklist... Perfect timing?

Jan 20 '23 22:01 tomwire

+1 waiting for Datadog agent to work with read-only FS.

Feb 01 '23 09:02 vyrtus15

Hi @kayman-mk, @tomwire and @vyrtus15

Thanks for reporting this issue.

In order to prioritise this feature request, please contact Datadog support and link this issue.

Thanks for your understanding. 🙇

Feb 13 '23 16:02 clamoriniere

Support contacted: https://help.datadoghq.com/hc/en-us/requests/1101939

Feb 15 '23 07:02 kayman-mk

Hi @kayman-mk, I ran into the same issue. Were you able to resolve this problem?

Jun 22 '23 21:06 maaz-nafees

@clamoriniere Any news here?

Support answered on Feb 20 with:

Thanks for getting back to me. I understand this is an important feature for your organisation. I've gone ahead and created a Feature Request for this with a note of its impact on your business. In the meantime I'm going to mark this ticket as closed as your request has been processed.

Jun 27 '23 07:06 kayman-mk

@kayman-mk

Our workaround was to run docker diff against the running container to get a list of all the paths written inside it. Then, in the task definition that uses the Datadog image, we added a Docker volume covering the paths that docker diff reported. It doesn't have to be a Docker volume; any volume type would work. We only needed to link /etc/datadog-agent and /opt/datadog-agent to that volume before locking down the root volume. I suspect other people may need different paths to be writable, but that's what worked for us.

Our agent is currently running and reporting correctly with the root volume locked.
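
For anyone who wants to repeat the docker diff step, here is a minimal Python sketch of how its output could be collapsed into candidate mount points. It assumes the usual `docker diff` line format (`A`, `C` or `D` followed by a path). The sample input, the `agent.pid` file name, and the helper names `changed_paths` and `mount_candidates` are made up for illustration; they are not real output from the agent image.

```python
from pathlib import PurePosixPath

# Hypothetical `docker diff <container>` output, for illustration only.
SAMPLE_DIFF = """\
C /etc
C /etc/datadog-agent
A /etc/datadog-agent/datadog.yaml
D /etc/datadog-agent/conf.d/cpu.d/conf.yaml.default
C /opt
C /opt/datadog-agent
A /opt/datadog-agent/run/agent.pid
"""


def changed_paths(diff_output: str) -> list[str]:
    """Return every path that docker diff marks as added, changed or deleted."""
    paths = []
    for line in diff_output.splitlines():
        kind, _, path = line.strip().partition(" ")
        if kind in {"A", "C", "D"} and path.startswith("/"):
            paths.append(path)
    return paths


def mount_candidates(paths: list[str], depth: int = 2) -> list[str]:
    """Collapse paths to their first `depth` directory components."""
    roots = set()
    for p in paths:
        parts = PurePosixPath(p).parts  # e.g. ('/', 'etc', 'datadog-agent', ...)
        if len(parts) > depth:  # skip shallow parents such as /etc or /opt
            roots.add(str(PurePosixPath(*parts[: depth + 1])))
    return sorted(roots)


if __name__ == "__main__":
    for root in mount_candidates(changed_paths(SAMPLE_DIFF)):
        print(root)
```

The `depth` of 2 is a judgment call: it groups everything under directories like /etc/datadog-agent instead of listing individual files. A different image layout may need a different depth.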

Jun 27 '23 18:06 tomwire

Good solution, @tomwire, but I am a little afraid that I will run into problems when I update the agent and the new version needs a different set of files than the one before.

Jul 06 '23 19:07 kayman-mk

@kayman-mk 100% agree, this is definitely the concern we have. I suspect the solution might end up being the configuration I recommended plus a promise from DD that the filesystem layout will not be changed without proper notice. Also a word of caution: our stacks are nothing alike, so results may vary.

FWIW, our pipelines for our agents always grab the latest DD image, build, and deploy on a routine schedule. We haven't had any issues since, and there have been updates.

I suppose a script that monitors syslog messages for permission errors on writing to files outside of the mounted volumes would save some headaches, but I'm going to cross that bridge when DD breaks. I have a feeling the agents are well engineered and won't be throwing many surprises.
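
A rough sketch of what such a script might look like in Python, assuming the errors have the same shape as the `ln` and `rm` lines in the log output at the top of this issue (a quoted path followed by "Read-only file system"). The function name `read_only_failures` is made up, and other tools may word the error differently, so treat the pattern as a starting point.

```python
import re

# Matches messages such as:
#   rm: cannot remove '/some/path': Read-only file system
RO_ERROR = re.compile(r"'(?P<path>/[^']*)': Read-only file system")

# Two lines copied from the log output in the original report.
SAMPLE_LOG = """\
2023-01-18T09:21:53.972+01:00 | ln: failed to create symbolic link '/etc/datadog-agent/datadog.yaml': Read-only file system
2023-01-18T09:21:53.972+01:00 | [cont-init.d] 50-ci.sh: exited 0.
2023-01-18T09:21:53.990+01:00 | rm: cannot remove '/etc/datadog-agent/conf.d/cpu.d/conf.yaml.default': Read-only file system
"""


def read_only_failures(log_text: str) -> list[str]:
    """Return the paths named in read-only errors, in order, without repeats."""
    seen = []
    for line in log_text.splitlines():
        match = RO_ERROR.search(line)
        if match and match.group("path") not in seen:
            seen.append(match.group("path"))
    return seen


if __name__ == "__main__":
    for path in read_only_failures(SAMPLE_LOG):
        print("write blocked:", path)
```

Any path it reports sits outside the writable volumes, so a non-empty result after an image update would be the signal to revisit the mounts.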

Jul 06 '23 20:07 tomwire

+1 waiting for Datadog agent to work with read-only FS.

Aug 18 '23 18:08 thiago-youper

+1

Sep 04 '23 06:09 aayushchhabra1999

+1

Sep 15 '23 07:09 jornskjerven

+1 Other vendors already support this, so we are waiting for the official solution from Datadog. A formal support case has also been opened.

Sep 27 '23 22:09 cgspohn

+1

Oct 03 '23 08:10 Siivers

+1

Oct 05 '23 15:10 danlaramay

+1

Oct 18 '23 11:10 naomichi-y

+1

Oct 18 '23 11:10 SlevinWasAlreadyTaken

+1

Oct 23 '23 04:10 h-nago

+1

Oct 24 '23 14:10 jdliauw

+1

Nov 08 '23 02:11 yokobot

+1

Nov 28 '23 15:11 rod-murphy

Given this article, https://docs.datadoghq.com/security/default_rules/cis-docker-1.2.0-5.12/, it would be good to see progress on this.

Dec 18 '23 01:12 marklynch

I just got my agent deployed in AKS with a read-only root filesystem. I am using the Helm chart v3.52.0. I have readOnlyRootFilesystem enabled for initContainers, agent, process agent, and cluster agent. Not sure if this is a new feature, but it might be worth trying again for those of you who haven't checked in a while.

Jan 24 '24 23:01 eli-gc

I also successfully have the agent running with a read-only root filesystem. This is on ECS Fargate.

When the agent boots it tries to write configuration to /etc/datadog-agent so you have to mount a read/write filesystem at this location. This can be done in your task definition by creating a volume and mounting it at that location in the agent container definition.
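
As a sketch of the shape this takes, the Python snippet below builds the relevant fragment of a task definition and prints it as JSON. The field names (`volumes`, `mountPoints`, `sourceVolume`, `containerPath`, `readonlyRootFilesystem`) are the standard ECS task definition keys. The volume name `datadog-config`, the container name `datadog-agent`, and the helper `agent_task_fragment` are placeholders I chose for illustration, and a real task definition needs more fields than shown here.

```python
import json

VOLUME_NAME = "datadog-config"  # placeholder name, pick your own


def agent_task_fragment(volume_name: str = VOLUME_NAME) -> dict:
    """Build the volume and agent container parts of a task definition."""
    return {
        # Task-level volume with no host path, so it is writable scratch space.
        "volumes": [{"name": volume_name}],
        "containerDefinitions": [
            {
                "name": "datadog-agent",  # placeholder container name
                "image": "public.ecr.aws/datadog/agent:latest",
                "readonlyRootFilesystem": True,
                # Read/write mount over the directory the agent writes at boot.
                "mountPoints": [
                    {
                        "sourceVolume": volume_name,
                        "containerPath": "/etc/datadog-agent",
                        "readOnly": False,
                    }
                ],
            }
        ],
    }


if __name__ == "__main__":
    print(json.dumps(agent_task_fragment(), indent=2))
```

Only /etc/datadog-agent is mounted here because that is the path described above; other setups in this thread needed additional paths.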

Mar 05 '24 00:03 henare

+1 Can we please prioritise this? We'd like this to be solved in the Datadog agent rather than applying the workaround mentioned above. Thank you!

Apr 17 '24 22:04 jjshinobi

+1

May 22 '24 14:05 nihauc12

This docker-compose.yml helps to test the issue locally. Working version:

services:
  datadog:
    image: public.ecr.aws/datadog/agent:7
    environment:
      - DD_API_KEY=<your_api_key>
      - DD_LOGS_ENABLED=true
      - DD_LOG_LEVEL=DEBUG
    volumes:
      - /var/run/docker.sock:/var/run/docker.sock:ro
      - /proc/:/host/proc/:ro
      - /sys/fs/cgroup:/host/sys/fs/cgroup:ro
      - datadog:/etc/datadog-agent
      - datadog:/opt/datadog-agent/run
    read_only: true
volumes:
  datadog:

If /opt/datadog-agent is mounted, the container dies. The codebase references the /opt/datadog-agent/run mount point for the case where the agent runs in a Kubernetes cluster.

Jun 19 '24 06:06 jjshinobi
