datadog-agent
Agent does not start with read-only file system
Our security team asked me to make the root file system of all containers read-only. However, I found that the Datadog agent dies on startup and cannot run on a read-only file system.
Log output
2023-01-18T09:21:53.820+01:00 | [s6-init] making user provided files available at /var/run/s6/etc...exited 0.
2023-01-18T09:21:53.906+01:00 | [s6-init] ensuring user provided files have correct perms...exited 0.
2023-01-18T09:21:53.945+01:00 | [fix-attrs.d] applying ownership & permissions fixes...
2023-01-18T09:21:53.959+01:00 | [fix-attrs.d] done.
2023-01-18T09:21:53.959+01:00 | [cont-init.d] executing container initialization scripts...
2023-01-18T09:21:53.959+01:00 | [cont-init.d] 01-check-apikey.sh: executing...
2023-01-18T09:21:53.960+01:00 | [cont-init.d] 01-check-apikey.sh: exited 0.
2023-01-18T09:21:53.962+01:00 | [cont-init.d] 50-ci.sh: executing...
2023-01-18T09:21:53.972+01:00 | ln: failed to create symbolic link '/etc/datadog-agent/datadog.yaml': Read-only file system
2023-01-18T09:21:53.972+01:00 | [cont-init.d] 50-ci.sh: exited 0.
2023-01-18T09:21:53.972+01:00 | [cont-init.d] 50-ecs.sh: executing...
2023-01-18T09:21:53.990+01:00 | rm: cannot remove '/etc/datadog-agent/conf.d/cpu.d/conf.yaml.default': Read-only file system
2023-01-18T09:21:53.990+01:00 | rm: cannot remove '/etc/datadog-agent/conf.d/network.d/conf.yaml.default': Read-only file system
2023-01-18T09:21:53.990+01:00 | rm: cannot remove '/etc/datadog-agent/conf.d/io.d/conf.yaml.default': Read-only file system
2023-01-18T09:21:53.990+01:00 | rm: cannot remove '/etc/datadog-agent/conf.d/ntp.d/conf.yaml.default': Read-only file system
2023-01-18T09:21:53.990+01:00 | rm: cannot remove '/etc/datadog-agent/conf.d/uptime.d/conf.yaml.default': Read-only file system
2023-01-18T09:21:53.990+01:00 | rm: cannot remove '/etc/datadog-agent/conf.d/disk.d/conf.yaml.default': Read-only file system
2023-01-18T09:21:53.990+01:00 | rm: cannot remove '/etc/datadog-agent/conf.d/load.d/conf.yaml.default': Read-only file system
2023-01-18T09:21:53.990+01:00 | rm: cannot remove '/etc/datadog-agent/conf.d/file_handle.d/conf.yaml.default': Read-only file system
2023-01-18T09:21:53.990+01:00 | rm: cannot remove '/etc/datadog-agent/conf.d/memory.d/conf.yaml.default': Read-only file system
2023-01-18T09:21:53.993+01:00 | [cont-init.d] 50-ecs.sh: exited 123.
2023-01-18T09:21:54.020+01:00 | [cont-finish.d] executing container finish scripts...
2023-01-18T09:21:54.022+01:00 | [cont-finish.d] done.
2023-01-18T09:21:54.023+01:00 | [s6-finish] waiting for services.
2023-01-18T09:21:54.227+01:00 | [s6-finish] sending all processes the TERM signal.
2023-01-18T09:21:57.262+01:00 | [s6-finish] sending all processes the KILL signal and exiting.
Agent Environment
I am pulling the agent from public.ecr.aws/datadog/agent:latest. I do not see a version number in the log. I included it as a sidecar in my AWS ECS task definition.
Describe what happened:
After setting "readonlyRootFilesystem": true in the task definition, the Datadog agent isn't able to start.
Describe what you expected: Datadog agent should run as normal.
Steps to reproduce the issue:
Run the agent as a sidecar in AWS ECS. Set "readonlyRootFilesystem": true in your container task definition.
Additional environment details (Operating System, Cloud provider, etc): AWS ECS
Funny, I'm just now checking this off on my InfoSec checklist... Perfect timing?
+1 waiting for Datadog agent to work with read-only FS.
Hi @kayman-mk, @tomwire and @vyrtus15
Thanks for reporting this issue.
In order to prioritise this feature request, please contact Datadog support and link this issue.
Thanks for your comprehension. 🙇
Support contacted: https://help.datadoghq.com/hc/en-us/requests/1101939
Hi @kayman-mk, I ran into the same issue. Were you able to resolve this problem?
@clamoriniere Any news here?
The support answered on Feb 20 with:
Thanks for getting back to me. I understand this is an important feature for your organisation. I've gone ahead and created a Feature Request for this with a note of it's impact on your business. In the meantime I'm going to mark this ticket as closed as your request has been processed.
@kayman-mk
Our workaround was to run docker diff against the running container to get a list of all the paths that are written inside it. Then, in the task definition that uses the Datadog image, we added a Docker volume configured to cover the paths that came back from docker diff. This doesn't necessarily need to be a Docker volume; any writable volume would work. We only needed to link /etc/datadog-agent and /opt/datadog-agent to that volume before locking down the root volume. I suspect people may have different paths that need to be writable, but that's what worked for us.
Our agent is currently running and reporting correctly with the root volume locked.
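For anyone who wants to repeat the docker diff step, here is a rough Python sketch of how the output could be reduced to a short list of candidate mount points. It assumes the usual `docker diff` line format (a change marker `A`, `C`, or `D`, followed by a path); the function name `writable_dirs` and the sample input are made up for illustration and are not taken from our actual setup.

```python
# Sketch: collapse `docker diff <container>` output into candidate mount points.
# Assumption: each line looks like "<A|C|D> <absolute path>".

def writable_dirs(diff_output: str, depth: int = 2) -> list[str]:
    """Return the sorted set of directory prefixes (first `depth` path
    components) under which the container added, changed, or deleted files."""
    prefixes = set()
    for line in diff_output.splitlines():
        parts = line.split(maxsplit=1)
        if len(parts) != 2 or parts[0] not in {"A", "C", "D"}:
            continue  # skip blank or unexpected lines
        components = parts[1].strip("/").split("/")
        if len(components) < depth:
            # Shallow entries such as "C /etc" only say that something
            # below them changed, so they are ignored here.
            continue
        prefixes.add("/" + "/".join(components[:depth]))
    return sorted(prefixes)


# Hypothetical sample input, for illustration only.
SAMPLE = """\
C /etc
C /etc/datadog-agent
A /etc/datadog-agent/datadog.yaml
C /opt
C /opt/datadog-agent
A /opt/datadog-agent/run/example.pid
"""

if __name__ == "__main__":
    for path in writable_dirs(SAMPLE):
        print(path)
```

The real input would come from something like `docker diff <container-id>` saved to a file or captured from a subprocess. The `depth` default of 2 is a guess that happens to match the two directories named above; other images may need a different value.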
Good solution, @tomwire, but I am a little afraid that I will run into problems if I update the agent and the new version needs a different set of files than the one before.
@kayman-mk 100% agree, this is definitely the concern we have. I suspect the solution might end up being the configuration I recommended and a promise from DD that the filesystem will not be changed without proper notice. And some extra caution that our stacks are nothing alike, results may vary.
FWIW, our pipelines for our agents always grab the latest DD image, build, and deploy on a routine schedule. We haven't had any issues since, and there have been updates.
I suppose a script that monitors syslog messages for permission errors on writes to files outside of the mounted volumes would save some headaches, but I'm going to cross that bridge when DD breaks. I have a feeling the agents are well engineered and won't be throwing many surprises.
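A minimal sketch of such a check, in Python, could look like the following. It only assumes the error format visible in the log output at the top of this issue (a quoted path followed by `: Read-only file system`); the function name `read_only_failures` is hypothetical, and how the log lines are collected (syslog, CloudWatch, a file) is left open.

```python
import re

# Matches messages like:
#   ln: failed to create symbolic link '/etc/...': Read-only file system
#   rm: cannot remove '/etc/...': Read-only file system
READ_ONLY_ERROR = re.compile(r"'([^']+)': Read-only file system")


def read_only_failures(log_lines):
    """Return the paths that a process tried to write on a read-only mount."""
    paths = []
    for line in log_lines:
        match = READ_ONLY_ERROR.search(line)
        if match:
            paths.append(match.group(1))
    return paths


if __name__ == "__main__":
    # Two lines copied from the log output in this issue.
    sample = [
        "2023-01-18T09:21:53.972+01:00 | ln: failed to create symbolic link "
        "'/etc/datadog-agent/datadog.yaml': Read-only file system",
        "2023-01-18T09:21:53.990+01:00 | rm: cannot remove "
        "'/etc/datadog-agent/conf.d/cpu.d/conf.yaml.default': Read-only file system",
    ]
    for path in read_only_failures(sample):
        print(path)
```

Any path this reports after an agent update would point at a location that is not yet covered by a writable volume.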
+1 waiting for Datadog agent to work with read-only FS.
+1
+1
+1 Other vendors are supporting this already, so waiting for the official solution by DataDog. Formal support case also entered.
+1
+1
+1
+1
+1
+1
+1
+1
Given this article https://docs.datadoghq.com/security/default_rules/cis-docker-1.2.0-5.12/ would be good to see progress on this.
I just got my agent deployed in AKS with a read-only root filesystem. I am using the Helm chart v3.52.0. I have readOnlyRootFilesystem enabled for initContainers, agent, process agent, and cluster agent. Not sure if this is a new feature, but it might be worth trying again for those of you who haven't checked in a while.
I also successfully have the agent running with a read-only root filesystem. This is on ECS Fargate.
When the agent boots, it tries to write configuration to /etc/datadog-agent, so you have to mount a read/write filesystem at this location. This can be done in your task definition by creating a volume and mounting it at that location in the agent container definition.
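As a rough illustration, the relevant part of such a task definition can be generated with a few lines of Python. The keys `readonlyRootFilesystem`, `mountPoints`, `sourceVolume`, `containerPath`, `readOnly`, and the task-level `volumes` list are standard ECS task definition fields; the volume name `datadog-config`, the container name, and the function name are placeholders chosen for this sketch, and everything else a real task definition needs (API key, CPU/memory, roles) is omitted.

```python
import json


def agent_task_fragment(volume_name: str = "datadog-config") -> dict:
    """Build the part of an ECS task definition that gives the agent a
    writable /etc/datadog-agent while the root filesystem stays read-only."""
    return {
        "containerDefinitions": [
            {
                "name": "datadog-agent",  # placeholder container name
                "image": "public.ecr.aws/datadog/agent:7",
                "readonlyRootFilesystem": True,
                "mountPoints": [
                    {
                        "sourceVolume": volume_name,
                        "containerPath": "/etc/datadog-agent",
                        "readOnly": False,
                    }
                ],
            }
        ],
        # A volume declared with only a name; its backing storage depends
        # on the launch type and is not configured in this sketch.
        "volumes": [{"name": volume_name}],
    }


if __name__ == "__main__":
    print(json.dumps(agent_task_fragment(), indent=2))
```

Whether the mounted volume starts out empty or pre-populated with the image's files at that path depends on the platform and has not been verified here, so treat this as a starting point rather than a tested configuration.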
+1 Can we please prioritise this? We'd like this to be solved in the Datadog agent rather than applying the workaround mentioned above. Thank you!
+1
This docker-compose.yml helps to test the issue locally. Working version:
services:
  datadog:
    image: public.ecr.aws/datadog/agent:7
    environment:
      - DD_API_KEY=<your_api_key>
      - DD_LOGS_ENABLED=true
      - DD_LOG_LEVEL=DEBUG
    volumes:
      - /var/run/docker.sock:/var/run/docker.sock:ro
      - /proc/:/host/proc/:ro
      - /sys/fs/cgroup:/host/sys/fs/cgroup:ro
      - datadog:/etc/datadog-agent
      - datadog:/opt/datadog-agent/run
    read_only: true
volumes:
  datadog:
If /opt/datadog-agent itself is mounted, the container dies. The codebase has references to a /opt/datadog-agent/run mount point for the case where the agent runs in a Kubernetes cluster.