It seems impossible to define default cgroups settings for tasks
Summary
It seems impossible to define default cgroups settings for tasks.
Description
- We are using ECS via AWS Batch to launch multiple jobs with heavy i/o
- The heavy I/O is causing noisy-neighbor issues that we would like to limit. Specifically, Batch (and ecs-agent) fail to spin up new Docker containers; they time out due to delays caused by the heavy I/O. And this is on a Nitro SSD.
- Since there is no way to limit file i/o via AWS Batch, we wanted to configure cgroups (v1) as part of our launch configuration to accomplish this.
- The partial launch configuration is:
```sh
# Get the major/minor versions of raid drive and set the limits to 50%
# so one job won't block other jobs from starting
MAJOR=`stat -c %t /dev/md0`
MINOR=`stat -c %T /dev/md0`
printf "$MAJOR:$MINOR 4000000000" | sudo tee /sys/fs/cgroup/blkio/blkio.throttle.read_bps_device > /dev/null
printf "$MAJOR:$MINOR 500000" | sudo tee /sys/fs/cgroup/blkio/blkio.throttle.read_iops_device > /dev/null
printf "$MAJOR:$MINOR 2800000000" | sudo tee /sys/fs/cgroup/blkio/blkio.throttle.write_bps_device > /dev/null
printf "$MAJOR:$MINOR 400000" | sudo tee /sys/fs/cgroup/blkio/blkio.throttle.write_iops_device > /dev/null

sudo mkdir -p /sys/fs/cgroup/blkio/ecs
printf "$MAJOR:$MINOR 4000000000" | sudo tee /sys/fs/cgroup/blkio/ecs/blkio.throttle.read_bps_device > /dev/null
printf "$MAJOR:$MINOR 500000" | sudo tee /sys/fs/cgroup/blkio/ecs/blkio.throttle.read_iops_device > /dev/null
printf "$MAJOR:$MINOR 2800000000" | sudo tee /sys/fs/cgroup/blkio/ecs/blkio.throttle.write_bps_device > /dev/null
printf "$MAJOR:$MINOR 400000" | sudo tee /sys/fs/cgroup/blkio/ecs/blkio.throttle.write_iops_device > /dev/null

sudo systemctl daemon-reload
sudo systemctl restart docker
```
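As a sanity check (not part of the failing configuration), the limits can be read back after boot. One caveat worth noting: `stat -c %t`/`%T` print the device numbers in hex, while the `blkio.throttle.*` files expect decimal `major:minor`; for `/dev/md0` (typically 9:0) the two happen to coincide. A hedged sketch, assuming the v1 blkio hierarchy at `/sys/fs/cgroup/blkio`:

```sh
#!/bin/sh
# Read back the throttle limits that the launch configuration wrote.
# Note: stat -c %t/%T print HEX device numbers, but the blkio.throttle.*
# files expect DECIMAL major:minor, so convert explicitly.
hex2dec() { printf '%d\n' "0x$1"; }

if [ -b /dev/md0 ] && [ -d /sys/fs/cgroup/blkio/ecs ]; then
  MAJOR=$(hex2dec "$(stat -c %t /dev/md0)")
  MINOR=$(hex2dec "$(stat -c %T /dev/md0)")
  for f in read_bps_device read_iops_device write_bps_device write_iops_device; do
    printf '%s: ' "$f"
    grep "^$MAJOR:$MINOR " "/sys/fs/cgroup/blkio/ecs/blkio.throttle.$f" || echo "(not set)"
  done
fi
```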
Expected Behavior
The launch configuration sets these values as expected, and since the ecs-agent starts tasks under the ecs cgroup, we expected all tasks to inherit these settings.
Observed Behavior
All tasks start with empty cgroups and do not inherit from the ecs parent.
If I start a task and then manually adjust its cgroup from the host, that works. But since we don't know the task ID / container ID ahead of time, this is not a viable solution for us.
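For reference, the manual fix amounts to something like the following (a hedged sketch: the per-task directory layout under `/sys/fs/cgroup/blkio/ecs/` and the `9:0` device numbers are illustrative assumptions; run as root on the host):

```sh
#!/bin/sh
# Copy the throttle limits into each already-running task's cgroup.
set_limits() {
  dir=$1; devspec=$2   # devspec is "MAJOR:MINOR" in decimal
  echo "$devspec 4000000000" > "$dir/blkio.throttle.read_bps_device"
  echo "$devspec 500000"     > "$dir/blkio.throttle.read_iops_device"
  echo "$devspec 2800000000" > "$dir/blkio.throttle.write_bps_device"
  echo "$devspec 400000"     > "$dir/blkio.throttle.write_iops_device"
}

# Apply to every per-task cgroup directory the agent has created so far.
for task in /sys/fs/cgroup/blkio/ecs/*/; do
  if [ -f "$task/blkio.throttle.read_bps_device" ]; then
    set_limits "$task" "9:0"
  fi
done
```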
Finally, there seems to be almost no documentation available for any of this, though limiting i/o so that multiple jobs can safely run on a single node seems to be a pretty standard use case.
Environment Details
sh-4.2$ sudo docker info
Client:
Context: default
Debug Mode: false
Plugins:
buildx: Docker Buildx (Docker Inc., v0.0.0+unknown)
Server:
Containers: 2
Running: 2
Paused: 0
Stopped: 0
Images: 5
Server Version: 20.10.25
Storage Driver: overlay2
Backing Filesystem: xfs
Supports d_type: true
Native Overlay Diff: true
userxattr: false
Logging Driver: json-file
Cgroup Driver: cgroupfs
Cgroup Version: 1
Plugins:
Volume: local
Network: bridge host ipvlan macvlan null overlay
Log: awslogs fluentd gcplogs gelf journald json-file local logentries splunk syslog
Swarm: inactive
Runtimes: io.containerd.runc.v2 io.containerd.runtime.v1.linux runc
Default Runtime: runc
Init Binary: docker-init
containerd version: 1e1ea6e986c6c86565bc33d52e34b81b3e2bc71f
runc version: 4bccb38cc9cf198d52bebf2b3a90cd14e7af8c06
init version: de40ad0
Security Options:
seccomp
Profile: default
Kernel Version: 4.14.336-253.554.amzn2.x86_64
Operating System: Amazon Linux 2
OSType: linux
Architecture: x86_64
CPUs: 32
Total Memory: 247.9GiB
Name: ip-10-30-13-103.ec2.internal
ID: Z4SU:C7UF:5AZX:UWAD:K4JH:A47W:4OIW:ATJQ:UF2W:43NX:BFBU:ZUOS
Docker Root Dir: /data
Debug Mode: false
Registry: https://index.docker.io/v1/
Labels:
Experimental: false
Insecure Registries:
127.0.0.0/8
Live Restore Enabled: false
sh-4.2$ df -h
Filesystem Size Used Avail Use% Mounted on
devtmpfs 124G 0 124G 0% /dev
tmpfs 124G 0 124G 0% /dev/shm
tmpfs 124G 428K 124G 1% /run
tmpfs 124G 0 124G 0% /sys/fs/cgroup
/dev/nvme0n1p1 30G 1.9G 29G 7% /
/dev/md0 6.9T 184G 6.7T 3% /data
sh-4.2$ curl http://localhost:51678/v1/metadata | jq
{
"Cluster": "cesiumion-tiling-lt-0056ce2591ed453c8-14_Batch_3c431f46-4a1d-3183-a3cb-dfe21fb7b0d4",
"ContainerInstanceArn": "arn:aws:ecs:us-east-1:899618071680:container-instance/cesiumion-tiling-lt-0056ce2591ed453c8-14_Batch_3c431f46-4a1d-3183-a3cb-dfe21fb7b0d4/833cf211563e4881bdf3baa3f669189f",
"Version": "Amazon ECS Agent - v1.80.0 (*61c8a8c5)"
}
Supporting Log Snippets
TBD
- We switched to cgroups v2 and Amazon Linux 2023; same problem.
- We are testing a workaround where we use `inotifywait` to watch the cgroup hierarchy and set the limits once ECS spins up a task. It seems to work, but it definitely feels super hacky.
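For completeness, the workaround looks roughly like this (a hedged sketch, not a recommendation: it assumes `inotify-tools` is installed, the v1 blkio hierarchy, and that the agent creates one cgroup directory per task under `/sys/fs/cgroup/blkio/ecs/`; `DEV` is shown hard-coded where the real script derives it with `stat` as in the launch configuration):

```sh
#!/bin/sh
# Watch the ecs cgroup; when the agent creates a per-task directory,
# write the throttle limits into it before the task does much I/O.
DEV="9:0"   # illustrative; derive from /dev/md0 with stat in the real script

apply_limits() {
  d=$1
  echo "$DEV 4000000000" > "$d/blkio.throttle.read_bps_device"
  echo "$DEV 500000"     > "$d/blkio.throttle.read_iops_device"
  echo "$DEV 2800000000" > "$d/blkio.throttle.write_bps_device"
  echo "$DEV 400000"     > "$d/blkio.throttle.write_iops_device"
}

if command -v inotifywait >/dev/null 2>&1 && [ -d /sys/fs/cgroup/blkio/ecs ]; then
  # -m: monitor forever; fire on each newly created entry
  inotifywait -m -e create --format '%w%f' /sys/fs/cgroup/blkio/ecs |
    while read -r path; do
      if [ -d "$path" ]; then
        apply_limits "$path"
      fi
    done
fi
```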