It seems impossible to define default cgroups settings for tasks
Summary
It seems impossible to define default cgroups settings for tasks.
Description
- We are using ECS via AWS Batch to launch multiple jobs with heavy i/o
- The heavy I/O is causing noisy-neighbor issues that we would like to limit. Specifically, Batch (and ecs-agent) fail to spin up new Docker containers; they time out due to delays caused by the heavy I/O. And this is on a Nitro SSD.
- Since there is no way to limit file i/o via AWS Batch, we wanted to configure cgroups (v1) as part of our launch configuration to accomplish this.
- The partial launch configuration is:
```sh
# Get the major/minor versions of raid drive and set the limits to 50%
# so one job won't block other jobs from starting
MAJOR=`stat -c %t /dev/md0`
MINOR=`stat -c %T /dev/md0`
printf "$MAJOR:$MINOR 4000000000" | sudo tee /sys/fs/cgroup/blkio/blkio.throttle.read_bps_device > /dev/null
printf "$MAJOR:$MINOR 500000" | sudo tee /sys/fs/cgroup/blkio/blkio.throttle.read_iops_device > /dev/null
printf "$MAJOR:$MINOR 2800000000" | sudo tee /sys/fs/cgroup/blkio/blkio.throttle.write_bps_device > /dev/null
printf "$MAJOR:$MINOR 400000" | sudo tee /sys/fs/cgroup/blkio/blkio.throttle.write_iops_device > /dev/null

sudo mkdir -p /sys/fs/cgroup/blkio/ecs
printf "$MAJOR:$MINOR 4000000000" | sudo tee /sys/fs/cgroup/blkio/ecs/blkio.throttle.read_bps_device > /dev/null
printf "$MAJOR:$MINOR 500000" | sudo tee /sys/fs/cgroup/blkio/ecs/blkio.throttle.read_iops_device > /dev/null
printf "$MAJOR:$MINOR 2800000000" | sudo tee /sys/fs/cgroup/blkio/ecs/blkio.throttle.write_bps_device > /dev/null
printf "$MAJOR:$MINOR 400000" | sudo tee /sys/fs/cgroup/blkio/ecs/blkio.throttle.write_iops_device > /dev/null

sudo systemctl daemon-reload
sudo systemctl restart docker
```
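As a sanity check (not part of the failing configuration), the limits can be read back after boot. One caveat worth noting: `stat -c %t`/`%T` print the device numbers in hex, while the `blkio.throttle.*` files expect decimal `major:minor`; for `/dev/md0` (typically 9:0) the two happen to coincide. A hedged sketch, assuming the v1 blkio hierarchy at `/sys/fs/cgroup/blkio`:

```sh
#!/bin/sh
# Read back the throttle limits that the launch configuration wrote.
# Note: stat -c %t/%T print HEX device numbers, but the blkio.throttle.*
# files expect DECIMAL major:minor, so convert explicitly.
hex2dec() { printf '%d\n' "0x$1"; }

if [ -b /dev/md0 ] && [ -d /sys/fs/cgroup/blkio/ecs ]; then
  MAJOR=$(hex2dec "$(stat -c %t /dev/md0)")
  MINOR=$(hex2dec "$(stat -c %T /dev/md0)")
  for f in read_bps_device read_iops_device write_bps_device write_iops_device; do
    printf '%s: ' "$f"
    grep "^$MAJOR:$MINOR " "/sys/fs/cgroup/blkio/ecs/blkio.throttle.$f" || echo "(not set)"
  done
fi
```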
Expected Behavior
The launch configuration sets these values as expected, and since the ecs-agent starts tasks under the ecs cgroup, we expected all tasks to inherit these settings.
Observed Behavior
All tasks start with empty cgroups and do not inherit from the ecs parent.
If I start a task and then manually adjust its cgroup from the host, that works. But since we don't know the task ID / container ID ahead of time, this is not a viable solution for us.
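For reference, the manual fix amounts to something like the following (a hedged sketch: the per-task directory layout under `/sys/fs/cgroup/blkio/ecs/` and the `9:0` device numbers are illustrative assumptions; run as root on the host):

```sh
#!/bin/sh
# Copy the throttle limits into each already-running task's cgroup.
set_limits() {
  dir=$1; devspec=$2   # devspec is "MAJOR:MINOR" in decimal
  echo "$devspec 4000000000" > "$dir/blkio.throttle.read_bps_device"
  echo "$devspec 500000"     > "$dir/blkio.throttle.read_iops_device"
  echo "$devspec 2800000000" > "$dir/blkio.throttle.write_bps_device"
  echo "$devspec 400000"     > "$dir/blkio.throttle.write_iops_device"
}

# Apply to every per-task cgroup directory the agent has created so far.
for task in /sys/fs/cgroup/blkio/ecs/*/; do
  if [ -f "$task/blkio.throttle.read_bps_device" ]; then
    set_limits "$task" "9:0"
  fi
done
```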
Finally, there seems to be almost no documentation available for any of this, though limiting i/o so that multiple jobs can safely run on a single node seems to be a pretty standard use case.
Environment Details
sh-4.2$ sudo docker info
Client:
Context: default
Debug Mode: false
Plugins:
buildx: Docker Buildx (Docker Inc., v0.0.0+unknown)
Server:
Containers: 2
Running: 2
Paused: 0
Stopped: 0
Images: 5
Server Version: 20.10.25
Storage Driver: overlay2
Backing Filesystem: xfs
Supports d_type: true
Native Overlay Diff: true
userxattr: false
Logging Driver: json-file
Cgroup Driver: cgroupfs
Cgroup Version: 1
Plugins:
Volume: local
Network: bridge host ipvlan macvlan null overlay
Log: awslogs fluentd gcplogs gelf journald json-file local logentries splunk syslog
Swarm: inactive
Runtimes: io.containerd.runc.v2 io.containerd.runtime.v1.linux runc
Default Runtime: runc
Init Binary: docker-init
containerd version: 1e1ea6e986c6c86565bc33d52e34b81b3e2bc71f
runc version: 4bccb38cc9cf198d52bebf2b3a90cd14e7af8c06
init version: de40ad0
Security Options:
seccomp
Profile: default
Kernel Version: 4.14.336-253.554.amzn2.x86_64
Operating System: Amazon Linux 2
OSType: linux
Architecture: x86_64
CPUs: 32
Total Memory: 247.9GiB
Name: ip-10-30-13-103.ec2.internal
ID: Z4SU:C7UF:5AZX:UWAD:K4JH:A47W:4OIW:ATJQ:UF2W:43NX:BFBU:ZUOS
Docker Root Dir: /data
Debug Mode: false
Registry: https://index.docker.io/v1/
Labels:
Experimental: false
Insecure Registries:
127.0.0.0/8
Live Restore Enabled: false
sh-4.2$ df -h
Filesystem Size Used Avail Use% Mounted on
devtmpfs 124G 0 124G 0% /dev
tmpfs 124G 0 124G 0% /dev/shm
tmpfs 124G 428K 124G 1% /run
tmpfs 124G 0 124G 0% /sys/fs/cgroup
/dev/nvme0n1p1 30G 1.9G 29G 7% /
/dev/md0 6.9T 184G 6.7T 3% /data
sh-4.2$ curl http://localhost:51678/v1/metadata | jq
{
"Cluster": "cesiumion-tiling-lt-0056ce2591ed453c8-14_Batch_3c431f46-4a1d-3183-a3cb-dfe21fb7b0d4",
"ContainerInstanceArn": "arn:aws:ecs:us-east-1:899618071680:container-instance/cesiumion-tiling-lt-0056ce2591ed453c8-14_Batch_3c431f46-4a1d-3183-a3cb-dfe21fb7b0d4/833cf211563e4881bdf3baa3f669189f",
"Version": "Amazon ECS Agent - v1.80.0 (*61c8a8c5)"
}
Supporting Log Snippets
TBD
- We switched to cgroups v2 and Amazon Linux 2023; same problem.
- We are testing a workaround where we use `inotifywait` to watch the cgroup hierarchy and set the limits once ECS spins up a task. It seems to work, but it definitely feels super hacky.
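For completeness, the workaround looks roughly like this (a hedged sketch, not a recommendation: it assumes `inotify-tools` is installed, the v1 blkio hierarchy, and that the agent creates one cgroup directory per task under `/sys/fs/cgroup/blkio/ecs/`; `DEV` is shown hard-coded where the real script derives it with `stat` as in the launch configuration):

```sh
#!/bin/sh
# Watch the ecs cgroup; when the agent creates a per-task directory,
# write the throttle limits into it before the task does much I/O.
DEV="9:0"   # illustrative; derive from /dev/md0 with stat in the real script

apply_limits() {
  d=$1
  echo "$DEV 4000000000" > "$d/blkio.throttle.read_bps_device"
  echo "$DEV 500000"     > "$d/blkio.throttle.read_iops_device"
  echo "$DEV 2800000000" > "$d/blkio.throttle.write_bps_device"
  echo "$DEV 400000"     > "$d/blkio.throttle.write_iops_device"
}

if command -v inotifywait >/dev/null 2>&1 && [ -d /sys/fs/cgroup/blkio/ecs ]; then
  # -m: monitor forever; fire on each newly created entry
  inotifywait -m -e create --format '%w%f' /sys/fs/cgroup/blkio/ecs |
    while read -r path; do
      if [ -d "$path" ]; then
        apply_limits "$path"
      fi
    done
fi
```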