containers-roadmap Support `--pids-limit` `docker run` flag

Related to #502. Without this flag, it's cumbersome to prevent fork bombs in containers.

Aug 24 '17 21:08 kristiantakvam

@kristiantakvam Thanks for opening this issue here. The pids control group was added in Docker 1.11 and needs kernel support (kernel 4.3 or later). To support this flag in ECS, we will probably also need some work in the ECS Optimized AMI to enable this feature in the kernel. We will track this as a feature request. To help us prioritize the request, can you briefly tell us your use case and what concerns you about not having this flag?

Thanks, Peng

Aug 25 '17 17:08 richardpen

@richardpen I'm working on the backend of a feature that allows users to write and execute code in their browser. The code is uploaded to our servers, inserted into a Docker container, and compiled/executed in that container. Due to the nature of this feature, I'm not in full control of the code that's running in these containers. As such, I'm having to take extra steps to make sure a malicious piece of code doesn't do any harm outside of the ephemeral container it's running in.

As I see it, there are three possible routes I can take to prevent a fork bomb from doing any harm outside of the container:

1. Set nproc in the ulimit dictionary in the Task Definition. The problem with this is that ulimits are enforced per UID in the host OS. This means that if I have multiple containers running on the same host EC2 instance and any one of them exhausts the allowed number of processes, all the other containers are prevented from forking until the malicious container is killed. One idea I had to mitigate this, but haven't fully explored yet, is to defer creating my sandbox user until container spin-up time. This way I can randomly choose an integer between 1001 and UID_MAX to use as the UID for my user, so that each container is likely to have a distinct UID. This of course doesn't provide a strong guarantee against duplicate UIDs, and due to the Birthday Problem, the probability of a dupe grows faster than one might expect as more containers are added per EC2 instance.
2. The --kernel-memory docker run flag. Not currently supported by ECS. Setting this to a low value should restrict how much kernel memory can be used by the cgroup, effectively preventing fork bombs.
3. The --pids-limit docker run flag. Not currently supported by ECS. This exactly solves my problem by putting a limit on the number of processes per cgroup.
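
To put a rough number on the Birthday Problem concern in the first option, here is a small sketch that estimates the chance of at least one duplicate when each of n containers picks a UID uniformly at random from d candidates. The function name `uid_collision_prob` is hypothetical, and the range of 59000 candidates assumes UID_MAX is 60000 (a common default, but check /etc/login.defs on the actual host).

```shell
#!/bin/sh
# Probability that at least two of n containers draw the same UID
# when each picks uniformly at random from d possible values.
# Exact product form: 1 - prod_{k=0}^{n-1} (d - k) / d
uid_collision_prob() {
    n="$1"   # number of containers on the host
    d="$2"   # number of candidate UIDs
    awk -v n="$n" -v d="$d" 'BEGIN {
        p_unique = 1
        for (k = 0; k < n; k++) p_unique *= (d - k) / d
        printf "%.4f\n", 1 - p_unique
    }'
}

# Assumed range 1001..60000, i.e. 59000 candidate UIDs (hypothetical).
for n in 10 50 100 300; do
    printf '%s containers: %s\n' "$n" "$(uid_collision_prob "$n" 59000)"
done
```

For small n the probability grows roughly like n²/(2d), which is why it climbs faster than a linear guess would suggest as container density goes up.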

Based on your feedback, it sounds like adding ECS support for --kernel-memory would be easier and faster, since I believe kernel support for kmem.limit_in_bytes has been around for quite a while.

Aug 25 '17 19:08 kristiantakvam

Any progress on this request? The CIS benchmark, which the industry refers to, tests for this.

Aug 14 '18 01:08 rochmind

+1, this sounds like an easy feature to support (add support for --pids-limit in the runtime). Any update on this matter?

Dec 24 '18 08:12 dannygu

fyi - here is a patched ecs-agent by remind101 - https://github.com/remind101/amazon-ecs-agent/blob/master/patches/2-pids-limit

Feb 24 '19 20:02 jedi4ever

+1

Feb 04 '20 15:02 pparth

No special use case is needed to want this other than a mixed ECS cluster. One container can fork bomb the entire host (I'm looking at you, Java containers...). At that point everything is degraded, and most of the time the error messages are tangential or seemingly unrelated, because the kernel being at max_pids does not lead to very clear error messages at the top of the stack.

We have hit this multiple times in production, and the only workaround we have so far is to set the host max_pids to 4M and hope we don't run out of something else during a fork bomb. That's when systems guys start to miss the isolation of a VM/hypervisor :) Jokes aside, it would be useful to surface this in the task definition.

Jan 20 '21 00:01 hyksos

We have also had the same problem @hyksos describes, in production.

Support for --pids-limit would be appreciated.

Currently, we approximate --pids-limit by using a scheduled job to set per-task limits, i.e. something like:

Mount pids cgroup in /etc/cgconfig.conf
yum install libcgroup-tools && systemctl enable cgconfig # amzlinux2
Schedule:

LIMIT=8192
# amzlinux1
if [ -d /cgroup/pids/ecs ]; then find /cgroup/pids/ecs -mindepth 3 -maxdepth 3 -type f -name pids.max -exec sh -c "echo ${LIMIT} > {}" \; ; fi
# amzlinux2
if [ -d /sys/fs/cgroup/pids/ecs ]; then find /sys/fs/cgroup/pids/ecs -mindepth 3 -maxdepth 3 -type f -name pids.max -exec sh -c "echo ${LIMIT} > {}" \; ; fi
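
The two one-liners above can also be folded into a single helper, sketched below under a few assumptions. The function name `apply_pids_limit` is hypothetical; it keeps the same find depth and file name, but passes the limit and path to `sh -c` as arguments instead of splicing them into the command string. On a real host it would need to run as root.

```shell
#!/bin/sh
# Write a PID limit into every pids.max file exactly three levels below
# the given cgroup root, mirroring the find depth used above.
apply_pids_limit() {
    root="$1"    # e.g. /sys/fs/cgroup/pids/ecs (amzlinux2)
    limit="$2"   # e.g. 8192
    [ -d "$root" ] || return 0   # nothing to do if the hierarchy is absent
    # Pass the limit and the path as positional arguments so unusual
    # characters in a path are not reinterpreted by the inner shell.
    find "$root" -mindepth 3 -maxdepth 3 -type f -name pids.max \
        -exec sh -c 'echo "$1" > "$2"' _ "$limit" {} \;
}

LIMIT=8192
apply_pids_limit /cgroup/pids/ecs "$LIMIT"          # amzlinux1
apply_pids_limit /sys/fs/cgroup/pids/ecs "$LIMIT"   # amzlinux2
```

Taking the root as a parameter also makes it possible to try the helper against a scratch directory before pointing it at the real cgroup tree.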

In addition to the per-task limits, it is also useful to set a global limit for the whole ecs tree, so that PIDs are always available to other host processes, i.e. something like this in /etc/cgconfig.conf:

group ecs {
    pids {
        pids.max = 229376; # some value less than sysctl kernel.pid_max
    }
}
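
One way to pick "some value less than sysctl kernel.pid_max" is to take a fixed fraction of it. The sketch below uses a hypothetical helper, `ecs_pids_max`, with an assumed default of 7/8. That fraction happens to give 229376 for a pid_max of 262144, but the config above does not say how its number was chosen.

```shell
#!/bin/sh
# Print a pids.max value for the ecs group as a fraction of the host's
# kernel.pid_max, leaving the remainder for other host processes.
ecs_pids_max() {
    pid_max="$1"    # value of sysctl kernel.pid_max
    num="${2:-7}"   # fraction numerator (assumed default)
    den="${3:-8}"   # fraction denominator (assumed default)
    echo $(( pid_max * num / den ))
}

# On Linux the current value is readable from procfs.
if [ -r /proc/sys/kernel/pid_max ]; then
    ecs_pids_max "$(cat /proc/sys/kernel/pid_max)"
fi
```

The printed number can then be used as the pids.max value in the ecs group of /etc/cgconfig.conf.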

Jan 20 '21 01:01 joedj

Please advise on the timeline for this. The issue has been open for 4-5 years now and customers are still waiting for it to become available.

Sep 02 '22 05:09 singhnix

Any update on the timeline or an ETA for this 6-year-old issue? This is becoming a serious gap for our use case.

Jun 12 '23 17:06 SpeedAndPower

This feature has been released in ECS agent version 1.77.0, thanks for your patience! https://github.com/aws/amazon-ecs-agent/releases/tag/v1.77.0

Oct 19 '23 19:10 sparrc
