proposal: Add idiomatic control groups support
We need an idiomatic way to set limits and isolate resource usage (cgroups) on Linux, but I don't want to add new keywords to the language specification for that. What I'm proposing is to encode the cgroups config into the rfork string parameter.
Old rfork EBNF:
rforkDecl = "rfork" rforkFlags "{" program "}" .
rforkFlags = { identifier } .
New rfork EBNF:
rforkDecl = "rfork" rforkParam "{" program "}" .
rforkParam = string_lit .
As nash is a multiplatform shell, the meaning of rforkParam is OS dependent, so changing the EBNF to make it a string keeps the specification abstract enough for different OS implementations of rfork.
Below are some examples using an extended string syntax for the rfork parameter:
Given that a cgroup group called chrome already exists, we will limit the cpu usage of google-chrome with:
λ> rfork "|cpu=chrome" {
google-chrome
}
To control cpu and memory:
λ> rfork "|cpu,memory=chrome" {
google-chrome
}
To set multiple control groups:
λ> rfork "|cpu=chrome,memory=default" {
google-chrome
}
To execute a process inside user, pid and mount namespaces, in addition to control group constraints:
rfork "upm|cpu=apps,memory=default" {
/application -p 8080 -d >[1] "tcp://10.0.6.120:514"
}
The Linux implementation of rfork could then interpret the rforkParam string as below:
rforkParam = nsFlags "|" [ cgConfig ] .
nsFlags = { "u" | "p" | "m" | "n" | "i" | "s" } .
cgConfig = resname "=" cgname [ "," cgConfig ] .
resname = identifier [ "," resname ] .
cgname = identifier .
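To make the grammar above concrete, here is a minimal sketch of a parser for it, written in Python only for illustration (nash itself is not implemented this way). The function name `parse_rfork_param` and the identifier pattern are assumptions of this sketch; the proposal does not define them.

```python
import re

NS_FLAGS = set("upmnis")
# Assumed identifier syntax; the proposal does not spell it out.
IDENT = re.compile(r"[A-Za-z_][A-Za-z0-9_]*")


def parse_rfork_param(param):
    """Split an rforkParam string into (namespace flags, {resource: cgroup})."""
    flags, sep, cg = param.partition("|")
    if not sep:
        raise ValueError("missing '|' separator")
    bad = set(flags) - NS_FLAGS
    if bad:
        raise ValueError(f"unknown namespace flags: {sorted(bad)}")
    groups = {}
    if not cg:
        return flags, groups
    pending = []  # resource names still waiting for their "=cgname"
    for item in cg.split(","):
        name, eq, cgname = item.partition("=")
        if not IDENT.fullmatch(name) or (eq and not IDENT.fullmatch(cgname)):
            raise ValueError(f"invalid item: {item!r}")
        pending.append(name)
        if eq:
            # "cpu,memory=chrome" assigns the same group to both resources.
            for res in pending:
                groups[res] = cgname
            pending = []
    if pending:
        raise ValueError(f"resources without a cgroup: {pending}")
    return flags, groups
```

For example, `parse_rfork_param("upm|cpu=apps,memory=default")` yields the flags `"upm"` and a mapping from `cpu` to `apps` and from `memory` to `default`.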
I'm still not sure if it is a good idea. What do you guys think? Does it make sense? /cc @katcipis @lborguetti @vitorarins
@tiago4orion, OK, we can put a process into a cgroup, but how can we set limits, like 500M for the google-chrome process?
Yeah, I've talked with @katcipis and he said the same thing. My idea behind only putting processes into control groups at startup time was to avoid permission denied errors, because only root is able to create control group rules. But if permissions are set correctly, an unprivileged user can put a process into a group.
Below is an implementation detail that will impact the usage of cgroups no matter how we design the API.
On most distros, /sys/fs/cgroup is mounted by systemd at boot and the filesystem is owned by root, so no other user can create rules. This is insane, but only one instance of the cgroup filesystem can be mounted. A user can't change the permission flags of the /sys/fs/cgroup/cpu directory, but if root creates a directory called chrome inside /sys/fs/cgroup/cpu, root can then change the permissions of the chrome directory, and unprivileged users can put processes into the chrome control group. It is a very weird filesystem.
λ> sudo mkdir /sys/fs/cgroup/cpu/chrome
λ> ls /sys/fs/cgroup/cpu/chrome
cgroup.clone_children cpuacct.usage_percpu_sys cpu.cfs_quota_us
cgroup.procs cpuacct.usage_percpu_user cpu.shares
cpuacct.stat cpuacct.usage_sys cpu.stat
cpuacct.usage cpuacct.usage_user notify_on_release
cpuacct.usage_percpu cpu.cfs_period_us tasks
λ> sudo chown -R i4k.i4k /sys/fs/cgroup/cpu/chrome
Now the control group chrome can be fully controlled by user i4k:
# set the CPU weight of the chrome group to 256, i.e. 25% of the default 1024 (a relative share under contention, not a hard cap)
λ> echo "256" > /sys/fs/cgroup/cpu/chrome/cpu.shares
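Once the directory is owned by the user, placing a process into the group is just a write to the `cgroup.procs` file listed above. Here is a minimal sketch in Python, for illustration only; the helper name `add_to_cgroup` and its `root` parameter are assumptions of this sketch, not part of nash.

```python
import os


def add_to_cgroup(resource, group, pid, root="/sys/fs/cgroup"):
    """Move process `pid` into `group` of the given cgroup controller."""
    procs = os.path.join(root, resource, group, "cgroup.procs")
    # Writing a PID to cgroup.procs moves that process into the group.
    # This raises PermissionError unless the caller may write the file,
    # which is why the chown above matters for unprivileged users.
    with open(procs, "w") as f:
        f.write(f"{pid}\n")
    return procs
```

With the chrome group set up as above, `add_to_cgroup("cpu", "chrome", os.getpid())` would move the calling process into it.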
But you guys are right. I'll update the issue to support specific limit settings too, but the user must be aware that, to use this, the script must be executed as root, or have the suid flag set, or the user must be in a group with root-level privileges (such as the docker group).
What do you guys think of the usage below:
rfork "c|cpu=50%,mem=30%" {
firefox
}
rfork "c|chrome,mem=50%" {
# Put the browser inside the chrome group
# in addition to set the memory limit
google-chrome
}
This way nash will create (if it has the permissions) a random group with these settings and put the process inside.
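As a rough sketch of how the part after the `|` in these two examples might be read: a bare name is taken as an existing group to join, and `name=N%` as a limit. This is Python for illustration only; the function name `parse_limits` and the rule that percentages must be integers between 1 and 100 are assumptions, since the syntax is still under discussion here.

```python
def parse_limits(config):
    """Parse the text after '|' into (group name or None, {resource: percent})."""
    group, limits = None, {}
    for item in config.split(","):
        name, eq, value = item.partition("=")
        if not eq:
            group = name  # bare identifier: an existing group to join
            continue
        if not value.endswith("%"):
            raise ValueError(f"expected a percentage in {item!r}")
        percent = int(value[:-1])  # raises ValueError on non-numeric input
        if not 0 < percent <= 100:
            raise ValueError(f"percentage out of range in {item!r}")
        limits[name] = percent
    return group, limits
```

Under these assumptions, `"chrome,mem=50%"` gives the group `chrome` plus a 50% limit on `mem`, while `"cpu=50%,mem=30%"` gives no named group, which is the case where a random group would be created.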
@tiago4orion sounds great, I'm just worried about the semantics of the CPU usage: is 50% half of the overall CPU capacity, or half the capacity of one core?
This setting will limit the overall CPU capacity. For example, on a multi-core platform, a control group limited to only 25% can still get 100% of the CPU of a single core (in the case of a 4-core processor).
Linux cgroups support other advanced CPU tunings, such as cpu.cfs_period_us and cpu.cfs_quota_us, but I don't think it is a good idea to cover them right now.
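To illustrate the "overall capacity" arithmetic, here is a small sketch of how a percentage of all cores would translate into a CFS quota, should those knobs ever be used. This is not part of the proposal; the function name `cfs_quota_us` is hypothetical, and the 100000 µs period is the kernel's default value for cpu.cfs_period_us.

```python
def cfs_quota_us(percent, ncpus, period_us=100_000):
    """CFS quota (microseconds per period) for `percent` of all CPUs combined."""
    # A quota equal to the period means one full core's worth of CPU time.
    return period_us * ncpus * percent // 100
```

With 25% on a 4-core machine, the quota equals the period (100000 µs), which is one full core's worth of time, matching the example above.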
@tiago4orion nice! I think this is a good way to set limits with rfork.