crun
Triggering OOM leaves cgroups in bad state
Experimenting with memory limits:
"linux": {
"resources": {
"memory": {
"limit": 1048576
}
},
❯ ./crun --systemd-cgroup run test
KILLED
❯ ./crun --systemd-cgroup run test
2022-12-08T12:52:33.057298Z: sd-bus call: Unit crun-test.scope was already loaded or has a fragment file.: File exists
Deleting the container doesn't work:
❯ ./crun --systemd-cgroup run test
2022-12-08T12:54:46.316591Z: sd-bus call: Unit crun-test.scope was already loaded or has a fragment file.: File exists
Full config:
{
"ociVersion": "1.0.1",
"platform": {
"os": "linux",
"arch": "amd64"
},
"root": {
"path": "/home/tim/Julia/depot/artifacts/4d66e139e0bcfdfa5ec6a8942a938e754e17860f",
"readonly": true
},
"mounts": [
{
"destination": "/proc",
"type": "proc",
"source": "proc"
},
{
"destination": "/dev",
"type": "tmpfs",
"source": "tmpfs",
"options": [
"nosuid",
"strictatime",
"mode=755",
"size=65536k"
]
},
{
"destination": "/dev/pts",
"type": "devpts",
"source": "devpts",
"options": [
"nosuid",
"noexec",
"newinstance",
"ptmxmode=0666",
"mode=0620"
]
},
{
"destination": "/dev/shm",
"type": "tmpfs",
"source": "shm",
"options": [
"nosuid",
"noexec",
"nodev",
"mode=1777",
"size=65536k"
]
},
{
"destination": "/dev/mqueue",
"type": "mqueue",
"source": "mqueue",
"options": [
"nosuid",
"noexec",
"nodev"
]
},
{
"destination": "/sys",
"type": "none",
"source": "/sys",
"options": [
"rbind",
"ro",
"nosuid",
"noexec",
"nodev"
]
},
{
"destination": "/sys/fs/cgroup",
"type": "cgroup",
"source": "cgroup",
"options": [
"nosuid",
"noexec",
"nodev",
"relatime",
"ro"
]
}
],
"process": {
"terminal": true,
"cwd": "/root",
"env": [
"PATH=/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin",
"TERM=xterm"
],
"args": [
"/bin/bash", "-l"
],
"rlimits": [
{
"type": "RLIMIT_NOFILE",
"hard": 1024,
"soft": 1024
}
],
"capabilities": {
"bounding": [
"CAP_AUDIT_WRITE",
"CAP_KILL",
"CAP_NET_BIND_SERVICE"
],
"permitted": [
"CAP_AUDIT_WRITE",
"CAP_KILL",
"CAP_NET_BIND_SERVICE"
],
"inheritable": [
"CAP_AUDIT_WRITE",
"CAP_KILL",
"CAP_NET_BIND_SERVICE"
],
"effective": [
"CAP_AUDIT_WRITE",
"CAP_KILL"
],
"ambient": [
"CAP_NET_BIND_SERVICE"
]
},
"noNewPrivileges": true
},
"user": {
"uid": 0,
"gid": 0
},
"hostname": "test",
"linux": {
"resources": {
"devices": [
{
"allow": false,
"access": "rwm"
}
],
"memory": {
"limit": 1048576
}
},
"namespaces": [
{
"type": "pid"
},
{
"type": "ipc"
},
{
"type": "uts"
},
{
"type": "mount"
},
{
"type": "user"
},
{
"type": "cgroup"
}
],
"uidMappings": [
{
"containerID": 0,
"hostID": 1000,
"size": 1
}
],
"gidMappings": [
{
"containerID": 0,
"hostID": 1000,
"size": 1
}
],
"devices": null
}
}
I think that the low memory limit causes crun itself to fail and not the container payload.
Right, that's what I thought too. Is that avoidable? Or should crun deal with the remnants of a previous run when starting a new container?
Weird, I am not able to reproduce this locally. If I specify your limit, crun works fine. If I set it lower, I get:
2022-12-08T21:26:31.701143Z: OOM: the memory limit could be too low: read from the init process
Could you please show the output of cat /proc/self/cgroup and check which processes are in the crun-test.scope cgroup?
Is there any useful information in systemctl --user status crun-test.scope?
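For the "which processes are in the cgroup" check, here is a minimal sketch, assuming a cgroup v2 hierarchy mounted at /sys/fs/cgroup. The helper name find_scope_cgroups is made up for illustration and is not part of crun. It searches for directories with the scope's name and reads their cgroup.procs files:

```python
from pathlib import Path


def find_scope_cgroups(scope_name, root="/sys/fs/cgroup"):
    """Return {cgroup_dir: [pids]} for every directory named scope_name under root.

    Hypothetical helper for debugging; assumes a cgroup v2 layout where each
    cgroup directory contains a cgroup.procs file.
    """
    found = {}
    for path in Path(root).rglob(scope_name):
        procs_file = path / "cgroup.procs"
        if not path.is_dir() or not procs_file.is_file():
            continue
        # cgroup.procs lists one PID per line; an empty file means no processes.
        found[str(path)] = [int(pid) for pid in procs_file.read_text().split()]
    return found
```

Calling find_scope_cgroups("crun-test.scope") on the affected machine would return an empty dict if no such cgroup exists, or a mapping to an empty list if the cgroup lingers without processes.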
I had to lower the memory limit for this to reproduce today:
❯ ./crun --systemd-cgroup run oom_test2
❯ ./crun --systemd-cgroup run oom_test2
2022-12-09T08:36:59.464090Z: the memory limit could be too low: sd-bus call: Unit crun-oom_test2.scope was already loaded or has a fragment file.: File exists
Interestingly, the error is slightly different now: it includes "the memory limit could be too low". The requested info:
❯ cat /proc/self/cgroup
0::/user.slice/user-1000.slice/session-327.scope
❯ systemctl --user status crun-oom_test2.scope
× crun-oom_test2.scope - libcrun container
Loaded: loaded (/run/user/1000/systemd/transient/crun-oom_test2.scope; transient)
Transient: yes
Active: failed (Result: oom-kill) since Fri 2022-12-09 09:36:57 CET; 28s ago
Duration: 16ms
CPU: 15ms
Dec 09 09:36:57 taurus systemd[964]: Started libcrun container.
Dec 09 09:36:57 taurus systemd[964]: crun-oom_test2.scope: A process of this unit has been killed by the OOM killer.
Dec 09 09:36:57 taurus systemd[964]: crun-oom_test2.scope: Failed with result 'oom-kill'.
Also interestingly, I can't find crun-oom_test2.scope anywhere in /sys/fs/cgroup. I can find a crun-test.scope (with no processes attached to it) from when I tried this yesterday, so it seems there are two different failure cases here: one where the container gets killed and the created cgroup lingers, and one where the container dies with "the memory limit could be too low" and no cgroup is created, but some systemd state still lingers.
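As a possible workaround for the lingering systemd state, here is a rough sketch that clears the failed transient unit before retrying. systemctl reset-failed is a standard systemctl verb. Whether it is enough to make the next crun run succeed is an assumption and is not confirmed in this thread. The unit name pattern crun-<id>.scope is inferred from the error messages above, and the function names are made up for illustration:

```python
import subprocess


def reset_failed_command(container_id, user=True):
    """Build the systemctl invocation that clears a failed crun scope unit."""
    # Unit name pattern inferred from the logs: "crun-<container id>.scope".
    unit = f"crun-{container_id}.scope"
    cmd = ["systemctl"]
    if user:
        # Rootless containers are managed by the per-user systemd instance.
        cmd.append("--user")
    cmd += ["reset-failed", unit]
    return cmd


def reset_failed(container_id, user=True):
    """Run the command; return True if systemctl exited with status 0."""
    result = subprocess.run(reset_failed_command(container_id, user))
    return result.returncode == 0
```

For the second reproducer this would run systemctl --user reset-failed crun-oom_test2.scope.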
If I raise the memory limit back to 1048576, I need to do something more intensive in the container, say, sh -c "find /". That does again result in an OOM kill, but not of the container process, and as such the created cgroups seem to get cleaned up fine. I guess this is the expected scenario.
With bash -c "echo 'Hello, World!'" (i.e. not using a login shell) I need to lower the memory limit further, but it does seem to reproduce consistently here with the following config:
{
"ociVersion": "1.0.1",
"platform": {
"os": "linux",
"arch": "amd64"
},
"root": {
"path": "/home/tim/Julia/depot/artifacts/4d66e139e0bcfdfa5ec6a8942a938e754e17860f",
"readonly": true
},
"mounts": [
{
"destination": "/proc",
"type": "proc",
"source": "proc"
},
{
"destination": "/dev",
"type": "tmpfs",
"source": "tmpfs",
"options": [
"nosuid",
"strictatime",
"mode=755",
"size=65536k"
]
},
{
"destination": "/dev/pts",
"type": "devpts",
"source": "devpts",
"options": [
"nosuid",
"noexec",
"newinstance",
"ptmxmode=0666",
"mode=0620"
]
},
{
"destination": "/dev/shm",
"type": "tmpfs",
"source": "shm",
"options": [
"nosuid",
"noexec",
"nodev",
"mode=1777",
"size=65536k"
]
},
{
"destination": "/dev/mqueue",
"type": "mqueue",
"source": "mqueue",
"options": [
"nosuid",
"noexec",
"nodev"
]
},
{
"destination": "/sys",
"type": "none",
"source": "/sys",
"options": [
"rbind",
"ro",
"nosuid",
"noexec",
"nodev"
]
},
{
"destination": "/sys/fs/cgroup",
"type": "cgroup",
"source": "cgroup",
"options": [
"nosuid",
"noexec",
"nodev",
"relatime",
"ro"
]
}
],
"process": {
"terminal": true,
"cwd": "/root",
"env": [
"PATH=/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin",
"TERM=xterm"
],
"args": [
"/bin/bash", "-c", "echo 'Hello, World!'"
],
"rlimits": [
{
"type": "RLIMIT_NOFILE",
"hard": 1024,
"soft": 1024
}
],
"capabilities": {
"bounding": [
"CAP_AUDIT_WRITE",
"CAP_KILL",
"CAP_NET_BIND_SERVICE"
],
"permitted": [
"CAP_AUDIT_WRITE",
"CAP_KILL",
"CAP_NET_BIND_SERVICE"
],
"inheritable": [
"CAP_AUDIT_WRITE",
"CAP_KILL",
"CAP_NET_BIND_SERVICE"
],
"effective": [
"CAP_AUDIT_WRITE",
"CAP_KILL"
],
"ambient": [
"CAP_NET_BIND_SERVICE"
]
},
"noNewPrivileges": true
},
"user": {
"uid": 0,
"gid": 0
},
"hostname": "test",
"linux": {
"resources": {
"devices": [
{
"allow": false,
"access": "rwm"
}
],
"memory": {
"limit": 248576
}
},
"namespaces": [
{
"type": "pid"
},
{
"type": "ipc"
},
{
"type": "uts"
},
{
"type": "mount"
},
{
"type": "user"
},
{
"type": "cgroup"
}
],
"uidMappings": [
{
"containerID": 0,
"hostID": 1000,
"size": 1
}
],
"gidMappings": [
{
"containerID": 0,
"hostID": 1000,
"size": 1
}
],
"devices": null
}
}
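Since the limit that triggers the failure differs between machines and payloads, it can help to generate config variants that differ only in the memory limit. This is a small sketch, assuming the config is stored as config.json in the bundle directory. The helper names are hypothetical, and the limit is in bytes, as in the configs above:

```python
import copy
import json


def with_memory_limit(config, limit_bytes):
    """Return a copy of an OCI config dict with linux.resources.memory.limit replaced."""
    updated = copy.deepcopy(config)
    resources = updated.setdefault("linux", {}).setdefault("resources", {})
    resources.setdefault("memory", {})["limit"] = limit_bytes
    return updated


def write_variant(src_path, dst_path, limit_bytes):
    """Read an OCI config from src_path and write a variant with the new limit."""
    with open(src_path) as f:
        config = json.load(f)
    with open(dst_path, "w") as f:
        json.dump(with_memory_limit(config, limit_bytes), f, indent=2)
```

For example, write_variant("config.json", "config.json", 248576) rewrites the bundle's config with the lower limit used in the second reproducer.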