Triggering OOM leaves cgroups in bad state

Open maleadt opened this issue 2 years ago • 4 comments

Experimenting with memory limits:

    "linux": {
        "resources": {
            "memory": {
                "limit": 1048576
            }
        },

❯ ./crun --systemd-cgroup run test
KILLED

❯ ./crun --systemd-cgroup run test
2022-12-08T12:52:33.057298Z: sd-bus call: Unit crun-test.scope was already loaded or has a fragment file.: File exists

Deleting the container doesn't work:

❯ ./crun --systemd-cgroup run test
2022-12-08T12:54:46.316591Z: sd-bus call: Unit crun-test.scope was already loaded or has a fragment file.: File exists

Full config:

{
    "ociVersion": "1.0.1",
    "platform": {
        "os": "linux",
        "arch": "amd64"
    },
    "root": {
        "path": "/home/tim/Julia/depot/artifacts/4d66e139e0bcfdfa5ec6a8942a938e754e17860f",
        "readonly": true
    },
    "mounts": [
        {
            "destination": "/proc",
            "type": "proc",
            "source": "proc"
        },
        {
            "destination": "/dev",
            "type": "tmpfs",
            "source": "tmpfs",
            "options": [
                "nosuid",
                "strictatime",
                "mode=755",
                "size=65536k"
            ]
        },
        {
            "destination": "/dev/pts",
            "type": "devpts",
            "source": "devpts",
            "options": [
                "nosuid",
                "noexec",
                "newinstance",
                "ptmxmode=0666",
                "mode=0620"
            ]
        },
        {
            "destination": "/dev/shm",
            "type": "tmpfs",
            "source": "shm",
            "options": [
                "nosuid",
                "noexec",
                "nodev",
                "mode=1777",
                "size=65536k"
            ]
        },
        {
            "destination": "/dev/mqueue",
            "type": "mqueue",
            "source": "mqueue",
            "options": [
                "nosuid",
                "noexec",
                "nodev"
            ]
        },
        {
            "destination": "/sys",
            "type": "none",
            "source": "/sys",
            "options": [
                "rbind",
                "ro",
                "nosuid",
                "noexec",
                "nodev"
            ]
        },
        {
            "destination": "/sys/fs/cgroup",
            "type": "cgroup",
            "source": "cgroup",
            "options": [
                "nosuid",
                "noexec",
                "nodev",
                "relatime",
                "ro"
            ]
        }
    ],
    "process": {
        "terminal": true,
        "cwd": "/root",
        "env": [
            "PATH=/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin",
            "TERM=xterm"
        ],
        "args": [
            "/bin/bash", "-l"
        ],
        "rlimits": [
            {
                "type": "RLIMIT_NOFILE",
                "hard": 1024,
                "soft": 1024
            }
        ],
        "capabilities": {
            "bounding": [
                "CAP_AUDIT_WRITE",
                "CAP_KILL",
                "CAP_NET_BIND_SERVICE"
            ],
            "permitted": [
                "CAP_AUDIT_WRITE",
                "CAP_KILL",
                "CAP_NET_BIND_SERVICE"
            ],
            "inheritable": [
                "CAP_AUDIT_WRITE",
                "CAP_KILL",
                "CAP_NET_BIND_SERVICE"
            ],
            "effective": [
                "CAP_AUDIT_WRITE",
                "CAP_KILL"
            ],
            "ambient": [
                "CAP_NET_BIND_SERVICE"
            ]
        },
        "noNewPrivileges": true
    },
    "user": {
        "uid": 0,
        "gid": 0
    },
    "hostname": "test",
    "linux": {
        "resources": {
            "devices": [
                {
                    "allow": false,
                    "access": "rwm"
                }
            ],
            "memory": {
                "limit": 1048576
            }
        },
        "namespaces": [
            {
                "type": "pid"
            },
            {
                "type": "ipc"
            },
            {
                "type": "uts"
            },
            {
                "type": "mount"
            },
            {
                "type": "user"
            },
            {
                "type": "cgroup"
            }
        ],
        "uidMappings": [
            {
                "containerID": 0,
                "hostID": 1000,
                "size": 1
            }
        ],
        "gidMappings": [
            {
                "containerID": 0,
                "hostID": 1000,
                "size": 1
            }
        ],
        "devices": null
    }
}

maleadt avatar Dec 08 '22 12:12 maleadt

I think that the low memory limit causes crun itself to fail and not the container payload.

giuseppe avatar Dec 08 '22 17:12 giuseppe

> I think that the low memory limit causes crun itself to fail and not the container payload.

Right, that's what I thought too. Is that avoidable? Or should crun deal with the remnants of a previous run when starting a new container?

maleadt avatar Dec 08 '22 19:12 maleadt

Weird, I am not able to reproduce this locally: if I specify your limit, crun works fine. If I set it lower, I get:

2022-12-08T21:26:31.701143Z: OOM: the memory limit could be too low: read from the init process

Could you please show the output of cat /proc/self/cgroup, and check which processes are in the crun-test.scope cgroup?

Is there any useful information in systemctl --user status crun-test.scope?
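
In case it helps with collecting that information, here is a minimal Python sketch. It assumes a cgroup v2 (unified) hierarchy mounted at /sys/fs/cgroup; the helper names `parse_cgroup_v2_path` and `scope_processes` are made up for illustration and are not part of crun or systemd. It extracts the v2 path from /proc/<pid>/cgroup content and lists the PIDs recorded in cgroup.procs for every directory with the given scope name.

```python
from pathlib import Path
from typing import Dict, List, Optional


def parse_cgroup_v2_path(proc_cgroup_text: str) -> Optional[str]:
    """Return the cgroup v2 path from /proc/<pid>/cgroup content, if present."""
    for line in proc_cgroup_text.splitlines():
        hierarchy_id, _, rest = line.partition(":")
        controllers, _, path = rest.partition(":")
        # The unified (v2) hierarchy uses ID 0 and an empty controller list,
        # e.g. "0::/user.slice/user-1000.slice/session-327.scope".
        if hierarchy_id == "0" and controllers == "":
            return path
    return None


def scope_processes(cgroup_root: Path, scope_name: str) -> Dict[str, List[int]]:
    """Map each directory named `scope_name` under `cgroup_root` to its PIDs."""
    found: Dict[str, List[int]] = {}
    for scope_dir in cgroup_root.rglob(scope_name):
        procs_file = scope_dir / "cgroup.procs"
        if scope_dir.is_dir() and procs_file.is_file():
            # cgroup.procs holds one PID per line; it is empty for a lingering
            # cgroup with no processes attached.
            found[str(scope_dir)] = [int(p) for p in procs_file.read_text().split()]
    return found
```

Calling `scope_processes(Path("/sys/fs/cgroup"), "crun-test.scope")` would then return an empty dict if no such cgroup exists, or a path with an empty PID list if the cgroup is still there without processes.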

giuseppe avatar Dec 08 '22 21:12 giuseppe

I had to lower the memory limit for this to reproduce today:

❯ ./crun --systemd-cgroup run oom_test2

❯ ./crun --systemd-cgroup run oom_test2
2022-12-09T08:36:59.464090Z: the memory limit could be too low: sd-bus call: Unit crun-oom_test2.scope was already loaded or has a fragment file.: File exists

Interestingly, the error is slightly different now: it includes "the memory limit could be too low". The requested info:

❯ cat /proc/self/cgroup
0::/user.slice/user-1000.slice/session-327.scope

❯ systemctl --user status crun-oom_test2.scope
× crun-oom_test2.scope - libcrun container
     Loaded: loaded (/run/user/1000/systemd/transient/crun-oom_test2.scope; transient)
  Transient: yes
     Active: failed (Result: oom-kill) since Fri 2022-12-09 09:36:57 CET; 28s ago
   Duration: 16ms
        CPU: 15ms

Dec 09 09:36:57 taurus systemd[964]: Started libcrun container.
Dec 09 09:36:57 taurus systemd[964]: crun-oom_test2.scope: A process of this unit has been killed by the OOM killer.
Dec 09 09:36:57 taurus systemd[964]: crun-oom_test2.scope: Failed with result 'oom-kill'.

Also interestingly, I can't find crun-oom_test2.scope anywhere in /sys/fs/cgroup. I can find a crun-test.scope (with no processes attached to it) from when I tried this yesterday, so it seems there are two different error cases here: one where the container gets killed and a created cgroup lingers, and one where the container dies with "the memory limit could be too low", no cgroup is created, but some systemd state still lingers.
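
To tell the second case (lingering systemd state) apart programmatically, something like the following Python sketch might work. It relies on `systemctl --user show` printing `Key=Value` lines for the requested properties; the function names are hypothetical, and treating `ActiveState=failed` as "stale" is my assumption, not something crun does. Running `systemctl --user reset-failed <unit>` afterwards might clear that state, but whether that is enough for crun to reuse the name has not been verified in this thread.

```python
import subprocess
from typing import Dict


def parse_show_output(text: str) -> Dict[str, str]:
    """Parse the Key=Value lines printed by `systemctl show`."""
    props: Dict[str, str] = {}
    for line in text.splitlines():
        key, sep, value = line.partition("=")
        if sep:  # skip lines without '='
            props[key] = value
    return props


def is_stale_failed_unit(props: Dict[str, str]) -> bool:
    """Assumption: a unit still reported as 'failed' is leftover state."""
    return props.get("ActiveState") == "failed"


def unit_properties(unit: str) -> Dict[str, str]:
    """Query a user unit, e.g. 'crun-oom_test2.scope' (needs a user systemd)."""
    out = subprocess.run(
        ["systemctl", "--user", "show", unit,
         "--property=LoadState,ActiveState,Result"],
        check=True, capture_output=True, text=True,
    ).stdout
    return parse_show_output(out)
```

For the unit shown above, I would expect `Result=oom-kill` next to `ActiveState=failed`, matching the status output.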


If I raise the memory limit back to 1048576, I need to do something more intensive in the container, say, sh -c "find /". That does again result in an OOM kill, but not of the container process, and as such the created cgroups seem to get cleaned up fine. I guess this is the expected scenario.


With bash -c "echo 'Hello, World!'" (i.e. not using a log-in prompt) I need to lower the memory limit further, but it does seem to reproduce consistently here with the following config:

{
    "ociVersion": "1.0.1",
    "platform": {
        "os": "linux",
        "arch": "amd64"
    },
    "root": {
        "path": "/home/tim/Julia/depot/artifacts/4d66e139e0bcfdfa5ec6a8942a938e754e17860f",
        "readonly": true
    },
    "mounts": [
        {
            "destination": "/proc",
            "type": "proc",
            "source": "proc"
        },
        {
            "destination": "/dev",
            "type": "tmpfs",
            "source": "tmpfs",
            "options": [
                "nosuid",
                "strictatime",
                "mode=755",
                "size=65536k"
            ]
        },
        {
            "destination": "/dev/pts",
            "type": "devpts",
            "source": "devpts",
            "options": [
                "nosuid",
                "noexec",
                "newinstance",
                "ptmxmode=0666",
                "mode=0620"
            ]
        },
        {
            "destination": "/dev/shm",
            "type": "tmpfs",
            "source": "shm",
            "options": [
                "nosuid",
                "noexec",
                "nodev",
                "mode=1777",
                "size=65536k"
            ]
        },
        {
            "destination": "/dev/mqueue",
            "type": "mqueue",
            "source": "mqueue",
            "options": [
                "nosuid",
                "noexec",
                "nodev"
            ]
        },
        {
            "destination": "/sys",
            "type": "none",
            "source": "/sys",
            "options": [
                "rbind",
                "ro",
                "nosuid",
                "noexec",
                "nodev"
            ]
        },
        {
            "destination": "/sys/fs/cgroup",
            "type": "cgroup",
            "source": "cgroup",
            "options": [
                "nosuid",
                "noexec",
                "nodev",
                "relatime",
                "ro"
            ]
        }
    ],
    "process": {
        "terminal": true,
        "cwd": "/root",
        "env": [
            "PATH=/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin",
            "TERM=xterm"
        ],
        "args": [
            "/bin/bash", "-c", "echo 'Hello, World!'"
        ],
        "rlimits": [
            {
                "type": "RLIMIT_NOFILE",
                "hard": 1024,
                "soft": 1024
            }
        ],
        "capabilities": {
            "bounding": [
                "CAP_AUDIT_WRITE",
                "CAP_KILL",
                "CAP_NET_BIND_SERVICE"
            ],
            "permitted": [
                "CAP_AUDIT_WRITE",
                "CAP_KILL",
                "CAP_NET_BIND_SERVICE"
            ],
            "inheritable": [
                "CAP_AUDIT_WRITE",
                "CAP_KILL",
                "CAP_NET_BIND_SERVICE"
            ],
            "effective": [
                "CAP_AUDIT_WRITE",
                "CAP_KILL"
            ],
            "ambient": [
                "CAP_NET_BIND_SERVICE"
            ]
        },
        "noNewPrivileges": true
    },
    "user": {
        "uid": 0,
        "gid": 0
    },
    "hostname": "test",
    "linux": {
        "resources": {
            "devices": [
                {
                    "allow": false,
                    "access": "rwm"
                }
            ],
            "memory": {
                "limit": 248576
            }
        },
        "namespaces": [
            {
                "type": "pid"
            },
            {
                "type": "ipc"
            },
            {
                "type": "uts"
            },
            {
                "type": "mount"
            },
            {
                "type": "user"
            },
            {
                "type": "cgroup"
            }
        ],
        "uidMappings": [
            {
                "containerID": 0,
                "hostID": 1000,
                "size": 1
            }
        ],
        "gidMappings": [
            {
                "containerID": 0,
                "hostID": 1000,
                "size": 1
            }
        ],
        "devices": null
    }
}
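
Since the limit at which this reproduces seems to vary, a small helper for generating config variants may be handy when bisecting. This is only a sketch: `with_memory_limit` is a made-up name, and it relies on `linux.resources.memory.limit` being expressed in bytes, as in the OCI runtime spec (so 1048576 is 1 MiB and 248576 is roughly 243 KiB).

```python
import copy
import json


def with_memory_limit(config: dict, limit_bytes: int) -> dict:
    """Return a copy of an OCI config with linux.resources.memory.limit set."""
    if limit_bytes <= 0:
        raise ValueError("limit_bytes must be positive")
    patched = copy.deepcopy(config)  # leave the caller's config untouched
    linux = patched.setdefault("linux", {})
    resources = linux.setdefault("resources", {})
    resources.setdefault("memory", {})["limit"] = limit_bytes
    return patched


def write_config(config: dict, path: str) -> None:
    """Write the config as indented JSON, e.g. to a bundle's config.json."""
    with open(path, "w") as fh:
        json.dump(config, fh, indent=4)
```

Loading the config above with `json.load`, calling `with_memory_limit(cfg, n)` for a few values of `n`, and writing each result to its own bundle would give one container per candidate limit.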

maleadt avatar Dec 09 '22 09:12 maleadt