No overcommit ratio config nor host resource reservation
Required information
- Distribution: Ubuntu
- Distribution version: 22.04
- The output of "lxc info" or if that fails:
- Kernel version: 5.15.0-67-generic #74-Ubuntu
- LXD version: 5.12-c63881f
- Storage backend in use: dir
Issue description
As far as I can see, LXD has no notion of a per-host overcommit ratio for CPU/memory/disk resources.
For example, it seems that one can create a big VM exceeding host resources as long as it's within the project quota.
Let's say:
Project A: limits.cpu=200 limits.memory=800GiB
Project B: limits.cpu=10 limits.memory=200GiB
If Project A has used almost all resources in the cluster, it looks like Project B can still create a VM with 200GiB on a 128GiB host, for example, and the OOM killer could then kill a VM from Project A or a host process.
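To make the gap concrete, here is a minimal sketch of the two checks involved. It is not LXD code: `within_project_quota`, `fits_on_host` and the `overcommit_ratio` parameter are hypothetical names used only to illustrate the request in this issue. The project quota check passes while a per-host check with a ratio of 1.0 fails.

```python
GIB = 1024 ** 3

def within_project_quota(requested, used, quota):
    """Per-project check: only compares against the project's own limit."""
    return used + requested <= quota

def fits_on_host(requested, host_total, host_committed=0, overcommit_ratio=1.0):
    """Hypothetical per-host check: committed memory must stay within
    host_total * overcommit_ratio."""
    return host_committed + requested <= host_total * overcommit_ratio

# Project B asks for a 200GiB VM; its quota is 200GiB, the host has 128GiB.
request = 200 * GIB
quota_ok = within_project_quota(request, used=0, quota=200 * GIB)
host_ok = fits_on_host(request, host_total=128 * GIB)

print(f"project quota check: {quota_ok}, host capacity check: {host_ok}")
```

With an assumed ratio of 2.0 the same request would be allowed on the 128GiB host, which is the kind of knob this issue is asking about.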
Steps to reproduce
As a minimal reproducer:
- create a VM with more memory than the host has (64GiB on a 32GiB system, for example):
lxc launch ubuntu:jammy test-vm-1 --vm -c limits.memory=64GiB
- check the size of recognized memory by the VM
lxc exec test-vm-1 -- free -h
-> 64GiB
- run a memory allocation test in the VM
lxc exec test-vm-1 -- stress-ng -m 1 --vm-bytes 50G --timeout 60
Then OOM will be observed on the host:
Apr 06 14:42:55 t14 kernel: Out of memory: Killed process 220836 (qemu-system-x86) total-vm:69043096kB, anon-rss:51520kB, file-rss:0kB, shmem-rss:22018520kB, UID:999 pgtables:44320kB oom_score_adj:0
Marking this as something we may consider looking into later.
If you need guarantees around CPU and memory, you want to use CPU pinning and hugepages, as anything else can indeed be quite seriously overcommitted. We're not likely to find a way to prevent or even calculate an overcommit amount. That's because, while it's doable for VMs, it is not for containers, and since you can have both, the value ends up being useless.
That said, there are some obvious things that we likely should check prior to instance startup and block starting instances that we know will immediately cause an OOM situation or the like.
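As an illustration of the kind of pre-start check described above, here is a minimal sketch that compares a VM's `limits.memory` value with the host's `MemTotal` from `/proc/meminfo`. It is an assumption-laden example, not LXD's implementation: the function names are hypothetical, only a subset of size suffixes is handled, and percentage limits and containers are ignored.

```python
import re

# Subset of size suffixes, assumed for this sketch (decimal and binary units).
_UNITS = {
    "": 1,
    "kB": 1000, "MB": 1000**2, "GB": 1000**3, "TB": 1000**4,
    "KiB": 1024, "MiB": 1024**2, "GiB": 1024**3, "TiB": 1024**4,
}

def parse_size(value):
    """Convert a string such as '64GiB' into a number of bytes."""
    match = re.fullmatch(r"(\d+)\s*([A-Za-z]*)", value.strip())
    if not match or match.group(2) not in _UNITS:
        raise ValueError(f"unsupported size: {value!r}")
    return int(match.group(1)) * _UNITS[match.group(2)]

def host_mem_total(meminfo_text):
    """Return MemTotal in bytes; /proc/meminfo reports it in units of 1024 bytes."""
    for line in meminfo_text.splitlines():
        if line.startswith("MemTotal:"):
            return int(line.split()[1]) * 1024
    raise ValueError("MemTotal not found")

def would_exceed_host(limit, meminfo_text):
    """True if the requested VM memory alone is larger than the host's RAM."""
    return parse_size(limit) > host_mem_total(meminfo_text)

# Sample data standing in for a host with roughly 32GiB of RAM.
sample_meminfo = "MemTotal:       32768000 kB\nMemFree:         1024000 kB\n"
print(would_exceed_host("64GiB", sample_meminfo))
```

On a real host, `meminfo_text` would come from reading `/proc/meminfo`. A check like this only catches the obvious case from the reproducer; it says nothing about memory already committed to other instances.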
Similar to https://github.com/canonical/lxd/issues/8682
@tomponline As another user potentially interested in this feature, I've noticed that it has been removed from the "later" milestone. Does this mean there are no plans to add this feature in future releases?
As far as I know, KVM supports CPU overcommit ratios, so if LXD uses KVM under the hood I would expect LXD, too, to support this feature.
Hi, it's still being considered, but there are no timelines at this point.
KVM itself does not support overcommit ratios. Perhaps you are referring to another Virtual Machine Manager (VMM) that uses KVM?