No overcommit ratio config nor host resource reservation

Open nobuto-m opened this issue 2 years ago • 2 comments

Required information

  • Distribution: Ubuntu
  • Distribution version: 22.04
  • The output of "lxc info" or if that fails:
    • Kernel version: 5.15.0-67-generic #74-Ubuntu
    • LXD version: 5.12-c63881f
    • Storage backend in use: dir

Issue description

As far as I can see, LXD does not have a way to set a per-host overcommit ratio for CPU/memory/disk resources, nor a way to reserve resources for the host itself.

For example, it seems that one can create a big VM exceeding host resources as long as it's within the project quota.

Let's say:

  • Project A: limits.cpu=200 limits.memory=800GiB
  • Project B: limits.cpu=10 limits.memory=200GiB

If Project A has already used almost all resources in the cluster, it looks like Project B can still create a VM with 200GiB of memory on, for example, a 128GiB host. The OOM killer could then kill a VM from Project A or a host process.
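To make the scenario concrete, here is a minimal sketch of the arithmetic. It is not LXD code: the function names and the `max_ratio` parameter are hypothetical, and LXD has no such per-host setting, which is the point of this issue. It assumes a single 128GiB host with nothing else committed on it.

```python
GIB = 1024 ** 3

def within_project_quota(requested, used, quota):
    """Project-level check: does the request fit the project's limits.memory?"""
    return used + requested <= quota

def overcommit_ratio(committed, host_capacity):
    """Memory promised to instances divided by what the host physically has."""
    return committed / host_capacity

def fits_host(requested, committed, host_capacity, max_ratio=1.0):
    """Hypothetical per-host check; max_ratio is an assumed setting, not an LXD key."""
    return overcommit_ratio(committed + requested, host_capacity) <= max_ratio

host_capacity = 128 * GIB      # the 128GiB host from the example
project_b_quota = 200 * GIB    # Project B: limits.memory=200GiB
request = 200 * GIB            # the VM Project B wants to create

quota_ok = within_project_quota(request, 0, project_b_quota)
host_ok = fits_host(request, 0, host_capacity)
ratio = overcommit_ratio(request, host_capacity)

print(f"within project quota: {quota_ok}")
print(f"fits host at ratio 1.0: {host_ok} (requested ratio {ratio})")
```

The request passes the project quota check but would need a ratio of 1.5625 on the host, so only a host-level check would catch it.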

Steps to reproduce

As a minimal reproducer:

  1. create a VM with more memory than the host has:
    • lxc launch ubuntu:jammy test-vm-1 --vm -c limits.memory=64GiB (on a 32GiB system, for example)
  2. check the size of recognized memory by the VM
    • lxc exec test-vm-1 -- free -h -> 64GiB
  3. run a memory allocation test in the VM
    • lxc exec test-vm-1 -- stress-ng -m 1 --vm-bytes 50G --timeout 60

Then OOM will be observed on the host:

Apr 06 14:42:55 t14 kernel: Out of memory: Killed process 220836 (qemu-system-x86) total-vm:69043096kB, anon-rss:51520kB, file-rss:0kB, shmem-rss:22018520kB, UID:999 pgtables:44320kB oom_score_adj:0
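The sizes in that kernel message are easier to read in GiB. The following is a small sketch that extracts the kB fields from the log line quoted above and converts them; the helper names are made up for illustration, and the regular expression only targets the `name:NNNkB` pairs seen in this particular message.

```python
import re

# The OOM message from the host log above, without the timestamp prefix.
OOM_LINE = (
    "Out of memory: Killed process 220836 (qemu-system-x86) "
    "total-vm:69043096kB, anon-rss:51520kB, file-rss:0kB, "
    "shmem-rss:22018520kB, UID:999 pgtables:44320kB oom_score_adj:0"
)

def parse_oom_kb(line):
    """Return every 'name:<digits>kB' field of the line as {name: kB}."""
    return {name: int(value) for name, value in re.findall(r"([\w-]+):(\d+)kB", line)}

def kb_to_gib(kb):
    """The kernel reports these sizes in units of 1024 bytes."""
    return kb / (1024 ** 2)

fields = parse_oom_kb(OOM_LINE)
for name, kb in fields.items():
    print(f"{name:>10}: {kb_to_gib(kb):8.2f} GiB")
```

Read this way, the killed qemu process had about 65.8 GiB of virtual memory, in line with the 64GiB VM, and just under 21 GiB resident as shmem-rss when it was killed.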

nobuto-m, Apr 06 '23 06:04

Marking this as something we may consider looking into later.

If you need guarantees around CPU and memory, you want to use CPU pinning and hugepages, as anything else can indeed be quite seriously overcommitted. We're not likely to find a way to prevent or even calculate an overcommit amount. That's because while it's doable for VMs, it is not for containers and as you can have both, it makes the value ultimately useless.

That said, there are some obvious things that we likely should check prior to instance startup and block starting instances that we know will immediately cause an OOM situation or the like.
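One such obvious check could be comparing a VM's requested memory with the host's total memory before starting it. The sketch below illustrates the idea only and is not how LXD implements or plans to implement it: the function names are hypothetical, the unit table covers only a few binary suffixes, and the sample `/proc/meminfo` text uses made-up values for a roughly 32GiB host. It also ignores memory already committed to other instances, which is the harder part for containers.

```python
# Only a handful of binary suffixes are handled in this sketch.
UNITS = {"KiB": 1024, "MiB": 1024 ** 2, "GiB": 1024 ** 3, "TiB": 1024 ** 4}

def parse_limit(value):
    """Convert a limit string such as '64GiB' to bytes (plain integers are bytes)."""
    for suffix, factor in UNITS.items():
        if value.endswith(suffix):
            return int(value[: -len(suffix)]) * factor
    return int(value)

def host_mem_total(meminfo_text):
    """Read MemTotal from /proc/meminfo content; its 'kB' means 1024 bytes."""
    for line in meminfo_text.splitlines():
        if line.startswith("MemTotal:"):
            return int(line.split()[1]) * 1024
    raise ValueError("MemTotal not found")

def should_block_start(limit, meminfo_text):
    """True when the VM asks for more memory than the host physically has."""
    return parse_limit(limit) > host_mem_total(meminfo_text)

# Illustrative meminfo excerpt (values are invented for this example).
SAMPLE_MEMINFO = "MemTotal:       32610952 kB\nMemFree:         1843200 kB\n"

print(should_block_start("64GiB", SAMPLE_MEMINFO))
print(should_block_start("16GiB", SAMPLE_MEMINFO))
```

On a real host the second argument would be the content of `/proc/meminfo`, for example `open("/proc/meminfo").read()`.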

stgraber, Apr 11 '23 13:04

Similar to https://github.com/canonical/lxd/issues/8682

tomponline, Sep 03 '24 22:09

@tomponline As another user potentially interested in this feature, I've noticed that it has been removed from the "later" milestone. Does this mean there are no plans to add this feature in future releases?

As far as I know, KVM supports CPU overcommit ratios, so if LXD uses KVM under the hood I would expect LXD, too, to support this feature.

ar406, Sep 30 '25 11:09

Hi, it's still being considered, but there are no timelines at this point.

KVM itself does not support over-commit ratios; perhaps you are referring to another Virtual Machine Manager (VMM) that uses KVM?

tomponline, Sep 30 '25 11:09