
Memory hotplug support for VMs

Open nanjj opened this issue 1 year ago • 4 comments

Is it possible for Incus to support the QEMU memory hotplug feature defined here? The usage is similar to CPU hotplug: launch the QEMU instance with

-m [size=]megs[,slots=n,maxmem=size]

and then change the memory through the QEMU monitor (HMP syntax shown; QMP has equivalent commands), for example:

 (qemu) object_add memory-backend-ram,id=mem1,size=1G
 (qemu) device_add pc-dimm,id=dimm1,memdev=mem1
 (qemu) device_del dimm1
 (qemu) object_del mem1
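
For reference, here is a minimal sketch of how the `-m` value and the monitor commands above might look as QMP messages. It assumes a recent QEMU in which `object-add` takes flattened arguments; the helper names (`memory_arg`, `hotplug_dimm`, `unplug_dimm`) are made up for illustration and are not part of Incus or QEMU.

```python
import json

GIB = 1024 ** 3


def memory_arg(size_mib, slots, maxmem_mib):
    """Build the value passed to QEMU's -m option, reserving hotplug slots."""
    return f"size={size_mib}M,slots={slots},maxmem={maxmem_mib}M"


def hotplug_dimm(index, size_bytes):
    """QMP messages that add one DIMM: the backend object first, then the device."""
    mem_id, dimm_id = f"mem{index}", f"dimm{index}"
    return [
        {"execute": "object-add",
         "arguments": {"qom-type": "memory-backend-ram",
                       "id": mem_id, "size": size_bytes}},
        {"execute": "device_add",
         "arguments": {"driver": "pc-dimm", "id": dimm_id, "memdev": mem_id}},
    ]


def unplug_dimm(index):
    """QMP messages that remove the DIMM again, in reverse order.

    device_del is only a request to the guest; a real client would wait for
    the DEVICE_DELETED event before sending object-del.
    """
    return [
        {"execute": "device_del", "arguments": {"id": f"dimm{index}"}},
        {"execute": "object-del", "arguments": {"id": f"mem{index}"}},
    ]


print("-m", memory_arg(4096, 2, 32768))
for message in hotplug_dimm(1, 1 * GIB) + unplug_dimm(1):
    print(json.dumps(message))
```

The messages are only printed here; sending them requires a QMP socket and the usual `qmp_capabilities` handshake first.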

From the Incus source code we can see that CPU hotplug is already supported, so I am wondering why QEMU memory hotplug is not. I find that confusing.

The use case is clearly valuable.

Memory hotplug is broadly supported by most guest OSes (even Windows: almost all versions support it, as you may know, whereas CPU hotplug is supported only by the Windows Server editions), and QEMU supports it too.

For Incus, we may need to add a check for the memory hotplug feature and give an initial slots value (maybe 2) and a maxmem size (maybe 32G). Then, when the user sets limits.memory (or maybe limits.memory.size and limits.slots, I am not sure which), incusd would use the QMP client to handle the config change.

nanjj avatar Jul 12 '24 11:07 nanjj

Yeah, that's been something we've been meaning to add for a while, but it's also a very complex one to handle right as you need to decide on the right granularity, consider NUMA nodes, handle hugepages, ...

We already have 3-4 different code paths for memory as it stands today and all of those will need to handle DIMM hotplug. The other side of this will be to know how well the OS will handle this.

For CPU we can very easily hotplug/hotremove CPUs and the OS usually handles that pretty well. For memory, hotplug should be okay, but hotremove is likely to be more problematic, so we may need to use ballooning for hotremove.

For now the trick you can use is to start the VM with a higher allocation than needed and then reduce limits.memory, which will use the memory balloon driver to shrink things.
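
A minimal sketch of that workaround as a sequence of `incus` CLI calls. The instance name `vm1` and the sizes are placeholders, and the helper `balloon_workaround` is hypothetical; it only builds and prints the commands rather than running them.

```python
import shlex


def balloon_workaround(instance, boot_memory, target_memory):
    """Return the incus commands for the over-allocate-then-shrink trick."""
    return [
        # Set the larger allocation while the VM is stopped.
        ["incus", "config", "set", instance, f"limits.memory={boot_memory}"],
        ["incus", "start", instance],
        # Lowering the limit on the running VM shrinks it through the balloon.
        ["incus", "config", "set", instance, f"limits.memory={target_memory}"],
    ]


for argv in balloon_workaround("vm1", "8GiB", "2GiB"):
    print(shlex.join(argv))
```

This relies on the balloon driver being loaded in the guest.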

stgraber avatar Jul 12 '24 13:07 stgraber

As for how well the OS will handle this, I am using the VMware guest OS compatibility guide to check. For Linux memory hotplug, as an example: https://www.vmware.com/resources/compatibility/search.php?deviceCategory=software&details=1&osFamily=2&virtualHardware=23&page=1&display_interval=10&sortColumn=Partner&sortOrder=Asc&testConfig=16

nanjj avatar Jul 13 '24 01:07 nanjj

For now the trick you can use is to start the VM with a higher allocation than needed and then reduce limits.memory, which will use the memory balloon driver to shrink things.

Any chance of an option to 'pre-inflate' the balloon at boot time? That would be a quick and dirty way of getting something equivalent to memory hotplug, up to a pre-defined limit :)

srd424 avatar Jul 28 '24 17:07 srd424

That's an option I considered but it's a bit tricky as the balloon requires a kernel driver to work properly, so it would effectively still allow the guest to consume more memory by preventing that driver from getting loaded.

That's particularly relevant when you consider multi-tenant Incus deployments where users have access to individual projects with resource limits in place. If ballooning is used to allow growing the VM memory, then one of those tenants could tweak their VM to prevent ballooning and far exceed their memory allocation.

We could still do it but would need a key like limits.memory.max, which would then still be counted as used memory against a project's quota. That would greatly reduce its usefulness though, so it may be best to focus on actual memory hotplug instead, even if that's a bit tricky to get right.
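
To illustrate the quota point, here is a rough sketch of how such accounting could work. `limits.memory.max` is only the key proposed above, not an existing Incus option, and the functions below are hypothetical rather than Incus code.

```python
from typing import Optional

GIB = 1024 ** 3


def charged_memory(limits_memory: int, limits_memory_max: Optional[int] = None) -> int:
    """Memory counted against the project quota for one VM.

    The guest could disable ballooning and use everything up to the ceiling,
    so the ceiling is what gets charged, not the current limit.
    """
    return max(limits_memory, limits_memory_max or 0)


def fits_quota(project_limit: int, instances) -> bool:
    """Check whether all (limits.memory, limits.memory.max) pairs fit the quota."""
    return sum(charged_memory(cur, top) for cur, top in instances) <= project_limit


# Each VM currently uses 2 GiB but may balloon up to 8 GiB.
vms = [(2 * GIB, 8 * GIB), (2 * GIB, 8 * GIB)]
print(fits_quota(16 * GIB, vms))
print(fits_quota(16 * GIB, vms + [(2 * GIB, 8 * GIB)]))
```

Under this scheme a tenant gains no extra density from ballooning, since the ceiling is always what gets charged, which is why the key would see limited use.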

stgraber avatar Aug 01 '24 04:08 stgraber

I think this is quite an important feature, and one that most enterprise use cases rely on. I do think it should be properly implemented.

htcosta avatar Dec 07 '24 19:12 htcosta

Hi, I am a student at UT Austin in a virtualization course and we would like to work on this issue.

Aryan470 avatar Apr 10 '25 19:04 Aryan470

@presztak is already working on this one (he's assigned to the issue)

stgraber avatar Apr 11 '25 00:04 stgraber