[feature] enable Hot-Add CPU support and document hypervisor-specific procedures
🚀 Feature Request: Hot-Add CPU Support
Talos Linux currently supports hot-plugging RAM on virtual machines; this has been verified on VMware. Once RAM is added to the VM, the change is immediately visible in the Talos dashboard, and the updated memory capacity is applied to the Kubernetes node after a kubelet restart.
However, when additional CPU cores are added to the virtual machine (e.g., via vSphere, Proxmox, etc.), these changes do not take effect without a full host reboot.
💡 Proposed Solution
We request support for hot-plugging CPU cores in Talos Linux, so that the host recognizes and uses newly added CPUs without a full system restart.
📝 Documentation Requirement
Along with the implementation, we need clear documentation detailing the hot-add CPU procedure for various supported hypervisors, such as:
- VMware vSphere
- Proxmox VE
- KVM/libvirt
- and any others where hot-add CPU is supported.
This would significantly improve the operational flexibility of Talos clusters running on virtualized environments.
Looks like it requires CONFIG_HOTPLUG_CPU enabled in the kernel config. CC @dsseng
we have CONFIG_HOTPLUG_CPU=y and CONFIG_ACPI_HOTPLUG_CPU=y in both AMD64 and ARM64 kernels
I can try experimenting with vCPU hotplug in a QEMU VM, and with memory hotplug as well. Proxmox VE and libvirt both use QEMU internally, so the behavior should be the same from the guest (Talos) perspective.
echo 1 > /sys/devices/system/cpu/cpu1/online from a debug pod enables a second, hot-plugged CPU in QEMU.
So perhaps this feature request would involve machined recognizing hot-plugged CPUs and automatically bringing them online. I tested it, and the dashboard even updates the CPU count dynamically once I enable the extra CPUs via sysfs.
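For anyone who wants to reproduce the manual step in the meantime, here is a minimal sketch that generalizes the `echo 1 > .../online` command to every offline CPU. The function name `online_cpus` and its optional root-directory argument are made up for illustration and are not part of Talos; the sketch assumes a privileged shell with write access to the host's sysfs (e.g. a debug pod, as in the test above).

```shell
#!/bin/sh
# Hypothetical helper (not part of Talos): bring every offline CPU online.
# The optional first argument overrides the sysfs root, which is handy for
# dry runs against a copy of the directory tree.
online_cpus() {
    root="${1:-/sys/devices/system/cpu}"
    for f in "$root"/cpu[0-9]*/online; do
        # Skip if the glob matched nothing; cpu0 often has no "online" file.
        [ -e "$f" ] || continue
        if [ "$(cat "$f")" = "0" ]; then
            # Writing 1 asks the kernel to bring the CPU up.
            echo 1 > "$f" && echo "onlined ${f%/online}"
        fi
    done
}
```

Calling `online_cpus` with no arguments operates on the real sysfs tree; CPUs that are already online are left untouched.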
~~Perhaps this can be easily done with a simplistic controller that detects hotplug events and once a new CPU is added it will automatically bring it up.~~ Or just udev rules maybe
Hot-unplug works on its own: once I unplug a CPU via QEMU QMP console both the Talos dashboard and htop stop showing that CPU.
Testing memory hotplug now: writing to /sys/devices/system/memory/memory*/online enables the new memory blocks, and the dashboard shows the updated information as well. Unplugging doesn't seem to work, perhaps because the memory is already in use.
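The memory side can be sketched the same way. This is an illustrative helper only: `online_memory_blocks` and its optional root-directory argument are hypothetical names, and it assumes the same privileged sysfs access as the CPU test. It prints how many blocks it switched on.

```shell
#!/bin/sh
# Hypothetical helper (not part of Talos): online every offline memory block
# and print the number of blocks that were changed.
online_memory_blocks() {
    root="${1:-/sys/devices/system/memory}"
    count=0
    for f in "$root"/memory[0-9]*/online; do
        # Skip if the glob matched nothing.
        [ -e "$f" ] || continue
        if [ "$(cat "$f")" = "0" ]; then
            # Writing 1 asks the kernel to online the block.
            echo 1 > "$f" && count=$((count + 1))
        fi
    done
    echo "$count"
}
```

Running `online_memory_blocks` without arguments targets the real sysfs tree; blocks that are already online are skipped.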
QEMU QMP commands (QEMU must be started with the options -m size=2048,slots=2,maxmem=16G -smp 1,maxcpus=4 -qmp unix:/...,server,nowait):
object-add qom-type=memory-backend-ram id=mem1 size=3221225472
device_add id=dimm1 driver=pc-dimm memdev=mem1 slot=0
device_add id=cpu-1 driver=max-x86_64-cpu socket-id=0 core-id=1 thread-id=0
Should be the following udev rule:
SUBSYSTEM=="cpu", ACTION=="add", TEST=="online", ATTR{online}!="1", ATTR{online}="1"