[feature] enable Hot-Add CPU support and document hypervisor-specific procedures

Open vmotov opened this issue 1 month ago • 5 comments

🚀 Feature Request: Hot-Add CPU Support

Currently, Talos Linux supports hot-plugging RAM (memory) on virtual machines, specifically verified on VMware. Once RAM is added to the VM, the changes are immediately visible in the Talos dashboard, and the updated memory capacity is applied to the Kubernetes node after a Kubelet restart.

However, when additional CPU cores are added to the virtual machine (e.g., via vSphere or Proxmox), the change does not take effect without a full reboot of the Talos node.

💡 Proposed Solution

We request the addition of support to enable hot-plugging of CPU cores in Talos Linux. This feature would allow the host to recognize and utilize newly added CPU resources without requiring a complete system restart.

📝 Documentation Requirement

Along with the implementation, we need clear documentation detailing the hot-add CPU procedure for various supported hypervisors, such as:

  • VMware vSphere
  • Proxmox VE
  • KVM/libvirt
  • and any others where hot-add CPU is supported.

This would significantly improve the operational flexibility of Talos clusters running on virtualized environments.

vmotov avatar Nov 21 '25 06:11 vmotov

Looks like it requires CONFIG_HOTPLUG_CPU enabled in the kernel config. CC @dsseng

shanduur avatar Nov 21 '25 12:11 shanduur

> Looks like it requires CONFIG_HOTPLUG_CPU enabled in the kernel config. CC @dsseng

we have CONFIG_HOTPLUG_CPU=y and CONFIG_ACPI_HOTPLUG_CPU=y in both AMD64 and ARM64 kernels

dsseng avatar Nov 21 '25 12:11 dsseng

This is enabled for both AMD64 and ARM64 images.

shanduur avatar Nov 21 '25 12:11 shanduur

I can try experimenting with vCPU hotplug in a QEMU VM, and memory hotplug as well. PVE and libvirt use QEMU internally, so it should behave the same from the guest (Talos) perspective.

dsseng avatar Nov 21 '25 12:11 dsseng

echo 1 > /sys/devices/system/cpu/cpu1/online from a debug pod enables a second, hot-plugged CPU in QEMU.

So perhaps this feature request would involve machined recognizing hot-plugged CPUs and auto-enabling them. I tested it, and the dashboard even updates the CPU count dynamically once I enable extra CPUs via sysfs.
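
For illustration, the manual step above could be generalized to every hot-plugged CPU with a small loop like the one below. This is only a sketch of what "auto-enabling" would have to do, not how machined does or would implement it. The function name `online_hotplugged_cpus` and its optional root-directory argument are hypothetical; the argument exists only so the loop can be tried against a scratch directory instead of the real sysfs tree.

```shell
#!/bin/sh
# Sketch: bring online every CPU whose sysfs "online" attribute reads 0.
# The optional first argument (a hypothetical convenience, not a Talos
# interface) overrides the sysfs directory; it defaults to the real one.
online_hotplugged_cpus() {
    root="${1:-/sys/devices/system/cpu}"
    for f in "$root"/cpu[0-9]*/online; do
        # Skip if the glob matched nothing; CPUs without an "online"
        # attribute (often cpu0) are never matched in the first place.
        [ -f "$f" ] || continue
        if [ "$(cat "$f")" = "0" ]; then
            # Same write as the manual "echo 1 > .../cpu1/online" step.
            echo 1 > "$f" && echo "onlined ${f%/online}"
        fi
    done
}
```

Calling `online_hotplugged_cpus` with no argument from a privileged debug pod would act on the node's real CPUs; it prints one line per CPU it brought online and stays silent when nothing was offline.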

~~Perhaps this can be easily done with a simplistic controller that detects hotplug events and once a new CPU is added it will automatically bring it up.~~ Or just udev rules maybe

Hot-unplug works on its own: once I unplug a CPU via QEMU QMP console both the Talos dashboard and htop stop showing that CPU.

Testing memory hotplug now: /sys/devices/system/memory/memory*/online enables new memory blocks, dashboard shows updated information as well. Unplugging doesn't seem to work, perhaps the memory has already gotten used.
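
The memory side can be sketched the same way: walk the memory block directories and write 1 to each `online` attribute that currently reads 0. As before, the function name `online_hotplugged_memory` and its optional root-directory argument are hypothetical and only there to make the loop easy to try outside a real node. It covers onlining only; it does nothing about the unplug problem described above.

```shell
#!/bin/sh
# Sketch: online every memory block whose sysfs "online" attribute reads 0
# and report how many blocks were changed.
# The optional first argument (hypothetical) overrides the sysfs directory.
online_hotplugged_memory() {
    root="${1:-/sys/devices/system/memory}"
    count=0
    for f in "$root"/memory[0-9]*/online; do
        # Skip if the glob matched nothing.
        [ -f "$f" ] || continue
        # Count a block only when the write actually succeeded.
        if [ "$(cat "$f")" = "0" ] && echo 1 > "$f"; then
            count=$((count + 1))
        fi
    done
    echo "onlined $count memory block(s)"
}
```

Run without arguments from a privileged debug pod, this would act on `/sys/devices/system/memory`; the summary line makes it easy to see whether a hot-added DIMM was actually picked up.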

QEMU QMP commands (-m size=2048,slots=2,maxmem=16G -smp 1,maxcpus=4 -qmp unix:/...,server,nowait options required):

object-add qom-type=memory-backend-ram id=mem1 size=3221225472
device_add id=dimm1 driver=pc-dimm memdev=mem1 slot=0

device_add id=cpu-1 driver=max-x86_64-cpu socket-id=0 core-id=1 thread-id=0

dsseng avatar Nov 21 '25 13:11 dsseng

Should be the following udev rule:

SUBSYSTEM=="cpu", ACTION=="add", TEST=="online", ATTR{online}!="1", ATTR{online}="1"

smira avatar Dec 17 '25 10:12 smira