`Error setting CPU affinity for the instance` when creating VMs with fewer CPUs than `$(nproc)`
Running `lxc launch ubuntu-daily:22.04 --vm u1` causes LXD to log the following (as seen with `journalctl -u snap.lxd.daemon.service`):
lxd.daemon[12347]: time="2024-04-09T14:17:27-04:00" level=error msg="Error setting CPU affinity for the instance" err="QEMU has less vCPUs ([56969]) than configured ([0 3 4 6 9 10 11 1 2 5 7 8])" instance=u1 project=default
First, this message could be formatted better: `[56969]` seems to be the ID of one of the QEMU threads, yet it is oddly placed in the error string. Also, `[0 3 4 6 9 10 11 1 2 5 7 8]` seems to be the list of all CPUs available on the host (a 12-core CPU in my case). Those look like candidates for CPU hotplugging, but the VM has an implicit `limits.cpu=1`, so calling them "configured" is misleading.
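To make the mismatch easier to read, here is a minimal POSIX shell sketch that counts the entries in each bracketed list of such a log line. The function name `summarize_affinity_error` is hypothetical, and the `sed` patterns assume the exact message wording shown in this issue (both the "less vCPUs" and the later "different count of vCPUs" variants contain `vCPUs ([...])`).

```shell
# Hypothetical helper (not part of LXD): count the entries in the two
# bracketed lists of an "Error setting CPU affinity" log line.
# The first list holds QEMU thread IDs, the second holds host CPU numbers.
summarize_affinity_error() {
  # Extract "56969" from "vCPUs ([56969])" and the CPU list from "configured ([...])".
  _qemu=$(printf '%s\n' "$1" | sed -n 's/.*vCPUs (\[\([0-9 ]*\)]).*/\1/p')
  _cfg=$(printf '%s\n' "$1" | sed -n 's/.*configured (\[\([0-9 ]*\)]).*/\1/p')
  # Let the shell split each list on spaces and count the words.
  _n_qemu=$(set -- $_qemu; echo $#)
  _n_cfg=$(set -- $_cfg; echo $#)
  echo "qemu_threads=$_n_qemu configured_cpus=$_n_cfg"
}
```

Fed the log line above, it prints `qemu_threads=1 configured_cpus=12`: one QEMU thread against every host CPU, rather than the single vCPU implied by the default `limits.cpu=1`.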
Here is the `taskset` output after receiving this error:
$ taskset --cpu-list -a -p 56969
pid 56964's current affinity list: 0-11
pid 56965's current affinity list: 0-11
pid 56968's current affinity list: 0-11
pid 56969's current affinity list: 0-11
pid 56970's current affinity list: 0-11
pid 56972's current affinity list: 0-11
pid 56973's current affinity list: 0-11
pid 56975's current affinity list: 0-11
pid 56976's current affinity list: 0-11
pid 57069's current affinity list: 0-11
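To tell which of these threads are vCPUs, a sketch like the following lists each thread of a process with its name and allowed CPUs, reading `/proc/<pid>/task/*/comm` and the `Cpus_allowed_list` field of `/proc/<pid>/task/*/status` (the same affinity data `taskset` reports). The function name is hypothetical, and it assumes QEMU names its vCPU threads along the lines of `CPU 0/KVM`, which depends on QEMU's thread naming being enabled.

```shell
# Hypothetical helper: print "TID<TAB>thread name<TAB>allowed CPUs" for every
# thread of the given process, straight from /proc (Linux only).
list_thread_affinity() {
  [ -d "/proc/$1/task" ] || return 1
  for _task in /proc/"$1"/task/*; do
    _tid=${_task##*/}
    # comm holds the thread name (QEMU vCPU threads may show as "CPU 0/KVM").
    _name=$(cat "$_task/comm" 2>/dev/null)
    # Cpus_allowed_list is the same affinity list `taskset -c -p` reports.
    _cpus=$(awk '/^Cpus_allowed_list:/ {print $2}' "$_task/status" 2>/dev/null)
    printf '%s\t%s\t%s\n' "$_tid" "$_name" "$_cpus"
  done
}
```

Pass it the QEMU process ID, e.g. `list_thread_affinity <qemu-pid>`; a vCPU thread left at `0-11` would then stand out by name.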
Hotplugging an additional core (with `lxc config set u1 limits.cpu=2`) doesn't produce any new error message, and the `taskset` output changes to:
$ taskset --cpu-list -a -p 56969
pid 56964's current affinity list: 0-11
pid 56965's current affinity list: 0-11
pid 56968's current affinity list: 0-11
pid 56969's current affinity list: 1
pid 56972's current affinity list: 0-11
pid 56973's current affinity list: 0-11
pid 56975's current affinity list: 0-11
pid 56976's current affinity list: 0-11
pid 57227's current affinity list: 3
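To pick the pinned threads out of such a listing, this sketch reads `taskset --cpu-list -a -p` output on stdin and prints only the threads whose affinity list is a single CPU. The function name is hypothetical; treating a single-CPU thread as a pinned vCPU thread is an assumption.

```shell
# Hypothetical helper: read `taskset --cpu-list -a -p PID` output on stdin and
# print "TID CPU" for each thread whose affinity list is a single CPU.
pinned_threads() {
  awk '/current affinity list:/ {
    tid = $2
    sub(/[^0-9].*$/, "", tid)      # keep only the leading digits of field 2
    cpus = $NF                     # last field: "1", "0-11", "0,2", ...
    if (cpus ~ /^[0-9]+$/) print tid, cpus
  }'
}
```

Usage: `taskset --cpu-list -a -p 56969 | pinned_threads`. On the output above it would print `56969 1` and `57227 3`: two threads pinned to one CPU each, presumably the two vCPU threads after the hotplug.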
Note also that if an instance is created with `limits.cpu=$(nproc)`, no error is logged and the `taskset` output looks like this:
$ lxc launch ubuntu-daily:22.04 --vm u1 -c limits.cpu="$(nproc)"
$ taskset --cpu-list -a -p 59831
pid 59831's current affinity list: 0-11
pid 59832's current affinity list: 0-11
pid 59835's current affinity list: 0-11
pid 59836's current affinity list: 10
pid 59837's current affinity list: 0-11
pid 59839's current affinity list: 0-11
pid 59840's current affinity list: 0-11
pid 59842's current affinity list: 11
pid 59843's current affinity list: 0
pid 59844's current affinity list: 1
pid 59845's current affinity list: 2
pid 59846's current affinity list: 3
pid 59847's current affinity list: 5
pid 59848's current affinity list: 7
pid 59849's current affinity list: 4
pid 59850's current affinity list: 6
pid 59851's current affinity list: 8
pid 59852's current affinity list: 9
pid 59853's current affinity list: 0-11
pid 59854's current affinity list: 0-11
pid 59855's current affinity list: 0-11
pid 59856's current affinity list: 0-11
pid 59857's current affinity list: 0-11
pid 59858's current affinity list: 0-11
pid 59859's current affinity list: 0-11
pid 59860's current affinity list: 0-11
pid 59861's current affinity list: 0-11
pid 59862's current affinity list: 0-11
pid 59863's current affinity list: 0-11
pid 59864's current affinity list: 0-11
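To confirm that each pinned thread landed on a different host CPU, this sketch counts the single-CPU threads in `taskset --cpu-list -a -p` output and flags any CPU used twice. The function name is hypothetical.

```shell
# Hypothetical helper: read `taskset --cpu-list -a -p PID` output on stdin,
# count threads pinned to a single CPU and flag any CPU used by two of them.
# Exits non-zero if a duplicate is found.
check_unique_pinning() {
  awk '/current affinity list:/ && $NF ~ /^[0-9]+$/ {
    n++
    if (seen[$NF]++) { dup = 1; print "CPU " $NF " is pinned more than once" }
  }
  END {
    printf "%d pinned threads\n", n
    exit (dup ? 1 : 0)
  }'
}
```

On the output above it should report `12 pinned threads` with no duplicates (CPUs 0 to 11 each used once), i.e. a 1:1 mapping of pinned threads to host CPUs.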
After running a few more tests, I got this error at one point:
lxd.daemon[12347]: time="2024-04-09T14:52:05-04:00" level=error msg="Error setting CPU affinity for the instance" err="QEMU has less vCPUs ([59836 59842 59843 59844 59845 59846 59847 59848 59849 59850 59851 59852]) than configured ([0 1])" instance=m1 project=default
I think this happened when the VM was launched with `limits.cpu=2`. If so, setting the CPU affinity appears to race with boot-time CPU hotplugging, or something similar.
I'm still seeing these errors with:
$ snap list lxd
Name Version Rev Tracking Publisher Notes
lxd git-696b610 28614 latest/edge canonical✓ -
May 08 12:44:04 sdeziel-lemur lxd.daemon[2559831]: => Starting LXD
May 08 12:44:04 sdeziel-lemur lxd.daemon[2559979]: time="2024-05-08T12:44:04-04:00" level=warning msg=" - Couldn't find the CGroup network priority controller, per-instance network priority will be ignored. Please use per-device limits.priority instead"
May 08 12:44:07 sdeziel-lemur lxd.daemon[2559831]: => LXD is ready
May 08 12:44:13 sdeziel-lemur lxd.daemon[2559979]: time="2024-05-08T12:44:13-04:00" level=error msg="Error setting CPU affinity for the instance" err="QEMU has different count of vCPUs ([2561021]) than configured ([2 4 8 10 6 7 9 11 0 1 3 5])" instance=v1 project=default
May 08 12:44:59 sdeziel-lemur lxd.daemon[2559979]: time="2024-05-08T12:44:59-04:00" level=error msg="Error setting CPU affinity for the instance" err="QEMU has different count of vCPUs ([2561649]) than configured ([0 1 3 8 10 2 4 5 6 7 9 11])" instance=v1 project=default
May 08 12:45:04 sdeziel-lemur lxd.daemon[2559979]: time="2024-05-08T12:45:04-04:00" level=error msg="Error setting CPU affinity for the instance" err="QEMU has different count of vCPUs ([2561649]) than configured ([3 5 6 8 10 11 0 1 2 4 7 9])" instance=v1 project=default
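To gauge how often this recurs for a given VM, this sketch counts the affinity error entries per instance from daemon log output. The function name is hypothetical, and the `instance=` parsing assumes the `key=value` log layout shown in the lines above.

```shell
# Hypothetical helper: read LXD daemon log lines on stdin and print
# "<instance> <count>" of "Error setting CPU affinity" entries, sorted by name.
affinity_errors_per_instance() {
  grep 'Error setting CPU affinity' |
    sed -n 's/.* instance=\([^ ]*\).*/\1/p' |
    sort | uniq -c |
    awk '{print $2, $1}'
}
```

Usage: `journalctl -u snap.lxd.daemon.service --since today | affinity_errors_per_instance`. On the three error lines above it would print `v1 3`.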
@mihalicyn please can you investigate. Ta