Fix CPU compatibility problem by setting cpu_mode to host-model
Changes
- Fix live migration failure caused by an incompatible emulated VM CPU
- Set libvirt.cpu_mode to the compatibility-oriented host-model (the default value) instead of the performance-oriented host-passthrough
  - See https://libvirt.org/formatdomain.html#cpu-model-and-topology
    ... However, for backward compatibility host-model may be implemented even for domains running on emulated CPUs in which case the best CPU the hypervisor is able to emulate may be used rather then trying to mimic the host CPU model.
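The mode names correspond to the mode attribute of libvirt's <cpu> element on the linked page. A minimal sketch, assuming the setting is eventually rendered into libvirt domain XML (the helper cpu_element is hypothetical and not part of Harvester or libvirt), could build and validate that element like this:

```python
import xml.etree.ElementTree as ET

# CPU modes accepted by the mode attribute of libvirt's <cpu> element,
# as listed in the libvirt domain XML format documentation.
LIBVIRT_CPU_MODES = {"custom", "host-model", "host-passthrough", "maximum"}


def cpu_element(mode: str = "host-model") -> bytes:
    """Render a minimal <cpu> element for a libvirt domain definition.

    Hypothetical helper: defaults to host-model, the compatibility-oriented
    mode chosen in this change, and rejects unknown mode names early.
    """
    if mode not in LIBVIRT_CPU_MODES:
        raise ValueError(f"unsupported libvirt CPU mode: {mode!r}")
    return ET.tostring(ET.Element("cpu", mode=mode))


if __name__ == "__main__":
    print(cpu_element().decode())                    # <cpu mode="host-model" />
    print(cpu_element("host-passthrough").decode())  # <cpu mode="host-passthrough" />
```

The trade-off behind the default: host-passthrough hands the source host CPU to the guest unchanged, which libvirt documents as risky to migrate unless source and destination are identical, while host-model gives up some of that fidelity in exchange for portability.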
Issue
Ref. [BUG] Live migration fail when upgrade v1.2.1 to v1.2.2-rc2 due to virError
Guest VM live migration fails because the CPU of the Harvester node does not match the specification: the waitpkg feature flag is missing.
VirtualMachineInstance migration uid 5de2134c-25e2-404e-88b2-9307f54866c8 failed. reason:
Live migration failed error encountered during MigrateToURI3 libvirt api call:
virError(Code=9, Domain=31, Message='operation failed: guest CPU doesn't match specification: missing features: waitpkg')
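When diagnosing reports like this one, for example in the FAQ suggested in the discussion below, the missing feature names can be pulled out of the libvirt error text. A minimal sketch, assuming the message keeps the "missing features: a,b,c" wording shown above; the function name missing_cpu_features is illustrative only:

```python
import re

# Matches the comma-separated feature list that follows
# "missing features:" in libvirt CPU compatibility errors.
_MISSING_RE = re.compile(r"missing features:\s*([\w.,-]+)")


def missing_cpu_features(message: str) -> list[str]:
    """Return the CPU feature flags named in a libvirt 'missing features' error."""
    match = _MISSING_RE.search(message)
    if match is None:
        return []
    return [flag for flag in match.group(1).split(",") if flag]


if __name__ == "__main__":
    err = ("virError(Code=9, Domain=31, Message='operation failed: guest CPU "
           "doesn't match specification: missing features: waitpkg')")
    print(missing_cpu_features(err))  # ['waitpkg']
```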
Cause
Ref. https://github.com/harvester/harvester/issues/5755#issuecomment-2099660607
The cause is a QEMU change between SLES SP4 and SP5. The issue happens when Harvester nodes run in VMs and the guests run as nested VMs. Here are the words from the virtualization team:
but the bug is rather that you see the waitpkg flag in SP4, more than the fact that you don't see it in SP5
yes, SP5's QEMU behavior is correct, i.e., on your particular hardware, it's ok to not advertise that flag in a nested VM. It's actually SP4's QEMU that is at fault, i.e., it shouldn't advertise it in the first place, while instead it did. As I said, I can backport the fix to SP's QEMU, but this won't probably help you for that particular VM (or it would break it in even worse way, when/if the updated QEMU would reach SP4's KubeVirt)
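To see whether a Harvester node that is itself a VM advertises the flag, one can read the flags line of /proc/cpuinfo on the source and destination nodes and compare them. This is a simplified sketch assuming the x86 /proc/cpuinfo layout: libvirt's real check compares the guest CPU definition against what the destination can provide, not raw host flags, and kernel and libvirt feature names do not always coincide, so treat the result only as a hint. The function names are hypothetical.

```python
from pathlib import Path


def cpu_flags(cpuinfo_text: str) -> set[str]:
    """Return the feature flags from the first 'flags' line of /proc/cpuinfo text (x86 layout)."""
    for line in cpuinfo_text.splitlines():
        key, sep, value = line.partition(":")
        if sep and key.strip() == "flags":
            return set(value.split())
    return set()


def missing_on_destination(source: set[str], destination: set[str]) -> set[str]:
    """Flags visible on the source node but absent on the destination node.

    Simplified model: with host-passthrough the guest sees the source CPU
    as-is, so any flag in this set can make a migration fail like the one above.
    """
    return source - destination


if __name__ == "__main__":
    cpuinfo = Path("/proc/cpuinfo")
    if cpuinfo.exists():
        flags = cpu_flags(cpuinfo.read_text())
        print("waitpkg advertised:", "waitpkg" in flags)
```

In the scenario described above, a node running SP4's QEMU would report waitpkg and a node running SP5's QEMU would not, so missing_on_destination would return {"waitpkg"}.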
How about putting this to a setting and default it to host-passthrough? So for machines with the issue, we can edit the setting.
In that case, an FAQ is necessary to help users identify the problem and its root cause. Otherwise, we end up with a setting that nobody fully understands or knows when to use.