Fix CPU compatibility problem by setting cpu_mode to host-model

Open albinsun opened this issue 1 year ago • 2 comments

Changes

  1. Fix live migration failure caused by incompatible CPU features on emulated VM CPUs
    • Set libvirt.cpu_mode to the compatibility-oriented host-model (the default value) instead of the performance-oriented host-passthrough
    • See https://libvirt.org/formatdomain.html#cpu-model-and-topology

      ... However, for backward compatibility host-model may be implemented even for domains running on emulated CPUs in which case the best CPU the hypervisor is able to emulate may be used rather then trying to mimic the host CPU model.

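To confirm which CPU mode a node VM actually ended up with, one option is to inspect its libvirt domain XML (for example, the output of `virsh dumpxml`). The following is a minimal sketch, not part of this change; the sample XML and the domain name in it are hypothetical and heavily trimmed.

```python
import xml.etree.ElementTree as ET
from typing import Optional


def cpu_mode(domain_xml: str) -> Optional[str]:
    """Return the mode attribute of the domain's <cpu> element, or None if absent."""
    root = ET.fromstring(domain_xml)
    cpu = root.find("cpu")  # <cpu> is a direct child of <domain>
    return cpu.get("mode") if cpu is not None else None


# Hypothetical, trimmed domain XML; a real dump contains many more elements.
SAMPLE_XML = """
<domain type='kvm'>
  <name>example-node</name>
  <cpu mode='host-model' check='partial'/>
</domain>
"""

print(cpu_mode(SAMPLE_XML))
```

A result of host-passthrough would indicate that the node VM still exposes the host CPU directly rather than a named, migratable model.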

Issue

Ref. [BUG] Live migration fail when upgrade v1.2.1 to v1.2.2-rc2 due to virError

Guest VM live migration fails because the CPU on the Harvester node doesn't match the guest CPU specification: the waitpkg feature flag is missing.

VirtualMachineInstance migration uid 5de2134c-25e2-404e-88b2-9307f54866c8 failed. reason:
Live migration failed error encountered during MigrateToURI3 libvirt api call: 
virError(Code=9, Domain=31, Message='operation failed: guest CPU doesn't match specification: missing features: waitpkg')
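When triaging failures like this, it can help to pull the missing feature names out of the error text. The sketch below assumes the message follows the shape shown above and that multiple features, if present, are separated by commas or spaces; that separator is an assumption, and the function name is hypothetical.

```python
import re
from typing import List


def missing_cpu_features(message: str) -> List[str]:
    """Extract feature names listed after 'missing features:' in an error message."""
    # Assumption: the feature names follow the phrase and are separated by
    # commas and/or whitespace; the capture stops at the closing quote.
    match = re.search(r"missing features:\s*([\w.,\-\s]+)", message)
    if not match:
        return []
    return [name for name in re.split(r"[,\s]+", match.group(1).strip()) if name]


ERROR = (
    "virError(Code=9, Domain=31, Message='operation failed: guest CPU doesn't "
    "match specification: missing features: waitpkg')"
)

print(missing_cpu_features(ERROR))
```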

Cause

Ref. https://github.com/harvester/harvester/issues/5755#issuecomment-2099660607

Some QEMU behavior changed between SLES SP4 and SP5. The issue happens when Harvester nodes run as VMs and the guests are nested VMs. Here are the words from the virtualization team:

but the bug is rather that you see the waitpkg flag in SP4, more than the fact that you don't see it in SP5

yes, SP5's QEMU behavior is correct, i.e., on your particular hardware, it's ok to not advertise that flag in a nested VM. It's actually SP4's QEMU that is at fault, i.e., it shouldn't advertise it in the first place, while instead it did. As I said, I can backport the fix to SP's QEMU, but this won't probably help you for that particular VM (or it would break it in even worse way, when/if the updated QEMU would reach SP4's KubeVirt)
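One way to see the difference described above is to compare the CPU flags advertised on the source and target nodes. This is a rough sketch that assumes the text comes from /proc/cpuinfo on each Harvester node; the sample strings are shortened and hypothetical, not captured from real nodes.

```python
from typing import Set


def cpu_flags(cpuinfo_text: str) -> Set[str]:
    """Return the flag set from the first 'flags' line of /proc/cpuinfo-style text."""
    for line in cpuinfo_text.splitlines():
        key, sep, value = line.partition(":")
        if sep and key.strip() == "flags":
            return set(value.split())
    return set()


# Hypothetical, shortened samples standing in for /proc/cpuinfo on two nodes.
SOURCE_NODE = "processor\t: 0\nflags\t\t: fpu vme sse2 waitpkg\n"
TARGET_NODE = "processor\t: 0\nflags\t\t: fpu vme sse2\n"

# Flags present on the migration source but absent on the target.
print(sorted(cpu_flags(SOURCE_NODE) - cpu_flags(TARGET_NODE)))
```

If waitpkg shows up in the difference, a guest started with host-passthrough on the source node cannot be matched on the target, which is consistent with the error above.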

albinsun avatar May 08 '24 03:05 albinsun

How about putting this to a setting and default it to host-passthrough? So for machines with the issue, we can edit the setting.

bk201 avatar May 10 '24 02:05 bk201

How about putting this to a setting and default it to host-passthrough? So for machines with the issue, we can edit the setting.

In this case, a FAQ is necessary to help users identify the problem and its root cause. Otherwise, we end up with a setting that nobody knows exactly what it is for or when to use it.

votdev avatar May 13 '24 06:05 votdev