firecracker
Look into notifying the guest when vcpus are paused
According to the documentation:
KVM_KVMCLOCK_CTRL
This signals to the host kernel that the specified guest is being paused by
userspace. The host will set a flag in the pvclock structure that is checked
from the soft lockup watchdog. [...] This ioctl can be called any time
after pausing the vcpu, but before it is resumed.
KVM uses this ioctl to notify the guest that it is being paused (code):
    case KVM_KVMCLOCK_CTRL: {
        r = kvm_set_guest_paused(vcpu);
        goto out;
    }

/*
 * kvm_set_guest_paused() indicates to the guest kernel that it has been
 * stopped by the hypervisor. This function will be called from the host only.
 * EINVAL is returned when the host attempts to set the flag for a guest that
 * does not support pv clocks.
 */
static int kvm_set_guest_paused(struct kvm_vcpu *vcpu)
{
    if (!vcpu->arch.pv_time_enabled)
        return -EINVAL;
    vcpu->arch.pvclock_set_guest_stopped_request = true;
    kvm_make_request(KVM_REQ_CLOCK_UPDATE, vcpu);
    return 0;
}
In the context of snapshotting a microVM and resuming it later, this ioctl needs to be evaluated to assess its benefits & costs and whether it should be called immediately after pausing or between deserialization and resuming.
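For reference, the userspace side of this is a single argument-less ioctl on the vCPU file descriptor. Below is a minimal sketch in C, not Firecracker code: `notify_guest_paused` is a hypothetical helper name, and it assumes the caller already holds a vCPU fd and has stopped running `KVM_RUN` on it.

```c
#include <errno.h>
#include <sys/ioctl.h>
#include <linux/kvm.h>

/*
 * Hypothetical helper (name is an assumption, not an existing API):
 * tell KVM that the vCPU behind vcpu_fd has been paused by userspace.
 * Per the documentation quoted above, call it any time after pausing
 * the vCPU but before resuming it.
 *
 * Returns 0 on success or a negative errno value. -EINVAL means the
 * guest has not enabled a pvclock, so there is no flag to set.
 */
int notify_guest_paused(int vcpu_fd)
{
    if (ioctl(vcpu_fd, KVM_KVMCLOCK_CTRL, 0) < 0)
        return -errno;
    return 0;
}
```

A caller would probably want to treat `-EINVAL` as non-fatal, since it only says the guest is not using pvclock. Whether the host supports the ioctl at all can be probed beforehand with `KVM_CHECK_EXTENSION` and `KVM_CAP_KVMCLOCK_CTRL`.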
Adding a bit more detail here:
If the hypervisor pauses the guest for a non-negligible period of time, the guest sees that blackout period as frozen, non-ticking CPUs. On resuming the VM, the guest soft-lockup watchdog might panic.
This ioctl makes KVM on the host set a special flag in the guest's paravirtualized clock (pvclock) structure. The guest sees this flag and takes it to mean that the vCPUs haven't ticked in a while on purpose, so the soft-lockup watchdog should not treat the gap as a lockup.
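To make the handshake concrete, here is a toy model in C of the set/check-and-clear pattern described above. It is a simplified sketch, not kernel code: `struct toy_pvclock` and both function names are made up for illustration. Only the `PVCLOCK_GUEST_STOPPED` bit value is taken from the kernel's pvclock ABI header.

```c
#include <stdbool.h>
#include <stdint.h>

/* Bit value as defined in the kernel's pvclock ABI header. */
#define PVCLOCK_GUEST_STOPPED (1u << 1)

/* Toy stand-in for the flags byte of the shared pvclock structure. */
struct toy_pvclock {
    uint8_t flags;
};

/* Host side: what the KVM_KVMCLOCK_CTRL request amounts to in this model. */
void host_mark_guest_stopped(struct toy_pvclock *pv)
{
    pv->flags |= PVCLOCK_GUEST_STOPPED;
}

/*
 * Guest side: check and clear the flag, as a watchdog would do before
 * reporting a stall. Returns true if the stall was intentional, meaning
 * the report should be skipped.
 */
bool guest_check_and_clear_paused(struct toy_pvclock *pv)
{
    if (!(pv->flags & PVCLOCK_GUEST_STOPPED))
        return false;
    pv->flags &= (uint8_t)~PVCLOCK_GUEST_STOPPED;
    return true;
}
```

Because the guest clears the flag when it reads it, one notification covers one pause. In this model, a later pause without a fresh notification would look like a real stall again.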
> On resuming the VM, the guest soft-lockup watchdog might panic.
Remember seeing any such panics?
This comment in our code looks related: https://github.com/firecracker-microvm/firecracker/blob/cb2ea9203da579c2a473f8538815a1e4eb4420d4/src/vmm/src/vstate/vcpu/mod.rs#L299