RFE: Support running the `boot-mirror RAID1` test on UEFI
Feature Request
This is a feature-cum-enhancement request needed to test the boot-mirror RAID1 changes on UEFI. There's already a kola test that verifies the changes on BIOS.
Desired Feature
Running this test on UEFI should pass without any problems.
Example Usage
kola run --qemu-firmware uefi coreos.boot-mirror*
Other Information
As part of this PR, we investigated supporting the boot-mirror kola test on UEFI, but we hit a roadblock while trying to find a workable way to delete the primary device. Here's the actual flow of that kola test:
- Sanity Check (provision a machine, then run some sanity checks)
- Detach Primary device (nuke the primary block device, set the secondary device as the primary, and then reboot)
- Verify Fallback (verify that the boot-mirror RAID changes persist after rebooting the machine)
On UEFI, we're hitting the following issue:
(qemu) device_del /machine/peripheral-anon/device[5]/virtio-backend
Error: Bus 'virtio-bus' does not support hotplugging
The QMP docs mention a couple of ways to delete a block device, but none of them seems to work in this case.
Let's take one example:
With `drive_del`, you hit the following error in kola:
failed to delete the first boot disk: Could not delete primary device d3: Running QMP command: The command drive_del has not been found
(That makes me wonder if kola really supports the latest version of QMP!?)
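One possible reading of this error, offered as an assumption rather than a confirmed diagnosis: `drive_del` is a human-monitor (HMP) command, so a QMP client that sends it as a bare command name would not find it. QMP does provide `human-monitor-command`, which passes an HMP command line through. The sketch below only builds that request as JSON; the helper name `wrap_hmp_command` is hypothetical, and `d3` is the drive ID taken from the kola error above. Whether this actually removes the disk on UEFI is untested here.

```python
import json


def wrap_hmp_command(command_line: str) -> str:
    """Serialize an HMP command line as a QMP human-monitor-command request."""
    request = {
        "execute": "human-monitor-command",
        # QMP passes the HMP text through the "command-line" argument.
        "arguments": {"command-line": command_line},
    }
    return json.dumps(request)


if __name__ == "__main__":
    # "d3" is the drive ID reported in the kola error message.
    print(wrap_hmp_command("drive_del d3"))
```

The resulting JSON string is what a QMP client would write to the monitor socket after the usual `qmp_capabilities` handshake.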
Having said that, if you manually run the set of commands mentioned in this comment: https://github.com/coreos/coreos-assembler/pull/1880#discussion_r566594165, you're able to delete the primary device successfully. That leaves me with two possibilities:
- There's a bug in the upstream code
- Or there's something we need to handle in the mantle codebase to address this issue
As a side note, we can still make the kola test work on UEFI with one hack: partially skipping the second step. In that step, we unset the boot-index of the primary device, delete the device, and then set the boot-index of the secondary device to 1. In the hacky version, we'd skip the deletion and only set the boot-index values correctly. This is probably not an ideal solution, but I thought I'd share it w/ y'all.
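To make the difference between the full second step and the hacky version concrete, here is a minimal sketch of the QMP requests each variant would send. It is not the mantle implementation: the function names and the two QOM paths are hypothetical, and using `-1` to mean "boot-index unset" is an assumption about how the test clears the property. `qom-set` and `device_del` are the standard QMP commands.

```python
import json

# Assumption: a bootindex of -1 is treated as "unset".
UNSET_BOOTINDEX = -1


def qom_set(path, prop, value):
    """Build a QMP qom-set request for one property on one QOM object."""
    return {
        "execute": "qom-set",
        "arguments": {"path": path, "property": prop, "value": value},
    }


def build_detach_commands(primary, secondary, delete_primary=True):
    """Return the QMP requests for the 'Detach Primary device' step.

    With delete_primary=False this is the hacky variant: the primary
    device stays attached and only the boot-index values change.
    """
    commands = [qom_set(primary, "bootindex", UNSET_BOOTINDEX)]
    if delete_primary:
        # device_del accepts a device ID or a QOM path in "id".
        commands.append({"execute": "device_del", "arguments": {"id": primary}})
    commands.append(qom_set(secondary, "bootindex", 1))
    return commands


if __name__ == "__main__":
    # Hypothetical QOM paths, for illustration only.
    primary = "/machine/peripheral/primary-disk"
    secondary = "/machine/peripheral/secondary-disk"
    for cmd in build_detach_commands(primary, secondary, delete_primary=False):
        print(json.dumps(cmd))
```

In the hacky variant the `device_del` request is simply never sent, so the primary disk remains visible to the guest after the reboot.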
I'm not sure the test would provide much value if it didn't delete the primary disk, because in that case, we wouldn't be confirming that some part of the system (e.g. the bootloader) wasn't still using data from the primary disk.