tp-libvirt
vm_create_destroy_concurrently: Reliability Test
The Issue
Libvirt assigns every VM a UUID so that each VM can be uniquely identified.
When a transient VM is created, libvirt allows it to have a non-unique name/UUID, so long as any other VM with the same name/UUID is a persistent VM that is not running.
If two transient VMs with the same name/UUID are created at the same time, a race condition exists. We expect libvirt to create one VM and prevent the creation of the other. However, this is not always what happens.
If the race is hit, libvirt may crash and orphan VMs may be left behind. This test checks for both outcomes.
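As a rough illustration of what "the same name/UUID" means here, the sketch below builds a minimal domain XML that every competing creator would submit unchanged. The function name, VM name and UUID value are hypothetical, and the XML is deliberately skeletal: a real guest would also need disks, devices and architecture-specific settings.

```python
import xml.etree.ElementTree as ET


def transient_domain_xml(name, uuid, memory_mib=512):
    """Build a skeletal libvirt domain XML with a fixed name and UUID.

    Hypothetical helper for illustration only; the result is not
    guaranteed to describe a bootable guest.
    """
    domain = ET.Element("domain", type="kvm")
    ET.SubElement(domain, "name").text = name
    ET.SubElement(domain, "uuid").text = uuid
    memory = ET.SubElement(domain, "memory", unit="MiB")
    memory.text = str(memory_mib)
    ET.SubElement(domain, "vcpu").text = "1"
    os_element = ET.SubElement(domain, "os")
    ET.SubElement(os_element, "type").text = "hvm"
    return ET.tostring(domain, encoding="unicode")


# Both racing creators use exactly the same identity (values are made up).
VM_NAME = "race-vm"
VM_UUID = "11111111-2222-3333-4444-555555555555"
xml_for_creator_a = transient_domain_xml(VM_NAME, VM_UUID)
xml_for_creator_b = transient_domain_xml(VM_NAME, VM_UUID)
```

Because both strings are identical, two simultaneous `virsh create` calls on this XML compete for the same name and UUID, which is the situation described above.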
The Test: vm_create_destroy_concurrently
- Create several loops that continuously start and destroy a transient VM. Python's GIL allows only one thread to execute Python bytecode at a time. To ensure that the GIL cannot prevent the race condition from happening, the VMs are created from an external bash script
- Wait for a given number of seconds (run_time in the cfg file), then stop the loops
- Make sure that libvirtd has not crashed. This is done by checking that the libvirt daemon(s) have the same PID at the start and end of the test
- Ensure there are no orphan VMs
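The steps above can be sketched roughly as follows. This is not the test's actual implementation: all function names are hypothetical, the use of `pgrep -x` to find daemon PIDs is an assumption, and the orphan check assumes that libvirt starts QEMU with a `-name guest=<name>,...` argument.

```python
import shlex
import subprocess
import time


def build_loop_script(xml_path, vm_name):
    """Return a bash loop that keeps creating and destroying one transient VM."""
    # `virsh create` starts a transient domain; `virsh destroy` stops it.
    return (
        "while true; do "
        f"virsh create {shlex.quote(xml_path)} >/dev/null 2>&1; "
        f"virsh destroy {shlex.quote(vm_name)} >/dev/null 2>&1; "
        "done"
    )


def run_loops(script, loop_count, run_time):
    """Run the loop in several external bash processes, then stop them."""
    procs = [subprocess.Popen(["bash", "-c", script]) for _ in range(loop_count)]
    time.sleep(run_time)
    for proc in procs:
        proc.terminate()  # a virsh call already in flight may still finish
        proc.wait()


def daemon_pids(daemons):
    """Map each daemon name to its sorted PIDs (assumes pgrep is available)."""
    pids = {}
    for name in daemons:
        result = subprocess.run(
            ["pgrep", "-x", name], capture_output=True, text=True
        )
        pids[name] = sorted(int(pid) for pid in result.stdout.split())
    return pids


def crashed_daemons(before, after):
    """Return the daemons whose PIDs differ between the two snapshots."""
    return sorted(name for name in before if before[name] != after.get(name))


def orphan_qemu_pids(ps_output, vm_name):
    """Find QEMU processes for vm_name in `ps -eo pid=,args=` output.

    Intended to run after the loops are stopped and the VM is destroyed,
    so any match is a process libvirt no longer manages.
    """
    orphans = []
    for line in ps_output.splitlines():
        fields = line.split()
        if len(fields) < 2 or "qemu" not in fields[1]:
            continue
        args = fields[2:]
        for flag, value in zip(args, args[1:]):
            if flag == "-name" and value.split(",")[0] == f"guest={vm_name}":
                orphans.append(int(fields[0]))
    return orphans
```

A caller would take a `daemon_pids` snapshot, call `run_loops`, take a second snapshot, and fail the test if `crashed_daemons` or `orphan_qemu_pids` returns anything. Comparing PIDs this way only makes sense if the daemons are already running when the first snapshot is taken.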
Evidence the test works
Running the test causes virtqemud to crash, as shown by its service status:
× virtqemud.service - libvirt QEMU daemon
Loaded: loaded (/usr/lib/systemd/system/virtqemud.service; enabled; preset: enabled)
Active: failed (Result: core-dump) since Tue 2024-07-16 15:05:41 EDT; 30min ago
Duration: 1.500s
TriggeredBy: × virtqemud-admin.socket
× virtqemud.socket
× virtqemud-ro.socket
Docs: man:virtqemud(8)
https://libvirt.org/
Process: 68934 ExecStart=/usr/sbin/virtqemud $VIRTQEMUD_ARGS (code=dumped, signal=SEGV)
Main PID: 68934 (code=dumped, signal=SEGV)
CPU: 203ms
Jul 16 15:05:41 ampere-mtsnow-altra-04.khw.eng.rdu2.dc.redhat.com systemd[1]: virtqemud.service: Scheduled restart job, restart count>
Jul 16 15:05:41 ampere-mtsnow-altra-04.khw.eng.rdu2.dc.redhat.com systemd[1]: Stopped libvirt QEMU daemon.
Jul 16 15:05:41 ampere-mtsnow-altra-04.khw.eng.rdu2.dc.redhat.com systemd[1]: virtqemud.service: Start request repeated too quickly.
Jul 16 15:05:41 ampere-mtsnow-altra-04.khw.eng.rdu2.dc.redhat.com systemd[1]: virtqemud.service: Failed with result 'core-dump'.
Jul 16 15:05:41 ampere-mtsnow-altra-04.khw.eng.rdu2.dc.redhat.com systemd[1]: Failed to start libvirt QEMU daemon.