vm_create_destroy_concurrently: Reliability Test

Open Slancaster1 opened this issue 7 months ago • 0 comments

The Issue

Libvirt assigns a UUID to every VM in order to uniquely identify it.

When a transient VM is created, libvirt allows it to have a non-unique name/UUID, so long as any other VM with the same name/UUID is a persistent VM that is not running.

If two transient VMs are created at the same time with the same name/UUID, a race condition exists. We expect libvirt to create one VM and prevent the creation of the other. However, this is not always what happens.

If such a race condition occurs, libvirt may crash, and orphan VMs may be created. This test checks for both outcomes.
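To make the collision concrete, here is a minimal sketch of how a domain XML with a fixed name and UUID could be generated, so that every concurrent creator competes for the same identity. The function name, the guest name `race-vm`, the UUID value, and the memory and vCPU sizes are all hypothetical and chosen only for illustration; the XML is not a complete bootable definition and is not taken from the actual test.

```python
import uuid
import xml.etree.ElementTree as ET


def transient_domain_xml(name, domain_uuid):
    """Return a minimal domain XML string with a fixed name and UUID.

    Illustrative only: a real guest also needs architecture-specific
    <os> settings, disks, and so on.
    """
    uuid.UUID(domain_uuid)  # raises ValueError if the UUID is malformed
    domain = ET.Element("domain", type="kvm")
    ET.SubElement(domain, "name").text = name
    ET.SubElement(domain, "uuid").text = domain_uuid
    ET.SubElement(domain, "memory", unit="MiB").text = "512"  # hypothetical size
    ET.SubElement(domain, "vcpu").text = "1"
    os_elem = ET.SubElement(domain, "os")
    ET.SubElement(os_elem, "type").text = "hvm"
    return ET.tostring(domain, encoding="unicode")


# Every concurrent creator is handed the same XML, so they all
# compete for the same name/UUID pair.
SHARED_XML = transient_domain_xml(
    "race-vm", "11111111-2222-3333-4444-555555555555"
)
```

Writing `SHARED_XML` to a file and passing that file to `virsh create` from several processes at once is one way to provoke the situation described above.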

The Test: vm_create_destroy_concurrently

  1. Create several loops that continuously start and destroy a transient VM. Python's GIL allows only one thread to execute Python bytecode at a time, which could serialize the create calls and hide the race. To make sure the race can actually occur, the VMs are created from an external bash script.
  2. Wait for a given number of seconds (run_time in the cfg file), then stop the loops.
  3. Make sure that libvirtd has not crashed. This is done by checking that the libvirt daemon(s) have the same PID at the start and end of the test.
  4. Ensure there are no orphan VMs.
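The first three steps could be sketched roughly as follows. This is not the test's actual code: the helper names, the daemon list, and the use of `pgrep` are assumptions made for illustration, and the real test reads `run_time` from its cfg file. Only the `virsh create` and `virsh destroy` commands and the PID comparison come from the description above.

```python
import shlex
import subprocess
import time


def loop_command(xml_path, name):
    """Build a bash loop that keeps creating and destroying one transient VM."""
    script = (
        "while true; do "
        f"virsh create {shlex.quote(xml_path)}; "
        f"virsh destroy {shlex.quote(name)}; "
        "done"
    )
    return ["bash", "-c", script]


def run_loops(command, workers, run_time):
    """Run `workers` copies of `command` for `run_time` seconds, then stop them."""
    procs = [
        subprocess.Popen(
            command, stdout=subprocess.DEVNULL, stderr=subprocess.DEVNULL
        )
        for _ in range(workers)
    ]
    try:
        time.sleep(run_time)
    finally:
        for proc in procs:
            proc.terminate()
        for proc in procs:
            proc.wait()


def daemon_pids(names=("libvirtd", "virtqemud")):
    """Map each daemon name to the sorted list of PIDs that pgrep reports."""
    pids = {}
    for daemon in names:
        result = subprocess.run(
            ["pgrep", "-x", daemon], capture_output=True, text=True, check=False
        )
        pids[daemon] = sorted(result.stdout.split())
    return pids


def changed_daemons(before, after):
    """Return the daemons whose PID list differs between two snapshots."""
    return sorted(name for name in before if before[name] != after.get(name))
```

A caller would take one `daemon_pids()` snapshot before `run_loops(...)` and one after, and fail the test if `changed_daemons(before, after)` is non-empty. Because each loop is a separate bash process, the create calls are not serialized by the Python interpreter.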

Evidence the test works

Running the test causes virtqemud to crash:

× virtqemud.service - libvirt QEMU daemon
     Loaded: loaded (/usr/lib/systemd/system/virtqemud.service; enabled; preset: enabled)
     Active: failed (Result: core-dump) since Tue 2024-07-16 15:05:41 EDT; 30min ago
   Duration: 1.500s
TriggeredBy: × virtqemud-admin.socket
             × virtqemud.socket
             × virtqemud-ro.socket
       Docs: man:virtqemud(8)
             https://libvirt.org/
    Process: 68934 ExecStart=/usr/sbin/virtqemud $VIRTQEMUD_ARGS (code=dumped, signal=SEGV)
   Main PID: 68934 (code=dumped, signal=SEGV)
        CPU: 203ms

Jul 16 15:05:41 ampere-mtsnow-altra-04.khw.eng.rdu2.dc.redhat.com systemd[1]: virtqemud.service: Scheduled restart job, restart count>
Jul 16 15:05:41 ampere-mtsnow-altra-04.khw.eng.rdu2.dc.redhat.com systemd[1]: Stopped libvirt QEMU daemon.
Jul 16 15:05:41 ampere-mtsnow-altra-04.khw.eng.rdu2.dc.redhat.com systemd[1]: virtqemud.service: Start request repeated too quickly.
Jul 16 15:05:41 ampere-mtsnow-altra-04.khw.eng.rdu2.dc.redhat.com systemd[1]: virtqemud.service: Failed with result 'core-dump'.
Jul 16 15:05:41 ampere-mtsnow-altra-04.khw.eng.rdu2.dc.redhat.com systemd[1]: Failed to start libvirt QEMU daemon.
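A crash like the one above could also be detected programmatically rather than by reading the status output. The sketch below assumes a systemd host and queries the unit's `ActiveState`, `Result`, and `MainPID` properties with `systemctl show`; the helper names are hypothetical and this is not part of the test described in this issue.

```python
import subprocess


def parse_show_output(text):
    """Parse `systemctl show` KEY=VALUE lines into a dictionary."""
    props = {}
    for line in text.splitlines():
        key, sep, value = line.partition("=")
        if sep:
            props[key] = value
    return props


def service_crashed(props):
    """Report whether the unit's last run ended in a core dump."""
    return props.get("Result") == "core-dump"


def query_service(unit="virtqemud.service"):
    """Ask systemd for the unit's state; requires a systemd host."""
    result = subprocess.run(
        ["systemctl", "show", unit, "--property=ActiveState,Result,MainPID"],
        capture_output=True,
        text=True,
        check=False,
    )
    return parse_show_output(result.stdout)
```

Calling `service_crashed(query_service())` after the loops stop would flag the `core-dump` result shown in the log, independently of the PID comparison.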

Slancaster1, Jul 16 '24 19:07