terraform-provider-libvirt

Remove artifacts when pool or domain creation fails

Open fmoor opened this issue 2 years ago • 5 comments

When pool or domain creation fails, artifacts created along the way may be left behind, and these cause a rerun of the same terraform plan against the same host to fail. This fixes a problem similar to the one described in https://github.com/dmacvicar/terraform-provider-libvirt/issues/739

System Information

Linux distribution

Ubuntu

Terraform version

$ terraform -v
Terraform v1.0.9
on linux_amd64

Provider and libvirt versions

$ terraform-provider-libvirt -version
0.6.11

fmoor Oct 20 '21 22:10

What artifacts are you referring to?

Do you have an example or sample output of this situation?

dmacvicar Dec 11 '21 23:12

The default qemu configuration on my laptop causes domain creation to fail with a Permission denied error. After fixing the configuration, terraform apply fails with:

╷
│ Error: Error defining libvirt domain: operation failed: domain 'consul_node_2' already exists with uuid 4f3dc245-706e-4065-b075-2a25a9383ee6
│
│   with module.consul-server.libvirt_domain.consul_node[2],
│   on ../modules/consul-libvirt/consul-server.tf line 68, in resource "libvirt_domain" "consul_node":
│   68: resource "libvirt_domain" "consul_node" {
│
╵
╷
│ Error: Error defining libvirt domain: operation failed: domain 'consul_node_0' already exists with uuid cb944a87-3d95-415e-ac8c-f5a70cf4cb12
│
│   with module.consul-server.libvirt_domain.consul_node[0],
│   on ../modules/consul-libvirt/consul-server.tf line 68, in resource "libvirt_domain" "consul_node":
│   68: resource "libvirt_domain" "consul_node" {
│
╵
╷
│ Error: Error defining libvirt domain: operation failed: domain 'consul_node_1' already exists with uuid 3c5c5033-d4af-4d2c-92dd-7a55b7b4c21c
│
│   with module.consul-server.libvirt_domain.consul_node[1],
│   on ../modules/consul-libvirt/consul-server.tf line 68, in resource "libvirt_domain" "consul_node":
│   68: resource "libvirt_domain" "consul_node" {
│
╵

This is because the libvirt provider didn't clean up the domains that hit permission errors during creation.

$ virsh list --all
 Id   Name            State
--------------------------------
 -    consul_node_0   shut off
 -    consul_node_1   shut off
 -    consul_node_2   shut off

Running terraform destroy does not delete the domains because they were never added to the terraform state. Undefining the domains using virsh and then running terraform apply works as expected.

fmoor Dec 13 '21 22:12

So, with this logic, if somebody accidentally uses the name of a running workload, creation will fail and we will both destroy and undefine that workload?

dmacvicar Jan 23 '22 10:01

> So, with this logic, if somebody accidentally uses the name of a running workload, creation will fail and we will both destroy and undefine that workload?

Name collisions are detected earlier in the resource creation flow (when the XML is defined). This change only cleans up when creation fails, not when definition fails, so I don't think there is any danger of destroying or undefining something that is not managed by the current terraform config.
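To make the ordering concrete, here is a minimal Go sketch of the idea, not the provider's actual code. The `Hypervisor` interface, `createDomain`, and `fakeHost` are hypothetical names invented for this illustration. It assumes creation is a two-step flow (define, then start) and that cleanup runs only after the second step fails.

```go
package main

import (
	"errors"
	"fmt"
)

// Hypervisor is a hypothetical stand-in for the libvirt calls involved;
// it is not the provider's real API.
type Hypervisor interface {
	Define(name string) error   // register the definition; fails on a name collision
	Start(name string) error    // boot the defined domain
	Undefine(name string) error // remove the definition
}

// createDomain defines and starts a domain. If starting fails, it removes the
// definition it just created so that a later run does not hit a name collision.
func createDomain(h Hypervisor, name string) error {
	if err := h.Define(name); err != nil {
		// Definition failed (for example, the name is already taken).
		// We created nothing, so we clean up nothing.
		return fmt.Errorf("defining domain %q: %w", name, err)
	}
	if err := h.Start(name); err != nil {
		// The definition is ours, so it is safe to remove it again.
		if cleanupErr := h.Undefine(name); cleanupErr != nil {
			return fmt.Errorf("starting domain %q: %w (cleanup also failed: %v)", name, err, cleanupErr)
		}
		return fmt.Errorf("starting domain %q: %w", name, err)
	}
	return nil
}

// fakeHost is an in-memory Hypervisor used only for this illustration.
type fakeHost struct {
	defined   map[string]bool
	failStart bool
}

func (f *fakeHost) Define(name string) error {
	if f.defined[name] {
		return fmt.Errorf("domain '%s' already exists", name)
	}
	f.defined[name] = true
	return nil
}

func (f *fakeHost) Start(name string) error {
	if f.failStart {
		return errors.New("permission denied")
	}
	return nil
}

func (f *fakeHost) Undefine(name string) error {
	delete(f.defined, name)
	return nil
}

func main() {
	host := &fakeHost{defined: map[string]bool{"existing_vm": true}, failStart: true}

	// Start fails: the definition created by this call is removed again.
	fmt.Println(createDomain(host, "consul_node_0"))
	fmt.Println("consul_node_0 still defined:", host.defined["consul_node_0"])

	// Name collision: Define fails, and the pre-existing domain is left alone.
	fmt.Println(createDomain(host, "existing_vm"))
	fmt.Println("existing_vm still defined:", host.defined["existing_vm"])
}
```

In this sketch the cleanup branch is reachable only after `Define` has succeeded for the same call, which is why a domain that already existed under that name is never touched.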

fmoor Jan 24 '22 22:01

I've just been hit by this as well. On failure to attach my libvirt guest to a network, terraform bailed out but left behind the definition of a virtual machine, so when I ran terraform apply again it resulted in a name collision.

JSmith-Aura Jun 08 '23 03:06