terraform-provider-libvirt
"wait_for_lease = true" does not take effect
System Information
Linux distribution
Archlinux
Terraform version
terraform -v
Terraform v1.9.4
on linux_amd64
Provider and libvirt versions
terraform-provider-libvirt -version
0.7.6
Description of Issue/Question
I use the following simple configuration, which installs qemu-ga through cloud-init. With terraform-provider-libvirt 0.7.6, setting "qemu_agent = true" and "wait_for_lease = true" does not wait for qemu-ga to obtain the IP address; the apply fails with "Error: couldn't retrieve IP address". Only after changing the provider version to 0.7.1 and running terraform init -upgrade followed by terraform apply does the provider wait for qemu-ga to report the IP address.
Please excuse my poor English.
Setup
this is my main.tf
terraform {
  required_version = ">= 0.13"
  required_providers {
    libvirt = {
      source = "dmacvicar/libvirt"
      version = "0.7.6"
    }
  }
}
provider "libvirt" {
  uri = "qemu:///system"
}
data "template_file" "user_data" {
  template = file("${path.module}/cloud_init/cloud_init.yml")
}
data "template_file" "network_config" {
  template = file("${path.module}/cloud_init/network_config.yml")
}
resource "libvirt_cloudinit_disk" "cloudinit" {
  name = "cloudinit.iso"
  user_data = data.template_file.user_data.rendered
  network_config = data.template_file.network_config.rendered
  pool = "default"
}
resource "libvirt_volume" "debian9-qcow2" {
  name = "debian9-qcow2"
  pool = "default"
  source = "./ubuntu-24.04-server-cloudimg-amd64.img"
}
// set boot order hd, network
resource "libvirt_domain" "domain-debian9-qcow2" {
  name = "debian9"
  memory = "1024"
  vcpu = 1
  qemu_agent = true
  cloudinit = libvirt_cloudinit_disk.cloudinit.id
  network_interface {
    bridge = "br0"
    wait_for_lease = true
  }
  boot_device {
    dev = ["hd", "network"]
  }
  disk {
    volume_id = libvirt_volume.debian9-qcow2.id
  }
  graphics {
    type = "spice"
    listen_type = "address"
    autoport = true
  }
  provisioner "remote-exec" {
    inline = [
      <<-EOF
      sudo apt-get update
      sudo apt-get install nginx -y
      EOF
    ]
  }
  connection {
    type = "ssh"
    user = "ubuntu"
    host = self.network_interface[0].addresses[0]
    private_key = file("~/.ssh/id_ed25519")
    timeout = "2m"
  }
}
this is cloud_init.yml
#cloud-config
bootcmd:
  - echo "This is a boot command"
runcmd:
  - [sh, -xc, "echo $(date) ': hello world!'"]
  - sudo apt-get update
  - sudo apt-get install qemu-guest-agent -y
  - sudo systemctl enable --now qemu-guest-agent.service
ssh_pwauth: true
disable_root: false
users:
  - name: root
    plain_text_passwd: 'password'
    lock_passwd: false
  - name: ubuntu
    sudo: ALL=(ALL) NOPASSWD:ALL
    groups: users, admin
    home: /home/ubuntu
    shell: /bin/bash
    lock_passwd: false
    ssh-authorized-keys:
      - ssh-ed25519 Axxx5 [email protected]
network_config.yml
version: 2
ethernets:
  ens3:
    dhcp4: true
Steps to Reproduce Issue
Steps with 0.7.6 (does not work):
- terraform init
- TF_LOG=DEBUG terraform apply -auto-approve
Debug log with 0.7.6:
</graphics>
<rng model="virtio">
<backend model="random">/dev/urandom</backend>
</rng>
</devices>
</domain>: timestamp="2024-08-14T23:28:10.086+0800"
2024-08-14T23:28:10.435+0800 [INFO] provider.terraform-provider-libvirt_v0.7.6: 2024/08/14 23:28:10 [INFO] Domain ID: 1e643687-5914-469d-b5c8-356c5dc65790: timestamp="2024-08-14T23:28:10.435+0800"
2024-08-14T23:28:10.435+0800 [INFO] provider.terraform-provider-libvirt_v0.7.6: 2024/08/14 23:28:10 [DEBUG] Waiting for state to become: [all-addresses-obtained]: timestamp="2024-08-14T23:28:10.435+0800"
2024-08-14T23:28:15.441+0800 [INFO] provider.terraform-provider-libvirt_v0.7.6: 2024/08/14 23:28:15 [DEBUG] waiting for network address for iface=52:54:00:16:93:28: timestamp="2024-08-14T23:28:15.440+0800"
2024-08-14T23:28:15.441+0800 [INFO] provider.terraform-provider-libvirt_v0.7.6: 2024/08/14 23:28:15 [DEBUG] qemu-agent used to query interface info: timestamp="2024-08-14T23:28:15.441+0800"
2024-08-14T23:28:15.443+0800 [ERROR] provider.terraform-provider-libvirt_v0.7.6: Response contains error diagnostic: diagnostic_severity=ERROR tf_proto_version=5.3 tf_provider_addr=provider @caller=github.com/hashicorp/[email protected]/tfprotov5/internal/diag/diagnostics.go:55 @module=sdk.proto tf_req_id=3443beee-8402-aa9f-8e77-364a3bd03a5e tf_resource_type=libvirt_domain tf_rpc=ApplyResourceChange diagnostic_detail=""
diagnostic_summary=
| couldn't retrieve IP address of domain id: 1e643687-5914-469d-b5c8-356c5dc65790. Please check following:
| 1) is the domain running properly?
| 2) has the network interface an IP address?
| 3) Networking issues on your libvirt setup?
| 4) is DHCP enabled on this Domain's network?
| 5) if you use bridge network, the domain should have the pkg qemu-agent installed
| IMPORTANT: This error is not a terraform libvirt-provider error, but an error caused by your KVM/libvirt infrastructure configuration/setup
Steps with 0.7.1 (works):
- Change only the provider version:
libvirt = {
  source = "dmacvicar/libvirt"
  version = "0.7.1"
}
- terraform init -upgrade
- TF_LOG=DEBUG terraform apply -auto-approve
Debug log with 0.7.1:
2024-08-14T23:26:07.000+0800 [INFO] provider.terraform-provider-libvirt_v0.7.1: 2024/08/14 23:26:07 [DEBUG] waiting for network address for iface=52:54:00:7E:A5:63: timestamp="2024-08-14T23:26:07.000+0800"
2024-08-14T23:26:07.000+0800 [INFO] provider.terraform-provider-libvirt_v0.7.1: 2024/08/14 23:26:07 [DEBUG] qemu-agent used to query interface info: timestamp="2024-08-14T23:26:07.000+0800"
2024-08-14T23:26:07.001+0800 [INFO] provider.terraform-provider-libvirt_v0.7.1: 2024/08/14 23:26:07 [DEBUG] Interfaces info obtained with libvirt API:
([]libvirt.DomainInterface) <nil>: timestamp="2024-08-14T23:26:07.001+0800"
2024-08-14T23:26:07.001+0800 [INFO] provider.terraform-provider-libvirt_v0.7.1: 2024/08/14 23:26:07 [DEBUG] ifaces with addresses: []: timestamp="2024-08-14T23:26:07.001+0800"
2024-08-14T23:26:07.001+0800 [INFO] provider.terraform-provider-libvirt_v0.7.1: 2024/08/14 23:26:07 [DEBUG] 52:54:00:7E:A5:63 doesn't have IP address(es) yet...: timestamp="2024-08-14T23:26:07.001+0800"
2024-08-14T23:26:07.001+0800 [INFO] provider.terraform-provider-libvirt_v0.7.1: 2024/08/14 23:26:07 [DEBUG] IP address not found for iface=52:54:00:7E:A5:63: will try in a while: timestamp="2024-08-14T23:26:07.001+0800"
2024-08-14T23:26:07.001+0800 [INFO] provider.terraform-provider-libvirt_v0.7.1: 2024/08/14 23:26:07 [TRACE] Waiting 10s before next try: timestamp="2024-08-14T23:26:07.001+0800"
libvirt_domain.domain-ubuntu: Still creating... [40s elapsed]
2024-08-14T23:26:17.010+0800 [INFO] provider.terraform-provider-libvirt_v0.7.1: 2024/08/14 23:26:17 [DEBUG] waiting for network address for iface=52:54:00:7E:A5:63: timestamp="2024-08-14T23:26:17.010+0800"
2024-08-14T23:26:17.010+0800 [INFO] provider.terraform-provider-libvirt_v0.7.1: 2024/08/14 23:26:17 [DEBUG] qemu-agent used to query interface info: timestamp="2024-08-14T23:26:17.010+0800"
2024-08-14T23:26:17.013+0800 [INFO] provider.terraform-provider-libvirt_v0.7.1: 2024/08/14 23:26:17 [DEBUG] Interfaces info obtained with libvirt API:
([]libvirt.DomainInterface) (len=2 cap=2) {
Additional information:
Do you have SELinux or Apparmor/Firewall enabled? Some special configuration? NO
Hello,
could you try specifying wait_for_lease with an image that already has qemu-guest-agent installed? I successfully got the IP address from the VM when doing so.
Thank you for the suggestion. I haven't tried an image with qemu-guest-agent already installed, because I want qemu-guest-agent to be installed automatically during the cloud-init phase. That was possible in previous versions but does not work in the new version.
I'll try to take a look and see if I can find anything changed that might cause it between those two versions.
I couldn't find anything in particular that changed between those versions. Also, I don't have a bridged network in my setup and it's hard for me to create one, so I used a NAT-ed one and I couldn't reproduce it.
@SJFCS could you check if you can reproduce it in different network types? NAT-ed and routed for example?
EDIT: forget what I wrote, I can reproduce it, I just used the wrong image before :facepalm:
I'll try to bisect and see where the problem lies.
Okay, after more debugging: I cannot reproduce it. Previously I had problems with cloud-init, so I think it might be related to cloud-init itself rather than to the provider.
Either way, I see consistent behavior between 0.7.6 and 0.7.1: it fails if qemu-guest-agent is not installed and started, and it runs fine otherwise.
The network configuration is the same in both cases, so I don't think the network type is related to this.
Okay, thanks for the troubleshooting, but I only changed the provider version number and kept the configuration unchanged.
Do you have cloud-init logs for both scenarios?
I have checked the logs in both cases; they look normal and no errors are reported.
This issue can be reproduced in versions greater than 0.7.1
│ Error: couldn't retrieve IP address of domain id: 3ac397de-13cd-485d-9772-872f7652de0d. Please check following:
│ 1) is the domain running proplerly?
│ 2) has the network interface an IP address?
│ 3) Networking issues on your libvirt setup?
│ 4) is DHCP enabled on this Domain's network?
│ 5) if you use bridge network, the domain should have the pkg qemu-agent installed
│ IMPORTANT: This error is not a terraform libvirt-provider error, but an error caused by your KVM/libvirt infrastructure configuration/setup
│ error retrieving interface addresses: error retrieving interface addresses: Virtual machine agent not responding: QEMU host agent not connected
I found that this is not related to whether the network mode is bridge or nat. To simplify the reproduction process and avoid cloudinit interference, I used the Talos ISO boot image below, which includes qemu-guest-agent and can be booted directly as a boot disk.
The metal-amd64.iso (MD5: ebd98e402606991700d8cb5545e72673) can be downloaded from: https://factory.talos.dev/image/ce4c980550dd2ab1b17bbf2b08801c7eb59418eafe8f279833297925d67c7515/v1.8.2/metal-amd64.iso
You can also build it yourself here: https://factory.talos.dev -> Bare-metal Machine -> choose version -> amd64 -> choose System Extensions qemu-guest-agent
#=====================================================================================
# Providers
#=====================================================================================
terraform {
  required_version = ">= 1.6.0"
  required_providers {
    libvirt = {
      source = "dmacvicar/libvirt"
      version = "0.7.4"
    }
    template = {
      source = "hashicorp/template"
      version = "2.2.0"
    }
  }
}
provider "libvirt" {
  uri = "qemu:///system"
}
#=====================================================================================
# Libvirt Pool
#=====================================================================================
resource "libvirt_pool" "kubernetes" {
  name = "talos"
  type = "dir"
  path = "/opt/libvirt-pool/talos"
}
#=====================================================================================
# Network
#=====================================================================================
# resource "libvirt_network" "talos" {
#   name = "talos"
#   mode = "bridge"
#   bridge = "br0" # Use the created bridge network card
#   autostart = true
# }
resource "libvirt_network" "talos" {
  name = "talos"
  mode = "nat"
  addresses = ["192.168.123.0/24"]
  autostart = true
}
#=====================================================================================
# Domain
#=====================================================================================
resource "libvirt_domain" "domain-talos" {
  name = "talos"
  memory = "2048"
  vcpu = 4
  cpu {
    mode = "host-passthrough"
  }
  qemu_agent = true
  boot_device {
    dev = ["cdrom", "hd", "network"]
  }
  network_interface {
    network_id = libvirt_network.talos.id
    wait_for_lease = true
  }
  # cdrom
  disk {
    file = "/home/admin/Downloads/images/metal-amd64.iso"
  }
  #=====================================================================================
  # Console
  #=====================================================================================
  console {
    type = "pty"
    target_port = "0"
    target_type = "serial"
  }
  console {
    type = "pty"
    target_type = "virtio"
    target_port = "1"
  }
  graphics {
    type = "spice"
    listen_type = "address"
    autoport = true
  }
  video {
    type = "virtio"
  }
}
# Output the IP addresses
output "ips" {
  value = {
    ip = libvirt_domain.domain-talos.network_interface[0].addresses
  }
}
Reproduction steps
# set version = "0.7.1"
terraform init
terraform apply -auto-approve
terraform destroy -auto-approve
# it works
# set version = "0.7.4"
terraform init -upgrade
terraform apply -auto-approve
# it fails with the error above
I wanted to have a look at this issue, but it seems I can reproduce it only with version 0.7.4
Versions 0.7.1, 0.7.6 and 0.8.1 are working fine for me. I pretty much copy-pasted your tf file in the previous comment, minus the template provider.
I tried it; 0.7.6 and 0.8.1 are not working for me either. ...
After analysis, I discovered the key differences:
- In version 0.7.1, the domainGetIfacesInfo function has special logic for error handling:
switch virErr := err.(type) {
case libvirt.Error:
    // Agent can be unresponsive if being installed/setup
    if addrsrc == uint32(libvirt.DomainInterfaceAddressesSrcLease) && virErr.Code != uint32(libvirt.ErrOperationInvalid) ||
        addrsrc == uint32(libvirt.DomainInterfaceAddressesSrcAgent) && virErr.Code != uint32(libvirt.ErrAgentUnresponsive) {
        return interfaces, fmt.Errorf("Error retrieving interface addresses: %w", err)
    }
}
- In later versions (all versions after 0.7.1), the error handling is simpler:
if err != nil {
    return interfaces, fmt.Errorf("error retrieving interface addresses: %w", err)
}
This is the key to the problem:
- In version 0.7.1, if an ErrAgentUnresponsive error is encountered when using qemu-agent to obtain an IP address, the code ignores the error and keeps trying, which gives qemu-agent time to start and respond.
- In the newer versions, any error is returned immediately, including ErrAgentUnresponsive, so the apply fails before qemu-agent has fully started and can respond.
@NamelessOne91 @scabala @dmacvicar I have submitted PR 1144.