Wait for VM startup using guest agent `exec` and `cloud-init status --wait`

Open bpg opened this issue 1 year ago • 1 comments

Discussed in https://github.com/bpg/terraform-provider-proxmox/discussions/389

Originally posted by satwell June 25, 2023 The PVE API exposes QEMU Guest Agent's "exec" support, which lets you execute a command on the guest as root. Is there some sensible way that support for this could be added to the provider?

Here's my use case. I'm using cloud-init for initial VM setup, but I'd like Terraform to wait until cloud-init has completed before considering creation complete. Cloud-init conveniently provides a `cloud-init status --wait` command that will run until cloud-init is done.

I could just use a remote-exec provisioner to connect to the VM and run that command. But that requires setting up an SSH connection to it, which means the SSH port needs to be reachable, keys need to be set up, the username has to match what the cloud-init image uses, etc. It would be a lot cleaner if I could just use the guest agent for this.
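
For reference, the SSH-based approach described above might look roughly like the sketch below. The variable names (`vm_ip`, `ssh_user`, `ssh_private_key_path`) are hypothetical, and `proxmox_virtual_environment_vm.instance` is assumed to be defined elsewhere in the configuration.

```hcl
# Hypothetical inputs; the names are illustrative only.
variable "vm_ip" {
  type        = string
  description = "Address at which the new VM is reachable over SSH"
}

variable "ssh_user" {
  type        = string
  description = "Must match the default user of the cloud-init image"
}

variable "ssh_private_key_path" {
  type = string
}

resource "terraform_data" "wait_for_cloud_init" {
  # Re-run the wait whenever the VM is replaced.
  # Assumes a VM resource named "instance" exists elsewhere.
  triggers_replace = [proxmox_virtual_environment_vm.instance.id]

  connection {
    type        = "ssh"
    host        = var.vm_ip
    user        = var.ssh_user
    private_key = file(var.ssh_private_key_path)
  }

  provisioner "remote-exec" {
    # Blocks until cloud-init reports that it has finished.
    inline = ["cloud-init status --wait"]
  }
}
```

This shows the moving parts the guest agent route would avoid: a reachable SSH port, a key, and a matching username.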

In an ideal world, the proxmox provider would be able to define a new provisioner for VM resources that would use the guest agent to run commands. But from my quick investigation, it doesn't sound like there's any way in Terraform to create new provisioner types.

Are there other ways that would make sense to add support for the exec API? Like an additional option in the vm resource type, or a new resource type? Or should I just find or write an external tool that uses the exec API and call it with a local-exec provisioner?

bpg avatar Oct 05 '24 14:10 bpg

I think we can extend the agent block and add something like

agent {
  wait = "cloud-init"
}

Currently the provider implicitly waits for the network interfaces to be available, which may not work reliably for all possible configurations. We might be able to consolidate these checks in one place and configure them through this new wait attribute.

bpg avatar Oct 05 '24 14:10 bpg

Marking this issue as stale due to inactivity in the past 180 days. This helps us focus on the active issues. If this issue is reproducible with the latest version of the provider, please comment. If this issue receives no comments in the next 30 days it will automatically be closed. If this issue was automatically closed and you feel this issue should be reopened, we encourage creating a new issue linking back to this one for added context. Thank you!

bpg-autobot[bot] avatar Apr 04 '25 00:04 bpg-autobot[bot]

I think this can easily be solved by setting a large agent timeout, and only enabling the guest-agent service once cloud-init has finished the initialization.

E.g. via a last runcmd or by setting systemd unit dependencies accordingly.
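
A minimal sketch of the "large agent timeout" half of this suggestion, assuming the `timeout` argument of the provider's `agent` block accepts a duration string. The node name and the `30m` value are placeholders, not recommendations.

```hcl
resource "proxmox_virtual_environment_vm" "instance" {
  node_name = "pve" # hypothetical node name

  # ... remaining VM arguments omitted ...

  agent {
    enabled = true
    # Give cloud-init enough time to finish before the agent is expected
    # to respond; pick a value that fits your images.
    timeout = "30m"
  }
}
```

With this in place the provider keeps waiting for the agent, so delaying the agent's start until cloud-init is done effectively delays resource creation as well.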

stv0g avatar Apr 10 '25 08:04 stv0g

Would love to have this feature. At the moment I have to wait to ssh into the server for some undefined amount of time, since I have no idea if cloud-init is done or not.

remote-exec is a no go for me, since cloud-init sets up the ssh certificate so I cannot ssh into the vm till cloud-init is done.

I like the idea of extending the agent block somehow.

binarycodes avatar Sep 12 '25 14:09 binarycodes

I ran into a similar issue and got around it by using the QEMU guest agent. Have your cloud-init config install the guest agent, then use remote-exec on the Proxmox host to run the cloud-init wait command through the agent. Here is a snippet of the commands I run on my host to check when an instance is spinning up.

    # Boot order when cloning a vm [BUG](https://github.com/bpg/terraform-provider-proxmox/issues/850)
    "qm set ${proxmox_virtual_environment_vm.instance.id} --boot order=virtio0",
    # wait for cloud init to finish on the instance
    "qm guest exec --timeout 0 ${proxmox_virtual_environment_vm.instance.id} -- cloud-init status --wait > /dev/null",

loganmancuso avatar Sep 16 '25 00:09 loganmancuso
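
The wait command in the comment above could be wired into a provisioner roughly as follows. This is only a sketch: the variables `proxmox_host` and `proxmox_ssh_key_path` are hypothetical, root SSH access to the Proxmox host is assumed, and the VM resource named `instance` is assumed to be defined elsewhere.

```hcl
# Hypothetical inputs; the names are illustrative only.
variable "proxmox_host" {
  type        = string
  description = "Address of the Proxmox node that runs the VM"
}

variable "proxmox_ssh_key_path" {
  type = string
}

resource "terraform_data" "wait_for_instance" {
  # Re-run the wait whenever the VM is replaced.
  triggers_replace = [proxmox_virtual_environment_vm.instance.id]

  # Connect to the Proxmox host, not to the guest.
  connection {
    type        = "ssh"
    host        = var.proxmox_host
    user        = "root"
    private_key = file(var.proxmox_ssh_key_path)
  }

  provisioner "remote-exec" {
    inline = [
      # Ask the guest agent to run the wait command inside the VM.
      "qm guest exec --timeout 0 ${proxmox_virtual_environment_vm.instance.id} -- cloud-init status --wait > /dev/null",
    ]
  }
}
```

Unlike an SSH provisioner aimed at the guest, this only needs SSH access to the hypervisor, which usually exists before the VM does.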

Hm... when qemu agent is enabled for the VM (it's not by default), the provider will wait on the VM startup until IP address is available.

The last two examples here show that either agent is not enabled, or DHCP response is really slow, or perhaps some other issues with IP address reporting. Also, are you running a mixed IP4/6 network? Or IP6 only? 😱

bpg avatar Sep 16 '25 09:09 bpg

I think I found a workaround for my use case at least.

I enable qemu-guest-agent as the last thing in the runcmd: `systemctl enable --now qemu-guest-agent`

So by the time it starts, everything is done already.
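
A sketch of how that runcmd entry could be shipped as a cloud-init snippet through the provider. The datastore and node names are placeholders, and it assumes the image does not already start the agent when the package is installed (some distributions do, in which case this ordering trick has no effect).

```hcl
resource "proxmox_virtual_environment_file" "user_data" {
  content_type = "snippets"
  datastore_id = "local" # hypothetical datastore with snippets enabled
  node_name    = "pve"   # hypothetical node name

  source_raw {
    file_name = "user-data.yaml"
    data      = <<-EOF
      #cloud-config
      packages:
        - qemu-guest-agent
      runcmd:
        # ... your own setup commands go first ...
        # Last step: only now does the agent become visible to Proxmox.
        - systemctl enable --now qemu-guest-agent
    EOF
  }
}
```

The VM would then reference the snippet from its `initialization` block via `user_data_file_id = proxmox_virtual_environment_file.user_data.id`.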

If somehow it's already set to start up without us having to enable it ourselves, then maybe this helps in making it wait for cloud-init to complete? (I didn't need it though.)

# /etc/systemd/system/qemu-guest-agent.service.d/override.conf

[Unit]
# Ensure the service starts after cloud-init's final stage
Requires=cloud-final.service
After=cloud-final.service

binarycodes avatar Nov 10 '25 13:11 binarycodes