
Adding second NIC with ip_config hangs cloud_init / terraform ( tofu )

Open VGerris opened this issue 1 year ago • 4 comments

Describe the bug When two networks are configured and the second has, for example, DHCP set, terraform doesn't finish.

To Reproduce Steps to reproduce the behavior:

  1. Create a terraform file with snippet like:
initialization {
    ip_config {
      ipv4 {
        address = "dhcp"
      }
    }

    user_data_file_id = proxmox_virtual_environment_file.cloud_config.id
  }

  network_device {
    bridge = "vmbr1"
  }

  network_device {
    bridge = "vmbr1"
    vlan_id = "56"
  }
it works
When at first run also :
initialization {
    ip_config {
      ipv4 {
        address = "dhcp"
      }
    }

    user_data_file_id = proxmox_virtual_environment_file.cloud_config.id
  }

  network_device {
    bridge = "vmbr1"
  }

  network_device {
    bridge = "vmbr1"
    vlan_id = "56"
  }
  1. Run tofu apply
  2. VM gets created with 2 NICs
  3. Run tofu destroy
  4. Add second snippet for 2nd interface:
initialization {
    ip_config {
      ipv4 {
        address = "dhcp"
      }
    }
    # next part is added and applies to second NIC
    ip_config {
      ipv4 {
        address = "dhcp"
      }
    }
  }
  1. Run tofu apply
  2. See error - it hangs

Please also provide a minimal Terraform configuration that reproduces the issue.


initialization {
    ip_config {
      ipv4 {
        address = "dhcp"
      }
    }
    # next part is added and applies to second NIC
    ip_config {
      ipv4 {
        address = "dhcp"
      }
    }
  }

  network_device {
    bridge = "vmbr1"
  }

  network_device {
    bridge = "vmbr1"
    vlan_id = "56"
  }

and the output of terraform|tofu apply.

VM creating ....

Expected behavior I would expect tofu to continue, even though the IP may not be fetched.

Additional context Add any other context about the problem here.

This may be related to cloud-init: https://forum.proxmox.com/threads/assign-multiple-ip-to-vm-using-cloud-init.116259/

And the other provider may have something similar and a solution: https://github.com/Telmate/terraform-provider-proxmox/issues/1015

Ideally the IP gets assigned, but when this is not possible because of how cloud-init works, just continuing and showing the issue seems like a good solution.

  • Single or clustered Proxmox: clustered
  • Proxmox version: 8.2
  • Provider version (ideally it should be the latest version): latest
  • Terraform/OpenTofu version: 1.8.3
  • OS (where you run Terraform/OpenTofu from): Ubuntu 24.04
  • Debug logs (TF_LOG=DEBUG terraform apply):

VGerris avatar Oct 13 '24 10:10 VGerris

Some additional findings.

It seems the network is configured by netplan and cloud-init puts the configuration in :

/etc/netplan/50-cloud-init.yaml

The snippet:

    ip_config {
      ipv4 {
        address = "dhcp"
      }
    }

results in something like:

    network:
      version: 2
      ethernets:
        eth0:
          match:
            macaddress: "bc:24:11:c8:de:82"
          dhcp4: true

When the second snippet is added to the main.tf file after the VM has already been created, as described above, the cloud-init file gets the correct info in it for the second network and everything works fine after boot.

So the problem only occurs at first creation, not when adding it later. That leads me to believe something in the code is not prepared to handle multiple network configs. If I turn debugging on, one of the last calls regarding networking seems to be:

https://github.com/bpg/terraform-provider-proxmox/blob/main/proxmoxtf/provider/provider.go#L251

I'm suspecting it may be there where the issue starts. I am not familiar with Go, so I will see how far I get.

Did anyone else see this or, better yet, can someone with Go knowledge see if the issue may start there?

I could start by looking at what the nodeAddress is, can someone point to instructions on how to deploy the provider with updated code ? Thank you

VGerris avatar Oct 13 '24 21:10 VGerris

I have been investigating further. Since the creation with terraform never finished and the settings I put in cloud-init did not give me access to the VM, I modified an image to have a root account so I can log in at creation time. Investigating the machine showed that the netplan config looked good, but somehow the network is not set up properly to reach the internet, even though both NICs get a DHCP address (from different servers).

The terraform process is actually waiting for qemu-tools to reply. When I fix the network by using dhcpcd and install the package and start it, terraform continues and all looks good and as expected.
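If the wait described above is indeed the provider waiting for the guest agent, one way to bound how long tofu apply blocks may be the `timeout` setting of the `agent` block. This is only a sketch: it assumes the `agent` block of `proxmox_virtual_environment_vm` accepts `enabled` and `timeout` as described in the provider documentation, the resource name `ubuntu_vm` is the one used in the output block in this thread, and the `5m` value is an arbitrary illustration.

```hcl
resource "proxmox_virtual_environment_vm" "ubuntu_vm" {
  # ... other VM settings omitted ...

  agent {
    enabled = true
    # Assumption: stop waiting for guest agent data after 5 minutes
    # instead of the provider default.
    timeout = "5m"
  }
}
```

This does not address the routing problem itself; it only limits how long the apply waits for the agent.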

This seems to indicate that cloud-init is somehow not able to set up routing properly when using 2 NICs, but also that if this can be fixed in the cloud-init script, it may be solvable. The best solution would be to find out why cloud-init has an issue completing properly and perhaps even fix it there, but as linked above, some people say that it is not supposed to provide access for automation. I tend to disagree, because my automation may be run from another network and the VM needs the internet (which is what I have now and why I encountered this behavior).

So far I have tried netplan apply and adding ipv6 = false, without consistent success. It would be great if anyone could help find the network cause of this; a possible workaround would then be to include the proper commands in the cloud-init script.

Another workaround I used before is to get the 2nd interface from terraform and then run Ansible to run dhcpcd on the interface, but that doesn't 'stick' either. In that case I get the NIC like this:

output "vm_nic_2_name" {
  value = proxmox_virtual_environment_vm.ubuntu_vm.network_interface_names[2]
}

and then in the script that runs Ansible:

    sed "s/nic1_replace/$(tofu -chdir=$BASEDIR/terraform-proxmox output vm_nic_2_name | sed -nr 's|.*"(.*)".*|\1|p')/g" inventory_template1.yml > inventory.yml

which sets an Ansible var that is used like:

    - name: Run dhcpcd on second NIC
      ansible.builtin.command: dhcpcd {{ nic_1 }}
      register: nic

I am going to look a bit further into the best way to have the network configured properly and post my findings. In the meanwhile, help and tips are appreciated :)

VGerris avatar Oct 14 '24 21:10 VGerris

Based on info on netplan and some reading, I found an acceptable workaround.

In the cloud-config snippet, write to a file with netplan config:

    write_files:
      - path: /etc/netplan/99-network-config.yaml
        permissions: "0600"
        owner: root
        content: |
          network:
            version: 2
            ethernets:
              ens19:
                dhcp4: true
                match:
                  name: "ens19"
                mtu: 1500
                set-name: "eth1"

Then at the top of runcmd add:

    runcmd:
        - netplan apply
        - .....

Creation takes a bit longer and for some reason the apt update command too, but this configures both interfaces the same as with the double snippet, but with working internet and thus qemu-tools.

Perhaps good to add this to docs. That's the best I can do for now, without spending tons of more time that is scarce currently :).

This relies on the name of the interface. I am not aware of a way to get the MAC or name beforehand so it can be used dynamically, but it's good enough for me.

Any improvements are welcome and I can make a PR for the docs if that's appreciated. Thank you all for maintaining this terraform provider, it is pretty awesome!

VGerris avatar Oct 14 '24 23:10 VGerris

It turns out something more is needed, because a route is added by default. There is an option to skip that:

                dhcp4-overrides:
                  use-routes: false
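Put together, the workaround could be delivered through the snippet file that `user_data_file_id` points at. The following is a rough sketch, not a tested configuration: it assumes the `proxmox_virtual_environment_file` resource with a `source_raw` block as documented by the provider, and the datastore `local`, node name `pve` and file name `cloud-config.yaml` are hypothetical placeholders.

```hcl
resource "proxmox_virtual_environment_file" "cloud_config" {
  content_type = "snippets"
  datastore_id = "local" # assumption: a datastore with snippets enabled
  node_name    = "pve"   # assumption: hypothetical node name

  source_raw {
    file_name = "cloud-config.yaml" # hypothetical file name
    data      = <<-EOF
      #cloud-config
      write_files:
        - path: /etc/netplan/99-network-config.yaml
          permissions: "0600"
          owner: root
          content: |
            network:
              version: 2
              ethernets:
                ens19:
                  dhcp4: true
                  # do not install routes received via DHCP on this NIC
                  dhcp4-overrides:
                    use-routes: false
                  match:
                    name: "ens19"
                  mtu: 1500
                  set-name: "eth1"
      runcmd:
        - netplan apply
    EOF
  }
}
```

As noted earlier, this still depends on the second interface being named ens19 inside the guest.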

Now when I use a snippet like:

    ip_config {
      ipv4 {
        address = "192.168.56.20/24"
        gateway = "192.168.56.1"
      }
    }

I also get a route set as default and, as a consequence, the same problem as with 2 dhcp snippets. In this case the workaround is a bit simpler: remove that route before anything else:

    runcmd:
      - ip r del default via 192.168.56.1
      - apt update

If the use-routes: false option can be made part of the resource: https://registry.terraform.io/providers/bpg/proxmox/latest/docs/resources/virtual_environment_vm#ip_config it may well be a solution for this behavior, by simply setting that option in the ip_config snippet.

As the documentation says, the probably better option is to omit the gateway; then no route is added and everything works as expected.
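A minimal sketch of that last option, assuming the first NIC gets its default route via DHCP and reusing the static address from the snippet above:

```hcl
initialization {
  # First NIC: DHCP, expected to provide the default route.
  ip_config {
    ipv4 {
      address = "dhcp"
    }
  }

  # Second NIC: static address with the gateway omitted,
  # so no additional default route is configured for it.
  ip_config {
    ipv4 {
      address = "192.168.56.20/24"
    }
  }
}
```

The `ip_config` blocks are applied in order, so the second block corresponds to the second `network_device`, as in the snippets earlier in this issue.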

VGerris avatar Oct 15 '24 11:10 VGerris

Marking this issue as stale due to inactivity in the past 180 days. This helps us focus on the active issues. If this issue is reproducible with the latest version of the provider, please comment. If this issue receives no comments in the next 30 days it will automatically be closed. If this issue was automatically closed and you feel this issue should be reopened, we encourage creating a new issue linking back to this one for added context. Thank you!

bpg-autobot[bot] avatar Apr 14 '25 00:04 bpg-autobot[bot]

I am having the same issue with the latest provider version.

FlyinPancake avatar May 09 '25 00:05 FlyinPancake

Hey @FlyinPancake 👋🏼

Are you working on adding a DHCP interface? Could you share your config and the steps you've taken? "Me too" comments on open issues don’t provide much context or help with troubleshooting.

Thanks!

bpg avatar May 09 '25 12:05 bpg

Marking this issue as stale due to inactivity in the past 180 days. This helps us focus on the active issues. If this issue is reproducible with the latest version of the provider, please comment. If this issue receives no comments in the next 30 days it will automatically be closed. If this issue was automatically closed and you feel this issue should be reopened, we encourage creating a new issue linking back to this one for added context. Thank you!

bpg-autobot[bot] avatar Nov 06 '25 00:11 bpg-autobot[bot]