[BUG] vsphere cluster creation fails when using vapp network protocol profiles
Using the latest Rancher version, creating a new vSphere cluster fails while cloning the VM template to a virtual machine, with the following error.
Settings:
The network protocol profile is configured as follows:
I have also tried with an empty domain / search path, but the issue persists.
Error creating machine: Error in driver during machine creation: Invalid network in property guestinfo.dns.domains.
Log from the fleet job:
I too am having this issue. When I specify the same property in a vApp, it is properly populated, for example: guestinfo.dns.domains ${searchPath:[my-net-name]}. When I power on the VM it is populated correctly. I also tried a few different variations, such as specifying a custom vApp, and now unfortunately I also cannot delete the cluster, since it is stuck waiting for a viable init node and there is nothing defined on the VM when I look at the properties.
I just updated to Rancher 2.6.9 and this behavior is still there, i.e. using the option "Use vApp to configure networks with network protocol profiles" leads to the above error ("Invalid network in property guestinfo.dns.domains"). Beyond that, "Provide a custom vApp config" doesn't work either. All settings are present on the created machine, but the newly created VM seems to boot in a way that ignores the vApp configuration. This seems to affect RKE2 types only; doing the same for an RKE cluster works without problems.
I'll have to upgrade to 2.6.9 and see if that breaks it for me. I can't quite remember how I got past this, but on 2.6.8 I am able to provision RKE2 clusters now. One thing I found: when you switch to a custom vApp, the suggested transport value "com.vmware.guestInfo" has a capital "I" in "Info", while everywhere else you use "guestinfo" it is lowercase (or perhaps case-insensitive), and the font makes that hard to tell. Double-check that, and if I am remembering right, you should be able to work around the first option.
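To make the capitalization point concrete (the transport value is the standard OVF one; the property key is the example from this thread):

```
OVF environment transport:  com.vmware.guestInfo    <- capital "I" in "Info"
vApp property keys:         guestinfo.dns.domains   <- all lowercase
```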
I dug a little deeper and I think I found the problem. It doesn't seem to be a bug, but rather a difference in how nodes are provisioned. During provisioning of RKE clusters, the cloud-init user-data handed over to the VM contains only "groups" and "users". For RKE2 clusters, the file also contains "write_files" and "runcmd". These two keys can only appear once during the cloud-init process, so they break the VM template being used, which also defines them. As a result, the network does not get configured via the vApp options.
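For illustration, the RKE2-style user-data has roughly this shape (the keys follow the cloud-config schema, but all values here are placeholders, not the actual payload Rancher generates):

```yaml
#cloud-config
# Illustrative only -- not the exact payload Rancher generates.
groups:
  - docker
users:
  - name: rancher
    groups: docker
# These two sections are what RKE2 adds. If the VM template's own
# cloud-config also defines write_files/runcmd, the merged result can
# drop one side's entries, so the template's vApp network setup never runs.
write_files:
  - path: /etc/rancher/rke2/config.yaml
    content: |
      token: <placeholder>
runcmd:
  - systemctl enable --now rke2-server
```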
I seem to recall something like that from a pre-2.5.x version I had running, which I think might be why you have to specify the custom vApp config. Not having done anything too fancy with cloud-init, I wonder whether regular cloud-init, like its Windows counterpart Cloudbase-Init, has a directory where scripts get run automatically if you put them there. If so, you could place the network-config shell script I assume you are using in it, so you wouldn't need additional write_files/runcmd entries.
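cloud-init does have such a mechanism: scripts dropped into /var/lib/cloud/scripts/per-boot (there are also per-instance and per-once variants) are executed by its scripts-per-* modules. A minimal sketch, assuming the script is baked into the VM template; the file name and the resolv.conf handling are illustrative only:

```shell
# Hedged sketch: the directory and the vmtoolsd "info-get" command are
# standard; the script name and DNS handling below are hypothetical.
install -d /var/lib/cloud/scripts/per-boot
cat <<'EOF' > /var/lib/cloud/scripts/per-boot/10-vapp-dns.sh
#!/bin/sh
# Read the vApp-populated guestinfo key via VMware Tools and append
# the search domains to resolv.conf (illustrative only).
domains="$(vmtoolsd --cmd 'info-get guestinfo.dns.domains' 2>/dev/null)"
[ -n "$domains" ] && printf 'search %s\n' "$domains" >> /etc/resolv.conf
exit 0
EOF
chmod +x /var/lib/cloud/scripts/per-boot/10-vapp-dns.sh
```

Because the script lives in the template rather than in user-data, it runs on every boot without competing with whatever write_files/runcmd sections Rancher injects.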
This issue is actually not cloud-init related.
I ran into the same error and noticed that the vApp properties are populated with the fully qualified object path of the network portgroup (i.e. ${dns:/datacenter/folder/portgroupname}) instead of just the portgroup name (${dns:portgroupname}). Switching the vApp options radio button to "Provide a custom vApp config" allows you to remove the superfluous parts of the fully qualified object paths.
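For example, the fix amounts to editing the generated property values like this (datacenter, folder, and portgroup names are placeholders):

```
# Populated automatically by Rancher (fails with "Invalid network in property ..."):
guestinfo.dns.servers = ${dns:/datacenter/folder/portgroupname}

# After editing via "Provide a custom vApp config" (works):
guestinfo.dns.servers = ${dns:portgroupname}
```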