Add support for using unique identifiers to select a network connection in environments where names can be ambiguous.

Open taylor-madeak opened this issue 3 years ago • 25 comments

Community Note

Please vote on this issue by adding a 👍 reaction to the original issue to help the community and maintainers prioritize this request. Please do not leave "+1" or "me too" comments, they generate extra noise for issue followers and do not help prioritize the request. If you are interested in working on this issue or have submitted a pull request, please leave a comment.

Description

NSX allows for creating port groups with the same name, even on the same virtual distributed switch. This plugin has a long history of issues with trying to create VMs on a vSphere cluster with such virtual distributed port groups managed by NSX overlays. VMware resolved this in govmomi 0.27 by allowing finder to use other unique identifiers to select a network:

// Network finds a NetworkReference using a Name, Inventory Path, ManagedObject ID, Logical Switch UUID or Segment ID.
// With standard vSphere networking, Portgroups cannot have the same name within the same network folder.
// With NSX, Portgroups can have the same name, even within the same Switch. In this case, using an inventory path
// results in a MultipleFoundError. A MOID, switch UUID or segment ID can be used instead, as both are unique.
// See also: https://kb.vmware.com/s/article/79872#Duplicate_names
// Examples:
// - Name:                "dvpg-1"
// - Inventory Path:      "vds-1/dvpg-1"
// - ManagedObject ID:    "DistributedVirtualPortgroup:dvportgroup-53"
// - Logical Switch UUID: "da2a59b8-2450-4cb2-b5cc-79c4c1d2144c"
// - Segment ID:          "/infra/segments/vnet_ce50e69b-1784-4a14-9206-ffd7f1f146f7"

To leverage this, I request the following:

  1. Update the minimum version of govmomi used by this plugin to at least 0.27.
  2. Possibly write additional network finder logic to allow users to use one or more of the other unique identifiers for the network, in addition to (or in place of) the network name or inventory path. Segment ID and UUID seem like reasonable choices here, as they are both fairly readily available from the vCenter UI.

I'm not a GoLang developer, so I'm probably not a great judge of how heavy a lift this would be, but it appears that this may be as simple as just changing the version of govmomi this plugin is built with. The plugin itself looks like it just passes the context and argument straight through to govmomi.Finder.
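
The identifier forms listed in the govmomi comment above can be told apart by shape alone. The following is a minimal sketch of that idea, assuming a purely string-based heuristic; `classifyNetworkRef` and `isMOID` are hypothetical helpers written for illustration and are not part of govmomi or this plugin, and govmomi's own lookup logic may differ.

```go
package main

import (
	"fmt"
	"regexp"
	"strings"
)

// uuidPattern matches the canonical 8-4-4-4-12 hexadecimal UUID layout.
var uuidPattern = regexp.MustCompile(
	`^[0-9a-fA-F]{8}-[0-9a-fA-F]{4}-[0-9a-fA-F]{4}-[0-9a-fA-F]{4}-[0-9a-fA-F]{12}$`)

// networkKinds lists the managed object types that can back a network.
var networkKinds = map[string]bool{
	"Network":                     true,
	"DistributedVirtualPortgroup": true,
	"OpaqueNetwork":               true,
}

// isMOID reports whether s looks like "<Type>:<value>" for a network type,
// for example "DistributedVirtualPortgroup:dvportgroup-53".
func isMOID(s string) bool {
	kind, value, ok := strings.Cut(s, ":")
	return ok && networkKinds[kind] && value != ""
}

// classifyNetworkRef is a hypothetical helper (an assumption for this sketch)
// that guesses which kind of identifier a network string is.
func classifyNetworkRef(s string) string {
	switch {
	case strings.HasPrefix(s, "/infra/segments/"):
		return "segment-id"
	case uuidPattern.MatchString(s):
		return "logical-switch-uuid"
	case isMOID(s):
		return "managed-object-id"
	case strings.Contains(s, "/"):
		return "inventory-path"
	default:
		return "name"
	}
}

func main() {
	// The five example values from the govmomi comment quoted above.
	examples := []string{
		"dvpg-1",
		"vds-1/dvpg-1",
		"DistributedVirtualPortgroup:dvportgroup-53",
		"da2a59b8-2450-4cb2-b5cc-79c4c1d2144c",
		"/infra/segments/vnet_ce50e69b-1784-4a14-9206-ffd7f1f146f7",
	}
	for _, e := range examples {
		fmt.Printf("%-60s %s\n", e, classifyNetworkRef(e))
	}
}
```

A check like this only says what a value looks like; whether the identifier actually resolves still depends on the lookup performed against vCenter.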

Use Case(s)

Allow builder to be used in large vSphere environments that provide networking with NSX for scalability and mobility between clusters.

taylor-madeak avatar Nov 16 '22 22:11 taylor-madeak

Note: The latest release of vmware/govmomi is v0.29.0.

tenthirtyam avatar Nov 16 '22 23:11 tenthirtyam

Note: The aforementioned enhancement was released in vmware/govmomi v0.27.1, which included https://github.com/vmware/govmomi/commit/6209be5b5c0bd5d81078fdc82eb4001f202f90e7.

tenthirtyam avatar Nov 17 '22 00:11 tenthirtyam

FWIW: I can't seem to hit on the right set of arguments to make any of these alternate unique identifiers work with govc either. My attempts to test by building this plugin using the latest version of govmomi have also failed (though, admittedly I don't really know what I'm doing when it comes to GoLang).

taylor-madeak avatar Nov 23 '22 19:11 taylor-madeak

At a minimum, you'll need to download the source, and from the source tree run:

go get github.com/vmware/govmomi
go mod tidy
go build

Then copy the binary to your packer.d/plugins directory and run your tests.

I've spoken with the maintainer about us updating to v0.29.0. Ideally, this dependency update should be done as an isolated chore(deps) pull request.

tenthirtyam avatar Nov 23 '22 21:11 tenthirtyam

PR #240 for vmware/govmomi@v0.29.0.

tenthirtyam avatar Nov 28 '22 17:11 tenthirtyam

Note: hashicorp/packer-plugin-vsphere@v1.1.1 is now released and includes vmware/govmomi@v0.29.0.

tenthirtyam avatar Dec 07 '22 01:12 tenthirtyam

@tenthirtyam I saw that earlier today. Unfortunately, this plugin still doesn't seem to be able to find a network by a unique identifier other than its name.

Using the Segment ID:

2022/12/07 02:24:03 [INFO] (telemetry) Starting builder vsphere-iso.linux
2022/12/07 02:24:03 packer-plugin-vsphere_v1.1.1_x5.0_linux_amd64 plugin: 2022/12/07 02:24:03 No URLs were provided to Step Download. Continuing...
2022/12/07 02:24:03 packer-plugin-vsphere_v1.1.1_x5.0_linux_amd64 plugin: 2022/12/07 02:24:03 No CD files specified. CD disk will not be made.
2022/12/07 02:24:03 packer-plugin-vsphere_v1.1.1_x5.0_linux_amd64 plugin: 2022/12/07 02:24:03 No URLs were provided to Step Download. Continuing...
2022/12/07 02:24:03 packer-plugin-vsphere_v1.1.1_x5.0_linux_amd64 plugin: 2022/12/07 02:24:03 No CD files specified. CD disk will not be made.
2022/12/07 02:24:03 ui: ESC[1;32m==> vsphere-iso.linux: Creating VM...ESC[0m
2022/12/07 02:24:03 [INFO] (telemetry) ending vsphere-iso.linux
2022/12/07 02:24:03 ui error: ESC[1;31mBuild 'vsphere-iso.linux' errored after 432 milliseconds 468 microseconds: error creating vm: network '/infra/segments/977eab1d-1670-4b4e-9072-f71038385359' not foundESC[0m
2022/12/07 02:24:03 ui:
==> Wait completed after 435 milliseconds 270 microseconds

Using the MOID:

2022/12/07 02:38:58 [INFO] (telemetry) ending vsphere-iso.linux
2022/12/07 02:38:58 ui error: ESC[1;31mBuild 'vsphere-iso.linux' errored after 350 milliseconds 544 microseconds: error creating vm: network 'DistributedVirtualPortgroup:dvportgroup-16495' not foundESC[0m

I imagine the issue is somewhere in here:

https://github.com/hashicorp/packer-plugin-vsphere/blob/a0992c7396605b33492e7b9447569110e1bb7033/builder/vsphere/driver/network.go#L23-L47

Does this plugin need to pass some additional information to govmomi Finder.Network for this to work correctly?

What kind of additional information can I provide that will help running this down?

taylor-madeak avatar Dec 07 '22 02:12 taylor-madeak

Based on a quick review, it looks like the plugin should call finder.networkByID when the network input is an ID rather than a name.

https://github.com/vmware/govmomi/blob/d99e99542ffe1e054b2da68fac48ee5ce2bd4987/find/finder.go#L823-L856

tenthirtyam avatar Dec 07 '22 02:12 tenthirtyam

It looks to me like the finder.Network method already falls back to calling the finder.networkByID method:

https://github.com/vmware/govmomi/blob/17e669d84193839acdbebe6aed5aea26b1c65d48/find/finder.go#L804-L821

This raises some additional questions:

  • Why isn't this working in my case?
  • How can this project test it?
  • Is this an issue with this plugin, or the underlying govmomi library?

That last question comes up because I can't get the search to work with govc either.

taylor-madeak avatar Dec 07 '22 19:12 taylor-madeak

It may be a good idea to open a GitHub Discussion item on vmware/govmomi if it appears to also be an upstream concern. It can be converted to an issue if it is a bug.

Note

I pinged one of the vmware/govmomi maintainers, who has kindly commented below. 👇

tenthirtyam avatar Dec 07 '22 19:12 tenthirtyam

Are you able to find the network with govc using:

% govc find / -type g -config.segmentId /infra/segments/seg_6e9bdde0-f9bf-4ee6-ac36-493627b6db32_0
/folder-WCP_DC/WCP_DC/network/seg-domain-c9:a97676f3-cf6d-42d7-875b-ae0bd0016e32-test-gc-e2e-demo-ns-0

If so and you add the -i flag, it will print the ManagedObject ID:

% govc find -i / -type g -config.segmentId /infra/segments/seg_6e9bdde0-f9bf-4ee6-ac36-493627b6db32_0
DistributedVirtualPortgroup:dvportgroup-71

Does using the MOID work with the plugin?

dougm avatar Dec 08 '22 00:12 dougm

@dougm this query has the same issue as searching by name, that is to say it returns multiple results.

govc find / -type g -config.segmentId /infra/segments/b8f015a1-c281-4dfd-abbc-df0c88c5b2a4
/dsc1-w1-dc/network/dsc1-w1-a1-gcib-ix-10.109.248.24_29
/dsc1-w1-dc/network/dsc1-w1-a1-gcib-ix-10.109.248.24_29
/dsc1-w1-dc/network/dsc1-w1-a1-gcib-ix-10.109.248.24_29

With the -i flag, we can see that these each have different MOID values:

govc find -i / -type g -config.segmentId /infra/segments/b8f015a1-c281-4dfd-abbc-df0c88c5b2a4
DistributedVirtualPortgroup:dvportgroup-16348
DistributedVirtualPortgroup:dvportgroup-8278
DistributedVirtualPortgroup:dvportgroup-16476

taylor-madeak avatar Dec 08 '22 02:12 taylor-madeak

My understanding based on the KB was that segmentId is unique; this is the first case I've seen where it isn't. I wonder what is unique (other than the MOID). I can take a look if you can share the output of:

% govc find -i / -type g -config.segmentId /infra/segments/b8f015a1-c281-4dfd-abbc-df0c88c5b2a4 | xargs -n1 govc object.collect -o -json

The error message in this comment is "not found":

network '/infra/segments/977eab1d-1670-4b4e-9072-f71038385359' not found

Based on your govc output, I'd expect the error to be "multiple found". So I also wonder if the plugin here has govmomi with the networkByID fallback. You should be able to confirm by using one of the MOIDs (e.g. DistributedVirtualPortgroup:dvportgroup-16348).

dougm avatar Dec 08 '22 04:12 dougm

The error message observed in the previous comment when using a MOID (DistributedVirtualPortgroup:dvportgroup-16495) was also "not found":

2022/12/07 02:38:58 [INFO] (telemetry) ending vsphere-iso.linux
2022/12/07 02:38:58 ui error: ESC[1;31mBuild 'vsphere-iso.linux' errored after 350 milliseconds 544 microseconds: error creating vm: network 'DistributedVirtualPortgroup:dvportgroup-16495' not foundESC[0m

tenthirtyam avatar Dec 08 '22 04:12 tenthirtyam

I may be incorrect, but it might be because addNetwork is using findNetwork - which in turn calls FindNetworks that uses NetworkList

https://github.com/hashicorp/packer-plugin-vsphere/blob/324d9eb8b74d778bc3a97f9aff3931d69f5ab604/builder/vsphere/driver/vm.go#L948-L953

https://github.com/hashicorp/packer-plugin-vsphere/blob/324d9eb8b74d778bc3a97f9aff3931d69f5ab604/builder/vsphere/driver/vm.go#L977-L987

https://github.com/hashicorp/packer-plugin-vsphere/blob/324d9eb8b74d778bc3a97f9aff3931d69f5ab604/builder/vsphere/driver/network.go#L34-L47

tenthirtyam avatar Dec 08 '22 04:12 tenthirtyam

> I may be incorrect, but it might be because addNetwork is using findNetwork - which in turn calls FindNetworks that uses NetworkList

Yes, looks like that is the issue. We can change govmomi's NetworkList to do the networkByID fallback. Or the plugin could fallback to calling Network if list fails.

dougm avatar Dec 08 '22 14:12 dougm

Thanks Doug - appreciate the assist here. I'll work with the maintainer and get a fix in for this in the plugin to use the networkByID fallback.

tenthirtyam avatar Dec 08 '22 14:12 tenthirtyam

I'm set up to test new plugin builds, if you guys can get me some PoC code.

taylor-madeak avatar Dec 08 '22 23:12 taylor-madeak

I take it this is still backlogged?

StephenDunne-CAL avatar Jun 06 '23 11:06 StephenDunne-CAL

I revisited this one this evening and ran some tests on the latest release (v1.2.7). I didn't have any issues using the MOIDs for port groups (e.g. "Network:network-18085") or distributed port groups (e.g. "DistributedVirtualPortgroup:dvportgroup-22077"), both of which had the same name and would error if just the name was used:

==> vsphere-iso.linux-photon: error creating virtual machine: path 'DHCP' resolves to multiple networks. please provide a host to match or the network full path

When using the MOIDs, the build is placed on the correct port group or distributed port groups without issue. I've not verified this with an NSX segment yet, but it should have the same results.

I was going to add the fallback, as seen below, but it appears not to be needed...

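// FindNetworks returns every network matching name. If the list lookup fails
// or returns nothing, it falls back to a single-network lookup, which also
// accepts unique identifiers such as a MOID or segment ID.
func (d *VCenterDriver) FindNetworks(name string) ([]*Network, error) {
    ns, err := d.finder.NetworkList(d.ctx, name)
    if err != nil || len(ns) == 0 {
        // Fallback: resolve a single network by name or unique identifier.
        n, err := d.finder.Network(d.ctx, name)
        if err != nil {
            return nil, err
        }
        return []*Network{
            {
                network: n,
                driver:  d,
            },
        }, nil
    }
    // Wrap each network reference returned by the list lookup.
    var networks []*Network
    for _, n := range ns {
        networks = append(networks, &Network{
            network: n,
            driver:  d,
        })
    }
    return networks, nil
}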

It isn't needed because https://github.com/vmware/govmomi/pull/2626 (@dougm is awesome! 🎉) added the fallback (see https://github.com/vmware/govmomi/pull/2626/commits/bb4f739b451eefa1261f5c20df1ec7dc14621e8c#). That change was included in v0.31.0 of vmware/govmomi and was picked up in v1.2.3 of the plugin.

I'm going to close this issue. However, I will add a PR to update the duplicate networks error message so that it suggests using the ID or path of the network, rather than only "a host to match or full path".

Ryan

tenthirtyam avatar Apr 26 '24 02:04 tenthirtyam

@tenthirtyam I'd feel a lot better about this if it were tested with an NSX segment before closing. I'll see if I can get a test in later today or on Monday.

To clarify: I'm @taylor-madeak, just created a separate GitHub account for work stuff (which this issue relates to).

rtaylor-gci avatar Apr 26 '24 17:04 rtaylor-gci

I've successfully tested this with both the segment ID and the logical switch UUID using release v1.2.7 on VMware Cloud Foundation 5.1.1 BOM.

Ryan Johnson Distinguished Engineer, VMware by Broadcom

tenthirtyam avatar Apr 26 '24 18:04 tenthirtyam

@tenthirtyam I'm still having some trouble getting a successful test for this in our VCF environment, where I'm not guaranteed to land on any one specific VM host in the cluster. Can you share which vsphere-iso source properties you're specifying when you test this feature? I'd like to verify that it's not just a template configuration issue on my part.

rtaylor-gci avatar May 02 '24 18:05 rtaylor-gci

Is your use case to always use the same host and a specific network on that host?

tenthirtyam avatar May 02 '24 19:05 tenthirtyam

The opposite, actually. My current template specifies server, datacenter, and cluster. I'd like to continue not caring which host I end up on and still be able to get a network. I'm not an expert with NSX, but it appears that the overlays end up being associated with VM hosts in vCenter. So, by not specifying a host to build on, the distributed portgroup MOID or segment ID I specify isn't found by Packer.

rtaylor-gci avatar May 02 '24 20:05 rtaylor-gci

Hey! If you'd like to take a look at this live, let me know. You can email me at ryan.johnson [at] broadcom [dot] com and we can schedule some time to look at this.

Ryan Johnson VMware by Broadcom

tenthirtyam avatar May 21 '24 12:05 tenthirtyam

@taylor-madeak - wanted to check in and see if you've had an opportunity to test with the latest. Please feel free to reach out at ryan.johnson [at] broadcom [dot] com if you would like to look at this live.

Ryan

tenthirtyam avatar Jun 18 '24 03:06 tenthirtyam