`om` CLI will use an incorrect guid when running `configure-director` if two vSphere clusters under an AZ have the same cluster name.


Status: Open. ystros opened this issue 2 years ago (3 comments).

Overview

In a vSphere environment, each AZ can have multiple clusters defined under it. Three properties together define a cluster's uniqueness: cluster, resource_pool, and host_group. You can therefore have multiple clusters that use the same cluster name, as long as their resource_pool or host_group differs. For example:

az-configuration:
- name: puff-first-az
  iaas_configuration_name: default
  clusters:
  - cluster: ops_manager_cluster
    drs_rule: MUST
    host_group: ""
    resource_pool: ""
  - cluster: ops_manager_cluster
    drs_rule: MUST
    host_group: ""
    resource_pool: puff1
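To make the uniqueness rule concrete, here is a minimal Go sketch. It assumes a hypothetical `Cluster` struct that mirrors the three YAML fields (it is not om's actual type), and `countDistinct` is an illustrative helper. Keyed on all three fields, the two `ops_manager_cluster` entries above stay distinct; keyed on the cluster name alone, they collapse into one.

```go
package main

import "fmt"

// Cluster is a hypothetical mirror of one entry under az-configuration[].clusters;
// it is not the type om uses internally.
type Cluster struct {
	Cluster      string
	ResourcePool string
	HostGroup    string
}

// clusterKey is the composite identity: two entries describe the same cluster
// only if cluster, resource_pool, and host_group all match.
type clusterKey struct {
	cluster, resourcePool, hostGroup string
}

func keyOf(c Cluster) clusterKey {
	return clusterKey{c.Cluster, c.ResourcePool, c.HostGroup}
}

// countDistinct reports how many distinct clusters exist under the composite
// key and under the cluster name alone.
func countDistinct(clusters []Cluster) (byKey, byName int) {
	keys := map[clusterKey]bool{}
	names := map[string]bool{}
	for _, c := range clusters {
		keys[keyOf(c)] = true
		names[c.Cluster] = true
	}
	return len(keys), len(names)
}

func main() {
	// The two clusters from the puff-first-az example.
	clusters := []Cluster{
		{Cluster: "ops_manager_cluster"},
		{Cluster: "ops_manager_cluster", ResourcePool: "puff1"},
	}
	byKey, byName := countDistinct(clusters)
	fmt.Println(byKey, byName) // 2 1
}
```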

The om CLI attempts to add the guid property to each cluster by looking it up through the /api/v0/staged/director/availability_zones Ops Manager API endpoint. This ensures that the payload sent to the update AZ API endpoint matches the existing AZ and cluster definitions. This is necessary because these fields are locked after BOSH and its associated products are deployed, and Ops Manager guards against deleting or modifying those AZs and clusters with an error like:

Cannot modify the cluster 'ops_manager_cluster' in the availability zone 'puff-first-az' of a deployed product

However, the logic the om CLI uses to look up the existing cluster only considers the cluster property, which may not be unique within a given AZ: https://github.com/pivotal-cf/om/blob/ca9f0f846ec7510d4a7d638feb709715ccc05834/api/director_service.go#L485-L488
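The difference between name-only matching and matching on all three identifying fields can be sketched in Go. This is a hypothetical simplification, not the code at the linked lines: `existing` stands in for the clusters returned by the availability_zones endpoint, and `assignGUIDsByName` / `assignGUIDsByKey` are illustrative names. In this sketch, name-only matching hands the first matching guid to every cluster sharing that name, while composite-key matching leaves the new cluster without a guid so it would be treated as new.

```go
package main

import "fmt"

// Cluster is a hypothetical, simplified cluster record. GUID is empty for
// clusters Ops Manager has not seen yet.
type Cluster struct {
	GUID         string
	Cluster      string
	ResourcePool string
	HostGroup    string
}

// assignGUIDsByName mimics the problematic behaviour: a desired cluster is
// matched to an existing one by cluster name only (first match wins).
func assignGUIDsByName(existing, desired []Cluster) []Cluster {
	out := make([]Cluster, len(desired))
	for i, d := range desired {
		out[i] = d
		for _, e := range existing {
			if e.Cluster == d.Cluster {
				out[i].GUID = e.GUID
				break
			}
		}
	}
	return out
}

// assignGUIDsByKey matches on cluster, resource_pool, and host_group together.
// A desired cluster with no exact match gets the empty GUID (map zero value).
func assignGUIDsByKey(existing, desired []Cluster) []Cluster {
	type key struct{ cluster, pool, group string }
	guids := make(map[key]string, len(existing))
	for _, e := range existing {
		guids[key{e.Cluster, e.ResourcePool, e.HostGroup}] = e.GUID
	}
	out := make([]Cluster, len(desired))
	for i, d := range desired {
		out[i] = d
		out[i].GUID = guids[key{d.Cluster, d.ResourcePool, d.HostGroup}]
	}
	return out
}

func main() {
	existing := []Cluster{{GUID: "guid-1", Cluster: "ops_manager_cluster"}}
	desired := []Cluster{
		{Cluster: "ops_manager_cluster"},
		{Cluster: "ops_manager_cluster", ResourcePool: "puff1"},
	}
	fmt.Println(assignGUIDsByName(existing, desired)[1].GUID)       // guid-1 (wrongly reused)
	fmt.Printf("%q\n", assignGUIDsByKey(existing, desired)[1].GUID) // "" (new cluster)
}
```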

In cases like the example above, this results in om reusing the same guid for two different clusters. The Ops Manager API does not currently prevent this (story to fix here: https://www.pivotaltracker.com/story/show/179348373). Once in this state, any attempt to modify the AZ definition, either in the Ops Manager UI or with the om CLI, will result in the previously mentioned 'Cannot modify the cluster ...' error.

Once the API is updated to properly prevent using the same GUID for two different clusters, the om CLI will begin returning an error if this state is reached.
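A check for this state can be sketched in Go. This is not the validation Ops Manager or om actually performs; the `Cluster` type and the `duplicateGUIDs` helper are hypothetical, and the input is assumed to be the clusters of a single AZ (for example, as read from om staged-director-config output). It reports every non-empty guid shared by more than one cluster.

```go
package main

import (
	"fmt"
	"sort"
)

// Cluster is a hypothetical, simplified cluster record from one AZ.
type Cluster struct {
	GUID         string
	Cluster      string
	ResourcePool string
	HostGroup    string
}

// duplicateGUIDs returns, sorted, every non-empty GUID that is used by more
// than one cluster in the same AZ.
func duplicateGUIDs(clusters []Cluster) []string {
	counts := map[string]int{}
	for _, c := range clusters {
		if c.GUID != "" {
			counts[c.GUID]++
		}
	}
	var dups []string
	for guid, n := range counts {
		if n > 1 {
			dups = append(dups, guid)
		}
	}
	sort.Strings(dups) // map iteration order is random; sort for stable output
	return dups
}

func main() {
	az := []Cluster{
		{GUID: "guid-1", Cluster: "ops_manager_cluster"},
		{GUID: "guid-1", Cluster: "ops_manager_cluster", ResourcePool: "puff1"},
	}
	if dups := duplicateGUIDs(az); len(dups) > 0 {
		fmt.Printf("error: clusters share GUID(s) %v\n", dups)
	}
}
```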

Reproduction steps

1. Configure Ops Manager using om configure-director --config director-config.yml.
2. Apply Changes.
3. Update director-config.yml to add a new cluster to the AZ that has the same cluster name as the original cluster but a different resource_pool or host_group property.
4. Run om configure-director again to update the config in Ops Manager.
5. Run om staged-director-config to get the latest config from Ops Manager. You will see the same guid defined for both clusters.

Workaround

There is no known workaround other than using different cluster names, which is likely not possible because cluster names are defined at the vSphere layer and changing them would require vSphere configuration changes. Adding guid to the director config YAML file does not seem to help, since the code that looks up and assigns the guid always runs as part of the om configure-director command.

Aug 24 '21 21:08 ystros

We have created an issue in Pivotal Tracker to manage this. Unfortunately, the Pivotal Tracker project is private, so you may be unable to view the contents of the story.

The labels on this github issue will be updated when the story is started.