
`om` CLI will use an incorrect guid when running `configure-director` if two vSphere clusters under an AZ have the same cluster name.

Open ystros opened this issue 2 years ago • 3 comments

Overview

On a vSphere environment, each AZ can have multiple clusters defined underneath it. Each cluster has three properties that together define its uniqueness: cluster, resource_pool, and host_group. Multiple clusters can share the same cluster name, as long as the resource_pool or host_group differs between them, e.g.

az-configuration:
- name: puff-first-az
  iaas_configuration_name: default
  clusters:
  - cluster: ops_manager_cluster
    drs_rule: MUST
    host_group: ""
    resource_pool: ""
  - cluster: ops_manager_cluster
    drs_rule: MUST
    host_group: ""
    resource_pool: puff1

The om CLI attempts to add in the guid property for each cluster by using the /api/v0/staged/director/availability_zones Ops Manager API endpoint. This ensures that the payload sent to the update AZ API endpoint is matched up with the existing AZ and cluster definitions. This is necessary because the fields are locked after BOSH + associated products are deployed, and Ops Manager protects against deletions / modifications to the AZs + clusters with an error like:

Cannot modify the cluster 'ops_manager_cluster' in the availability zone 'puff-first-az' of a deployed product

However, the logic om CLI uses to look up the existing cluster only considers the cluster property, which may not be unique within a given AZ: https://github.com/pivotal-cf/om/blob/ca9f0f846ec7510d4a7d638feb709715ccc05834/api/director_service.go#L485-L488

In examples like the above, this will result in om reusing the same guid for two different clusters. The Ops Manager API does not currently prevent this (story to fix here: https://www.pivotaltracker.com/story/show/179348373). Once in this state, any attempt to modify the AZ definition, either in the Ops Manager UI or using the om CLI, will result in the previously mentioned 'Cannot modify the cluster ...' error.

Once the API is updated to properly prevent using the same GUID for two different clusters, the om CLI will begin returning an error if this state is reached.
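
To illustrate the kind of lookup that would avoid the collision, here is a minimal sketch that matches clusters on all three identifying properties instead of on the cluster name alone. This is not om's actual code: the `Cluster` type, the `clusterKey` helper, and `AssignGUIDs` are hypothetical names invented for the example, and leaving the guid empty for an unmatched cluster assumes that a cluster submitted without a guid is treated as new.

```go
package main

import "fmt"

// Cluster is a hypothetical, simplified stand-in for a cluster entry under
// an AZ. It is not the type used in om's director_service.go.
type Cluster struct {
	GUID         string
	Cluster      string
	ResourcePool string
	HostGroup    string
}

// clusterKey groups the three properties that together identify a cluster.
type clusterKey struct {
	cluster, resourcePool, hostGroup string
}

func keyOf(c Cluster) clusterKey {
	return clusterKey{c.Cluster, c.ResourcePool, c.HostGroup}
}

// AssignGUIDs returns a copy of desired in which each cluster carries the
// GUID of the existing cluster that matches on cluster, resource_pool, and
// host_group. Clusters with no match get an empty GUID (assumption: an
// empty GUID means "new cluster").
func AssignGUIDs(existing, desired []Cluster) []Cluster {
	byKey := make(map[clusterKey]string, len(existing))
	for _, e := range existing {
		byKey[keyOf(e)] = e.GUID
	}

	out := make([]Cluster, len(desired))
	for i, d := range desired {
		d.GUID = byKey[keyOf(d)] // zero value "" when there is no match
		out[i] = d
	}
	return out
}

func main() {
	// "guid-1" is a made-up value standing in for a GUID from Ops Manager.
	existing := []Cluster{{GUID: "guid-1", Cluster: "ops_manager_cluster"}}
	desired := []Cluster{
		{Cluster: "ops_manager_cluster"},
		{Cluster: "ops_manager_cluster", ResourcePool: "puff1"},
	}
	for _, c := range AssignGUIDs(existing, desired) {
		fmt.Printf("cluster=%s resource_pool=%q guid=%q\n", c.Cluster, c.ResourcePool, c.GUID)
	}
}
```

With a name-only lookup, both desired clusters would receive the existing GUID; keying on the full triple gives it only to the cluster whose resource_pool and host_group also match.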

Reproduction steps

  1. Configure Ops Manager using om configure-director --config director-config.yml
  2. Apply Changes
  3. Update director-config.yml to add a new cluster to the AZ that has the same cluster name, but a different resource_pool or host_group property than the original cluster.
  4. Use om configure-director again to update the config in Ops Manager
  5. Use om staged-director-config to get the latest config from Ops Manager. You will see the same guid defined for both clusters.
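
The check in the last step can also be expressed in code. The sketch below flags any guid that appears on more than one cluster in an AZ. It is only an illustration: the `Cluster` type and `DuplicateGUIDs` function are hypothetical, the guid values are made up, and the clusters are hard-coded rather than parsed from the output of om staged-director-config.

```go
package main

import "fmt"

// Cluster is a hypothetical, simplified view of one cluster entry as it
// might appear in the staged director config.
type Cluster struct {
	GUID         string
	Cluster      string
	ResourcePool string
	HostGroup    string
}

// DuplicateGUIDs returns every non-empty GUID that is assigned to more than
// one cluster, each reported once, in the order the repeats are found.
func DuplicateGUIDs(clusters []Cluster) []string {
	seen := make(map[string]int)
	var dups []string
	for _, c := range clusters {
		if c.GUID == "" {
			continue // clusters without a GUID cannot collide
		}
		seen[c.GUID]++
		if seen[c.GUID] == 2 {
			dups = append(dups, c.GUID)
		}
	}
	return dups
}

func main() {
	// Made-up data mirroring the broken state described in this issue.
	clusters := []Cluster{
		{GUID: "abc123", Cluster: "ops_manager_cluster"},
		{GUID: "abc123", Cluster: "ops_manager_cluster", ResourcePool: "puff1"},
	}
	if dups := DuplicateGUIDs(clusters); len(dups) > 0 {
		fmt.Println("duplicate cluster guids:", dups)
	}
}
```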

Workaround

There is no known workaround, other than using different cluster names (which is likely not possible since these are defined at the vSphere layer and would require vSphere configuration changes). Adding guid to the director config YML file does not seem to help, since the code to look up and assign guid always runs as part of the om configure-director command.

ystros, Aug 24 '21 21:08

We have created an issue in Pivotal Tracker to manage this. Unfortunately, the Pivotal Tracker project is private so you may be unable to view the contents of the story.

The labels on this github issue will be updated when the story is started.

cf-gitbot, Aug 24 '21 21:08

Hi @ystros

There is an existing PR #559 with the change. Could you check whether it fixes your problem?

jaristiz, Sep 08 '21 19:09

@ystros Hey Brian,

Did #559 work to resolve this issue for you?

dtimm, Jun 13 '22 17:06