Cluster Config UI fails when all nodes have labels

Open thatmidwesterncoder opened this issue 2 years ago • 14 comments

Setup

  • Rancher version: 2.7/2.8, tested back on 2.7.0
  • Rancher UI Extensions: n/a
  • Browser type & version: chrome/firefox, tested both

Describe the bug

When a cluster is created via the UI, Terraform, etc., and all of the machines have labelSelectors on them, the cluster config page in the UI fails to load and crashes.

To Reproduce

Apply this terraform:

Terraform block

terraform {
  required_providers {
    rancher2 = {
      source  = "terraform.local/local/rancher2"
      version = "3.2.0-rc5"
    }
  }
}

provider "rancher2" {
  api_url   = "<REDACTED>"
  token_key = "<REDACTED>"
  insecure  = true
}

resource "rancher2_cloud_credential" "rancher2_cloud_credential" {
  name = "tf-creds-k3s"
  amazonec2_credential_config {
    access_key = "<REDACTED>"
    secret_key = "<REDACTED>"
  }
}

resource "rancher2_machine_config_v2" "rancher2_machine_config_v2" {
  generate_name = "tf-rke2"
  amazonec2_config {
    ami            = ""
    region         = "<REDACTED>"
    security_group = ["<REDACTED>"]
    subnet_id      = "<REDACTED>"
    vpc_id         = "<REDACTED>"
    zone           = "<REDACTED>"
    root_size      = 50
  }
}

resource "rancher2_cluster_v2" "rancher2_cluster_v2" {
  name                                     = "jkeslar-validate13"
  kubernetes_version                       = "v1.26.8+rke2r1"
  enable_network_policy                    = false
  default_cluster_role_for_project_members = "user"
  rke_config {
    machine_selector_config {
      machine_label_selector {
        match_labels = {
          "key" = "value"
        }
      }
    }
    machine_pools {
      name                         = "pool1"
      machine_labels = {
        "key" = "value" 
      }
      cloud_credential_secret_name = rancher2_cloud_credential.rancher2_cloud_credential.id
      control_plane_role           = true
      etcd_role                    = true
      worker_role                  = true
      quantity                     = 1
      machine_config {
        kind = rancher2_machine_config_v2.rancher2_machine_config_v2.kind
        name = rancher2_machine_config_v2.rancher2_machine_config_v2.name
      }
    }
  }
}

Alternatively, create a cluster in the UI and make sure the machineSelectorConfig YAML looks like this:

    machineSelectorConfig:
      - config:
        machineLabelSelector:
          matchExpressions:
            - key: string
              operator: string
              values:
                - string
          matchLabels:  
            key: string

Create the cluster, wait for it to become active, go to the UI, and click the Config button.

Result

Errors in console, fails to load the page.

Expected Result

The cluster config should load.

Screenshots

image image

Additional context

This was initially found while QA was testing some Terraform changes: the UI fails to handle the specific case where every node has a label selector on it.

The piece of failing code is here: https://github.com/rancher/dashboard/blob/0076a24eff9f489c96c85b14e35d40461dd0a2dd/shell/models/provisioning.cattle.io.cluster.js#L815-L819

It fails on the last function call since no agent node was found without a label.
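
A minimal sketch of that failure mode, assuming the getter at the linked lines takes the first machineSelectorConfig entry that has no machineLabelSelector and then reads its config. The type and function names here are hypothetical and are not copied from the dashboard source:

```typescript
// Hypothetical shape, loosely modeled on the YAML above; not the dashboard's real type.
interface SelectorEntry {
  config?: Record<string, unknown> | null;
  machineLabelSelector?: { matchLabels?: Record<string, string> };
}

// Assumed logic: pick the first entry without a label selector, then read its config.
function unsafeAgentConfig(entries: SelectorEntry[]): unknown {
  const unlabeled = entries.filter((e) => !e.machineLabelSelector)[0];

  // If every entry has a selector, `unlabeled` is undefined and this line throws.
  return unlabeled.config;
}

// Every entry carries a label selector, as in the Terraform repro.
const allLabeled: SelectorEntry[] = [
  { config: {}, machineLabelSelector: { matchLabels: { key: 'value' } } },
];

let failure = '';

try {
  unsafeAgentConfig(allLabeled);
} catch (err) {
  failure = err instanceof TypeError ? 'TypeError' : 'other';
}

console.log(failure); // logs "TypeError"
```

With at least one entry that has no machineLabelSelector, the same function returns that entry's config without throwing, which would explain why the crash only shows up when every entry is labeled.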

thatmidwesterncoder avatar Oct 10 '23 17:10 thatmidwesterncoder

Possible connection with SURE-7011 (internal reference), where an S3 backup configuration produces the error "Cannot read properties of undefined (reading 'length')", though some deciphering of that Terraform setup is still required.

gaktive avatar Nov 08 '23 22:11 gaktive

Another internal reference: SURE-7212

gaktive avatar Nov 16 '23 00:11 gaktive

@thatmidwesterncoder users can add configuration entries to the agentConfig field when creating RKE2 clusters, e.g. protect-kernel-defaults; you can find it as a checkbox at the bottom of the Advanced tab (Raise errors if ...):

image

Where should those entries be saved when every node has a label selector (so agentConfig is undefined)? According to the documentation, it seems they could be saved in the machineGlobalConfig section: https://ranchermanager.docs.rancher.com/reference-guides/cluster-configuration/rancher-server-configuration/rke2-cluster-configuration#machineselectorconfig

It would also work from the getter's perspective: if every node has a label selector, the configs come from machineGlobalConfig.
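
A rough sketch of what such a getter could look like, assuming a fallback from the unlabeled machineSelectorConfig entry to machineGlobalConfig. The types and the function name agentConfigOrGlobal are hypothetical, and this is not the fix that was merged:

```typescript
// Hypothetical types for illustration; not the dashboard's real model.
interface SelectorConfigEntry {
  config?: Record<string, unknown> | null;
  machineLabelSelector?: { matchLabels?: Record<string, string> };
}

interface RkeConfig {
  machineGlobalConfig?: Record<string, unknown>;
  machineSelectorConfig?: SelectorConfigEntry[];
}

// Assumed getter: prefer the entry with no label selector, otherwise fall back
// to machineGlobalConfig, otherwise return an empty object.
function agentConfigOrGlobal(rke: RkeConfig): Record<string, unknown> {
  const entries = rke.machineSelectorConfig || [];
  const unlabeled = entries.filter((e) => !e.machineLabelSelector)[0];

  if (unlabeled && unlabeled.config) {
    return unlabeled.config;
  }

  return rke.machineGlobalConfig || {};
}

// Every entry has a label selector, so the value comes from machineGlobalConfig.
const everyNodeLabeled: RkeConfig = {
  machineGlobalConfig: { 'protect-kernel-defaults': true },
  machineSelectorConfig: [
    { config: {}, machineLabelSelector: { matchLabels: { key: 'value' } } },
  ],
};

const resolved = agentConfigOrGlobal(everyNodeLabeled);

console.log(resolved['protect-kernel-defaults']); // logs true
```

The same guard also covers an entry whose config is null or a cluster with no machineSelectorConfig at all, since both paths end at the fallback instead of dereferencing undefined.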

torchiaf avatar Jan 05 '24 16:01 torchiaf

This is closely related to another issue. Both could be validated in the same manual validation pass using Terraform and manual RKE2 configuration. cc @yonasberhe23

Related issue comment: https://github.com/rancher/dashboard/issues/10045#issuecomment-1973432313

izaac avatar Apr 03 '24 16:04 izaac

@izaac is this done and closed, according to QA? zube bot appeared to have closed this according to issue history... Just wanted to double-check before updating SURE-7011...

@nwmac if Izaac confirms this as closed, does it close SURE-7011? It's linked to the SURE issue

aalves08 avatar Apr 10 '24 12:04 aalves08

@gaktive @nwmac this is also linked to SURE-6770... Would SURE-6770 be closed as well?

aalves08 avatar Apr 10 '24 12:04 aalves08

also linked to SURE-7212 💦 😅

aalves08 avatar Apr 10 '24 12:04 aalves08

@aalves08 this is on Isabela's to-do list; she's validating issues in priority order, so she will take a look as soon as possible. cc @IsaSih

izaac avatar Apr 11 '24 14:04 izaac

@izaac not a priority. I just wanted to double-check since I believe the zube bot has been messing with the issue labels :P

aalves08 avatar Apr 11 '24 14:04 aalves08

@aalves08 once QA has marked this as Done, then yes SURE-6770 & SURE-7212 can be closed.

gaktive avatar Apr 11 '24 22:04 gaktive

IsaSih said: I manually tested this issue (via the UI) and the current behavior corresponds to what is expected.

IsaSih avatar Apr 11 '24 22:04 IsaSih

Taking a look at each of the JIRA tickets mentioned in the comments, I found that SURE-6770 is a different test case, where this property is not present at all.

machineSelectorConfig:
  - config: null

IsaSih avatar Apr 12 '24 01:04 IsaSih

@IsaSih What is the status of this one?

torchiaf avatar Jun 12 '24 14:06 torchiaf

@IsaSih ping - any update? How can eng help unblock this for you?

nwmac avatar Jul 03 '24 15:07 nwmac

Tests pass on v2.9-3ae1c6ac8d3524f8ded4544f1db75fc9e07911a5-head

Image

IsaSih avatar Jul 05 '24 14:07 IsaSih