Cluster Config UI fails when all nodes have labels
Setup
- Rancher version: 2.7/2.8, tested back on 2.7.0
- Rancher UI Extensions: n/a
- Browser type & version: Chrome/Firefox, tested both
Describe the bug
When a cluster is created via the UI, Terraform, etc., and all of the machines have label selectors on them, the cluster config page in the UI fails to load and crashes.
To Reproduce
Apply this terraform:
Terraform block
terraform {
  required_providers {
    rancher2 = {
      source = "terraform.local/local/rancher2"
      version = "3.2.0-rc5"
    }
  }
}

provider "rancher2" {
  api_url = "<REDACTED>"
  token_key = "<REDACTED>"
  insecure = true
}

resource "rancher2_cloud_credential" "rancher2_cloud_credential" {
  name = "tf-creds-k3s"
  amazonec2_credential_config {
    access_key = "<REDACTED>"
    secret_key = "<REDACTED>"
  }
}

resource "rancher2_machine_config_v2" "rancher2_machine_config_v2" {
  generate_name = "tf-rke2"
  amazonec2_config {
    ami = ""
    region = "<REDACTED>"
    security_group = ["<REDACTED>"]
    subnet_id = "<REDACTED>"
    vpc_id = "<REDACTED>"
    zone = "<REDACTED>"
    root_size = 50
  }
}

resource "rancher2_cluster_v2" "rancher2_cluster_v2" {
  name = "jkeslar-validate13"
  kubernetes_version = "v1.26.8+rke2r1"
  enable_network_policy = false
  default_cluster_role_for_project_members = "user"
  rke_config {
    machine_selector_config {
      machine_label_selector {
        match_labels = {
          "key" = "value"
        }
      }
    }
    machine_pools {
      name = "pool1"
      machine_labels = {
        "key" = "value"
      }
      cloud_credential_secret_name = rancher2_cloud_credential.rancher2_cloud_credential.id
      control_plane_role = true
      etcd_role = true
      worker_role = true
      quantity = 1
      machine_config {
        kind = rancher2_machine_config_v2.rancher2_machine_config_v2.kind
        name = rancher2_machine_config_v2.rancher2_machine_config_v2.name
      }
    }
  }
}
Alternatively, create a cluster in the UI and make sure the machineSelectorConfig yaml looks like this:
machineSelectorConfig:
  - config:
    machineLabelSelector:
      matchExpressions:
        - key: string
          operator: string
          values:
            - string
      matchLabels:
        key: string
Create the cluster, wait for it to become active, go to the UI, and click the Config button.
Result
Errors in the console; the page fails to load.
Expected Result
The cluster config should load.
Screenshots
Additional context
This was initially found while QA was testing some Terraform changes. The UI fails to handle the specific case where every node has a label selector on it.
The piece of failing code is here: https://github.com/rancher/dashboard/blob/0076a24eff9f489c96c85b14e35d40461dd0a2dd/shell/models/provisioning.cattle.io.cluster.js#L815-L819
It fails on the last function call since no agent node was found without a label.
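To illustrate the failure mode described above, here is a minimal sketch in plain JavaScript. It assumes the linked getter picks the first machineSelectorConfig entry that has no machineLabelSelector and then reads a property from the result. findAgentConfig, tryFind, and the sample data are hypothetical and are not copied from the dashboard source.

```javascript
// Hypothetical stand-in for the dashboard getter; not the actual source.
// Assumption: the getter picks the first entry with no machineLabelSelector
// and then reads its `config` property.
function findAgentConfig(machineSelectorConfig) {
  const entry = machineSelectorConfig.find((x) => !x.machineLabelSelector);

  // When every entry has a selector, `entry` is undefined and this line throws.
  return entry.config;
}

// Mirrors the reproduction: the only entry carries a label selector.
const allLabelled = [
  { config: null, machineLabelSelector: { matchLabels: { key: 'value' } } },
];

// Sample data with one entry that has no selector, which the getter can find.
const withUnlabelledEntry = [
  { config: { 'protect-kernel-defaults': true } },
  ...allLabelled,
];

// Small wrapper so the sketch can report the outcome instead of crashing.
function tryFind(list) {
  try {
    return { ok: true, value: findAgentConfig(list) };
  } catch (err) {
    return { ok: false, error: err.constructor.name };
  }
}

const resultLabelled = tryFind(allLabelled);
const resultUnlabelled = tryFind(withUnlabelledEntry);

console.log(resultLabelled.ok, resultLabelled.error); // false TypeError
console.log(resultUnlabelled.ok); // true
```

Under this assumption the page only loads when at least one entry without a selector exists, which matches the behaviour reported here.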
Possibly connected to SURE-7011 (internal reference), where an S3 backup configuration produces the error Cannot read properties of undefined (reading 'length'), though some deciphering of that Terraform setup is still required.
Another internal reference: SURE-7212
@thatmidwesterncoder users can add configuration entries to the agentConfig field when creating RKE2 clusters, e.g. protect-kernel-defaults; it appears as a checkbox at the bottom of the Advanced tab (Raise errors if ...).
Where should those entries be saved when every node has a label selector (so that agentConfig is undefined)?
According to the documentation, it seems they could be saved in the machineGlobalConfig section:
https://ranchermanager.docs.rancher.com/reference-guides/cluster-configuration/rancher-server-configuration/rke2-cluster-configuration#machineselectorconfig
It would also work from the getter's perspective: if every node has a label selector, the config comes from machineGlobalConfig.
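The getter behaviour proposed here could be sketched roughly as follows. This is only an illustration under assumptions: agentConfigOrGlobal is a hypothetical helper, the rkeConfig object is sample data shaped like the YAML in this issue, and the actual fix in the dashboard may look different.

```javascript
// Hypothetical helper illustrating the proposal; not the dashboard's actual code.
// Returns the config of the first machineSelectorConfig entry that has no
// machineLabelSelector, or machineGlobalConfig when every entry has one.
function agentConfigOrGlobal(rkeConfig) {
  const entries = rkeConfig.machineSelectorConfig || [];
  const unlabelled = entries.find((x) => !x.machineLabelSelector);

  if (unlabelled) {
    return unlabelled.config;
  }

  // Every entry has a selector, so fall back to the global section.
  return rkeConfig.machineGlobalConfig;
}

// Sample data (assumed shape): every entry has a label selector.
const everyNodeLabelled = {
  machineGlobalConfig: { 'protect-kernel-defaults': true },
  machineSelectorConfig: [
    { config: null, machineLabelSelector: { matchLabels: { key: 'value' } } },
  ],
};

const fallbackConfig = agentConfigOrGlobal(everyNodeLabelled);

console.log(fallbackConfig === everyNodeLabelled.machineGlobalConfig); // true
```

Returning the machineGlobalConfig object itself means a write through the same path would land in the global section, which is where the comment suggests such entries could be saved.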
This is tightly related to another issue. Both could be validated in the same manual validation pass using Terraform and manual RKE2 configuration. cc @yonasberhe23
Related issue comment: https://github.com/rancher/dashboard/issues/10045#issuecomment-1973432313
@izaac is this done and closed, according to QA? zube bot appeared to have closed this according to issue history... Just wanted to double-check before updating SURE-7011...
@nwmac if Izaac confirms this as closed, does it close SURE-7011? It's linked to the SURE issue
@gaktive @nwmac this is also linked to SURE-6770... Would SURE-6770 be closed as well?
also linked to SURE-7212 💦 😅
@aalves08 this is on Isabela's to-do items, she's getting issues validated according to priorities so she would take a look as soon as possible. cc @IsaSih
@izaac not a priority. I just wanted to double-check since I believe the zube bot has been messing with the issue labels :P
@aalves08 once QA has marked this as Done, then yes SURE-6770 & SURE-7212 can be closed.
IsaSih said: I manually tested this issue (via the UI) and the current behavior corresponds to what is expected.
Taking a look at each of the JIRA tickets mentioned in the comments, I found that SURE-6770 is a different test case, where this property is not present at all.
machineSelectorConfig:
  - config: null
@IsaSih What is the status of this one?
@IsaSih ping - any update - how can eng help unblock this for you?
Tests pass on v2.9-3ae1c6ac8d3524f8ded4544f1db75fc9e07911a5-head