Rancher server logs are spammed when a node of a downstream RKE2 cluster is in `Reconciling` state (UI sends null)
**Rancher Server Setup**
- Rancher version: 2.7.0 upgraded to 2.7-head (2.7.2-rc6)
- Installation option (Docker install/Helm Chart): Helm
- If Helm Chart, Kubernetes Cluster and version (RKE1, RKE2, k3s, EKS, etc): RKE2 custom cluster
- Proxy/Cert Details:
**Information about the Cluster**
- Kubernetes version: v1.23.14+rke2r1 to v1.25.7+rke2r1
- Cluster Type (Local/Downstream): Downstream custom cluster (RKE2)
**User Information**
- What is the role of the user logged in? (Admin/Cluster Owner/Cluster Member/Project Owner/Project Member/Custom): Admin
**Describe the bug**
Sometimes when performing an HA upgrade from Rancher v2.7.0 to v2.7.2-rc6 with an AWS RKE2 custom cluster, the cluster goes into an `Updating` state, the etcd nodes go into a `Reconciling` state, and the Rancher logs are spammed with the same INFO message every few seconds.
**To Reproduce**
- Create an HA Rancher server on v2.7.0
- Provision a hardened downstream AWS RKE2 custom cluster with Kubernetes version v1.23.14+rke2r1
- Perform an HA upgrade to Rancher v2.7-head (v2.7.2-rc6)
- Make sure the upgrade succeeds and the downstream cluster comes back Active
- Upgrade the downstream cluster's Kubernetes version to v1.25.7+rke2r1 and save
- Check the system behavior, logs, and cluster state
**Result**
a. The downstream cluster initially came back Active, then went into an Updating state with an error and never returned to Active.
b. Rancher logs are spammed with INFO messages; the same log lines keep coming every second.
c. One of the nodes (etcd) is stuck in a Reconciling state and never came back, with this message in the UI: (Node condition MemoryPressure is Unknown. Node condition DiskPressure is Unknown. Node condition PIDPressure is Unknown. Node condition Ready is Unknown., waiting for probes: etcd, kubelet)
**Expected Result**
a. Rancher logs should not be spammed with the same message every second.
**Logs**
```
2023/03/17 18:36:58 [INFO] [planner] rkecluster fleet-default/dascrke2: waiting: configuring bootstrap node(s) custom-d953bccd4557: Node condition MemoryPressure is Unknown. Node condition DiskPressure is Unknown. Node condition PIDPressure is Unknown. Node condition Ready is Unknown., waiting for probes: etcd, kubelet
2023/03/17 18:36:58 [INFO] [planner] rkecluster fleet-default/dascrke2: waiting: configuring bootstrap node(s) custom-d953bccd4557: Node condition MemoryPressure is Unknown. Node condition DiskPressure is Unknown. Node condition PIDPressure is Unknown. Node condition Ready is Unknown., waiting for probes: etcd, kubelet
2023/03/17 18:37:03 [INFO] [planner] rkecluster fleet-default/dascrke2: waiting: configuring bootstrap node(s) custom-d953bccd4557: Node condition MemoryPressure is Unknown. Node condition DiskPressure is Unknown. Node condition PIDPressure is Unknown. Node condition Ready is Unknown., waiting for probes: etcd, kubelet
2023/03/17 18:37:03 [INFO] [planner] rkecluster fleet-default/dascrke2: waiting: configuring bootstrap node(s) custom-d953bccd4557: Node condition MemoryPressure is Unknown. Node condition DiskPressure is Unknown. Node condition PIDPressure is Unknown. Node condition Ready is Unknown., waiting for probes: etcd, kubelet
2023/03/17 18:37:08 [INFO] [planner] rkecluster fleet-default/dascrke2: waiting: configuring bootstrap node(s) custom-d953bccd4557: Node condition MemoryPressure is Unknown. Node condition DiskPressure is Unknown. Node condition PIDPressure is Unknown. Node condition Ready is Unknown., waiting for probes: etcd, kubelet
2023/03/17 18:37:08 [INFO] [planner] rkecluster fleet-default/dascrke2: waiting: configuring bootstrap node(s) custom-d953bccd4557: Node condition MemoryPressure is Unknown. Node condition DiskPressure is Unknown. Node condition PIDPressure is Unknown. Node condition Ready is Unknown., waiting for probes: etcd, kubelet
2023/03/17 18:37:13 [INFO] [planner] rkecluster fleet-default/dascrke2: waiting: configuring bootstrap node(s) custom-d953bccd4557: Node condition MemoryPressure is Unknown. Node condition DiskPressure is Unknown. Node condition PIDPressure is Unknown. Node condition Ready is Unknown., waiting for probes: etcd, kubelet
2023/03/17 18:37:13 [INFO] [planner] rkecluster fleet-default/dascrke2: waiting: configuring bootstrap node(s) custom-d953bccd4557: Node condition MemoryPressure is Unknown. Node condition DiskPressure is Unknown. Node condition PIDPressure is Unknown. Node condition Ready is Unknown., waiting for probes: etcd, kubelet
2023/03/17 18:37:18 [INFO] [planner] rkecluster fleet-default/dascrke2: waiting: configuring bootstrap node(s) custom-d953bccd4557: Node condition MemoryPressure is Unknown. Node condition DiskPressure is Unknown. Node condition PIDPressure is Unknown. Node condition Ready is Unknown., waiting for probes: etcd, kubelet
2023/03/17 18:37:18 [INFO] [planner] rkecluster fleet-default/dascrke2: waiting: configuring bootstrap node(s) custom-d953bccd4557: Node condition MemoryPressure is Unknown. Node condition DiskPressure is Unknown. Node condition PIDPressure is Unknown. Node condition Ready is Unknown., waiting for probes: etcd, kubelet
2023/03/17 18:37:23 [INFO] [planner] rkecluster fleet-default/dascrke2: waiting: configuring bootstrap node(s) custom-d953bccd4557: Node condition MemoryPressure is Unknown. Node condition DiskPressure is Unknown. Node condition PIDPressure is Unknown. Node condition Ready is Unknown., waiting for probes: etcd, kubelet
2023/03/17 18:37:23 [INFO] [planner] rkecluster fleet-default/dascrke2: waiting: configuring bootstrap node(s) custom-d953bccd4557: Node condition MemoryPressure is Unknown. Node condition DiskPressure is Unknown. Node condition PIDPressure is Unknown. Node condition Ready is Unknown., waiting for probes: etcd, kubelet
2023/03/17 18:37:28 [INFO] [planner] rkecluster fleet-default/dascrke2: waiting: configuring bootstrap node(s) custom-d953bccd4557: Node condition MemoryPressure is Unknown. Node condition DiskPressure is Unknown. Node condition PIDPressure is Unknown. Node condition Ready is Unknown., waiting for probes: etcd, kubelet
2023/03/17 18:37:28 [INFO] [planner] rkecluster fleet-default/dascrke2: waiting: configuring bootstrap node(s) custom-d953bccd4557: Node condition MemoryPressure is Unknown. Node condition DiskPressure is Unknown. Node condition PIDPressure is Unknown. Node condition Ready is Unknown., waiting for probes: etcd, kubelet
2023/03/17 18:37:33 [INFO] [planner] rkecluster fleet-default/dascrke2: waiting: configuring bootstrap node(s) custom-d953bccd4557: Node condition MemoryPressure is Unknown. Node condition DiskPressure is Unknown. Node condition PIDPressure is Unknown. Node condition Ready is Unknown., waiting for probes: etcd, kubelet
2023/03/17 18:37:33 [INFO] [planner] rkecluster fleet-default/dascrke2: waiting: configuring bootstrap node(s) custom-d953bccd4557: Node condition MemoryPressure is Unknown. Node condition DiskPressure is Unknown. Node condition PIDPressure is Unknown. Node condition Ready is Unknown., waiting for probes: etcd, kubelet
2023/03/17 18:37:38 [INFO] [planner] rkecluster fleet-default/dascrke2: waiting: configuring bootstrap node(s) custom-d953bccd4557: Node condition MemoryPressure is Unknown. Node condition DiskPressure is Unknown. Node condition PIDPressure is Unknown. Node condition Ready is Unknown., waiting for probes: etcd, kubelet
2023/03/17 18:37:38 [INFO] [planner] rkecluster fleet-default/dascrke2: waiting: configuring bootstrap node(s) custom-d953bccd4557: Node condition MemoryPressure is Unknown. Node condition DiskPressure is Unknown. Node condition PIDPressure is Unknown. Node condition Ready is Unknown., waiting for probes: etcd, kubelet
2023/03/17 18:37:43 [INFO] [planner] rkecluster fleet-default/dascrke2: waiting: configuring bootstrap node(s) custom-d953bccd4557: Node condition MemoryPressure is Unknown. Node condition DiskPressure is Unknown. Node condition PIDPressure is Unknown. Node condition Ready is Unknown., waiting for probes: etcd, kubelet
2023/03/17 18:37:43 [INFO] [planner] rkecluster fleet-default/dascrke2: waiting: configuring bootstrap node(s) custom-d953bccd4557: Node condition MemoryPressure is Unknown. Node condition DiskPressure is Unknown. Node condition PIDPressure is Unknown. Node condition Ready is Unknown., waiting for probes: etcd, kubelet
2023/03/17 18:37:48 [INFO] [planner] rkecluster fleet-default/dascrke2: waiting: configuring bootstrap node(s) custom-d953bccd4557: Node condition MemoryPressure is Unknown. Node condition DiskPressure is Unknown. Node condition PIDPressure is Unknown. Node condition Ready is Unknown., waiting for probes: etcd, kubelet
2023/03/17 18:37:48 [INFO] [planner] rkecluster fleet-default/dascrke2: waiting: configuring bootstrap node(s) custom-d953bccd4557: Node condition MemoryPressure is Unknown. Node condition DiskPressure is Unknown. Node condition PIDPressure is Unknown. Node condition Ready is Unknown., waiting for probes: etcd, kubelet
2023/03/17 18:37:51 [ERROR] error syncing 'fleet-default/custom-5e4c4afa8ddd': handler unmanaged-machine: the server was unable to return a response in the time allotted, but may still be processing the request (get nodes ip-172-31-2-93), requeuing
```
This is a UI bug.
The root cause of the problem is the fact that we are delivering an invalid configuration to RKE2, namely specifying `profile: null` in `machineGlobalConfig`.
This is very easily reproducible: create a new v2prov cluster in the UI, click "Edit as YAML", and note that `profile: null` appears under `machineGlobalConfig`.
See attached screenshot.
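For illustration, the offending YAML looks roughly like this (cluster name and the `cni` value are placeholders; only the `profile: null` entry under `machineGlobalConfig` is taken from this report):

```yaml
apiVersion: provisioning.cattle.io/v1
kind: Cluster
metadata:
  name: example-cluster      # placeholder name
  namespace: fleet-default
spec:
  rkeConfig:
    machineGlobalConfig:
      cni: calico            # illustrative field
      profile: null          # invalid: null is never a valid CIS profile value for RKE2
```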
@nwmac @gaktive , this is likely a side-effect of recent changes for profile behavior to accommodate k8s 1.25 changes so possibly a regression we don't want to happen in the upcoming 2.7.2 release. Marking as regression and release-blocker.
@snasovich Backend should ignore the null profile. If the configuration is invalid, backend should reject the request.
/forwardport v2.7.next2
> @snasovich Backend should ignore the null profile. If the configuration is invalid, backend should reject the request.
@nwmac , I agree with this in principle, but it's still a change of behavior on the UI's part that we want reverted. I see it's already in test; thank you for the quick turnaround.
https://github.com/rancher/rancher/issues/40942 - backend issue
Setting this ticket to Reopened until a Test Template can be provided.
cc @nwmac
It was determined by management that this ticket does not need a Test Template to complete testing. Therefore I am setting this ticket back to QA Working.
During testing upgrading from v2.7.0 to v2.7-head (Commit ID: 41b2cf3), I did not observe the issue with the spammed logs when following the Steps to Reproduce. Therefore I am closing this ticket as Done.
**Test Steps:**
- Create an HA Rancher server on v2.7.0
- Provision a hardened downstream RKE2 custom cluster with K8s v1.23.14+rke2r1 (deprecated)
- Perform an HA upgrade to Rancher v2.7-head
- Make sure the upgrade succeeds and the downstream cluster comes back Active
- Upgrade the downstream cluster's Kubernetes version to v1.25.7+rke2r1 and save
- Check the system behavior, logs, and cluster state
**Screenshots:**

Rancher Logs: Rancher Logs
Reopen for some more validations.
Ticket #8480 - Test Results - ✅
With Docker on a single-node instance of Rancher v2.7.0 => v2.7.2-rc6:
❗ Unable to reproduce ❗
| Scenario | Test Case | Result |
|---|---|---|
| 1 | Provision downstream RKE2 w/ k8s 1.23 on 2.7.0 | ✅ |
| 2 | Upgrade Rancher from 2.7.0 => 2.7.2-rc6, then upgrade the cluster's k8s version to 1.25; observe the `profile` field of `machineGlobalConfig` is left out of the request | ✅ |
| 3 | Provision downstream RKE2 w/ k8s 1.25 on v2.7.2-rc6 | ✅ |
Scenario 1:
- Fresh install of Rancher v2.7.0
- Provision a downstream RKE2 k8s 1.23 cluster
- Verified: cluster successfully provisions; logs not flooded
Scenario 2:
- Pick up where scenario 1 ^^ left off; upgrade Rancher to v2.7.2-rc6
- Once upgraded to v2.7.2-rc6, upgrade the RKE2 cluster's k8s version to 1.25, opening the network tab and observing that the `profile` field of `machineGlobalConfig` is left out of the request (since it is set to `null`)
- Verified: cluster successfully upgrades; logs not flooded; request accurately matches UI changes
Scenario 3:
- Fresh install of Rancher v2.7.2-rc6
- Provision a downstream RKE2 k8s 1.25 cluster
- Verified: cluster successfully provisions; logs not flooded
Re-opening due to the following concerns:
- `profile: null` is still present when navigating to Cluster Management > View YAML

The fix that went in, which did resolve the flooded-logs symptom of the reported issue, seems to have removed `profile: null` from the request body sent from the UI. The field still persists and is seen in View YAML in the Rancher UI. This is unexpected. Re-opening to request that this field gets hidden behind #
To clarify here, I would expect that I can copy-paste the YAML that comes out of the UI's Edit as YAML (or View as YAML) interface and directly kubectl apply|create it in my cluster and end up with an identical result.
If the UI is selectively ignoring fields when marshalling the object and sending it over the wire, that is not ideal, although it does I suppose provide a resolution to the originally reported bug.
The behavior I would expect to see is:
Scenario A: If the user does not define a profile for their cluster and edits the cluster as YAML, there should be no `profile` key at all in the `machineGlobalConfig`, or if there is, it should be commented out like the rest of the fields that do not have explicitly set values; for example, see `bucket` in the screenshot here: https://github.com/rancher/dashboard/issues/8480#issuecomment-1477180460
Scenario B: If the user defines a profile for their cluster, I would expect to see the `profile: cis-1.5` entry under `machineGlobalConfig`
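The two expected shapes can be sketched like this (illustrative fragment only; the `cni` value is a placeholder, and only `profile`/`cis-1.5` come from this thread):

```yaml
# Scenario A: no profile chosen -- key absent or commented out
machineGlobalConfig:
  cni: calico
  # profile:

# Scenario B: user explicitly selected a CIS profile
machineGlobalConfig:
  cni: calico
  profile: cis-1.5
```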
There is no case in which `null` is a valid value for `profile`, but `profile` is a valid configuration parameter for RKE2. We do not have a mechanism to perform value validation at this point for RKE2/K3s (that is a feature enhancement on the K3s/RKE2 side), and thus on the backend side we would have to hard-code some type of validation to ensure that `null` is not being delivered.
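As a rough sketch of what such a backend guard could look like (this is not actual Rancher code; the function name and fields are invented for illustration), null-valued keys could simply be dropped from `machineGlobalConfig` before the plan is delivered:

```go
package main

import "fmt"

// scrubNulls returns a copy of cfg with all null-valued keys removed.
// Illustrative only: the real fix would live in the Rancher backend
// (or the UI's save path), not in a standalone helper like this.
func scrubNulls(cfg map[string]interface{}) map[string]interface{} {
	out := make(map[string]interface{}, len(cfg))
	for k, v := range cfg {
		if v == nil {
			// Drop entries like "profile: null" instead of delivering
			// them to RKE2, where null is never a valid value.
			continue
		}
		out[k] = v
	}
	return out
}

func main() {
	machineGlobalConfig := map[string]interface{}{
		"cni":     "calico", // illustrative field
		"profile": nil,      // the invalid value reported in this issue
	}
	fmt.Println(scrubNulls(machineGlobalConfig))
}
```

An alternative, as discussed above, would be to reject the request outright instead of silently scrubbing; either way the invalid `null` never reaches RKE2.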
This isn't a blocker for 2.7.2, which is now in the 11th hour of its release, and the related backend ticket is also pushed out. Moving to Q2.
The remaining work is to comment out the profile key when editing as yaml - this is currently driven off of data that comes from the backend via the schema.
Bumping out of 2.7.next2; this requires further investigation and will require test automation.
@Oats87 we should coordinate with you and/or your team about how UI & backend should handle this.
cc @Sahota1225
This may be something the UI is doing when trying to populate fields we expect, so we should scrub upon save.