Rancher reporting cpu/mem reserved and pod count wrong
Rancher Server Setup
- Rancher version: 2.6.3
- Installation option (Docker install/Helm Chart): Helm
- If Helm Chart, Kubernetes Cluster and version (RKE1, RKE2, k3s, EKS, etc): RKE2
- Proxy/Cert Details:
Information about the Cluster
- Kubernetes version: v1.21.5+rke2r1 / v1.21.7+rke2r2
- Cluster Type (Local/Downstream): Downstream
- If downstream, what type of cluster? (Custom/Imported or specify provider for Hosted/Infrastructure Provider): Imported RKE2
Describe the bug
I have three more or less empty clusters deployed with RKE2, and only one of them seems to be reporting a correct value for reserved CPU/memory. Pods seem to be missing too, but only on the Rancher home screen...

As you can see, the rke2-downstream2 cluster is reporting no reservations at all, and test-rke2 seems to report too many reservations for an empty cluster.
This is the output of the resource-capacity plugin from krew. rke2-downstream2:
❯ kubectl resource-capacity
NODE CPU REQUESTS CPU LIMITS MEMORY REQUESTS MEMORY LIMITS
* 8020m (44%) 4120m (22%) 2387Mi (3%) 5323Mi (7%)
rke2-downstream2-agent-1 900m (22%) 1300m (32%) 284Mi (1%) 630Mi (3%)
rke2-downstream2-agent-2 1850m (46%) 1700m (42%) 1341Mi (7%) 4119Mi (24%)
rke2-downstream2-agent-3 820m (20%) 420m (10%) 252Mi (1%) 284Mi (1%)
rke2-downstream2-server-1 1450m (72%) 200m (10%) 126Mi (1%) 53Mi (0%)
rke2-downstream2-server-2 1450m (72%) 200m (10%) 126Mi (1%) 53Mi (0%)
rke2-downstream2-server-3 1550m (77%) 300m (15%) 261Mi (3%) 187Mi (2%)
and test-rke2:
❯ kubectl resource-capacity
NODE CPU REQUESTS CPU LIMITS MEMORY REQUESTS MEMORY LIMITS
* 6070m (33%) 220m (1%) 856Mi (1%) 290Mi (0%)
rke2-agent-node-1 600m (15%) 0Mi (0%) 95Mi (0%) 0Mi (0%)
rke2-agent-node-2 600m (15%) 0Mi (0%) 95Mi (0%) 0Mi (0%)
rke2-agent-node-3 600m (15%) 0Mi (0%) 95Mi (0%) 0Mi (0%)
rke2-server-node-1 1570m (78%) 220m (11%) 384Mi (4%) 290Mi (3%)
rke2-server-node-2 1350m (67%) 0Mi (0%) 95Mi (1%) 0Mi (0%)
rke2-server-node-3 1350m (67%) 0Mi (0%) 95Mi (1%) 0Mi (0%)
All server nodes are 2-CPU VMs and agent nodes are 4-CPU VMs, by the way.
rke2-downstream2 has monitoring installed; test-rke2 does not.
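For anyone who wants to reproduce the check above: the output comes from the kube-capacity tool, which (as far as I know) krew publishes under the name resource-capacity. A minimal sketch, assuming krew is already set up:
# sums requests/limits per node, like the tables above
kubectl krew install resource-capacity
kubectl resource-capacity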


By the way, the issue in the rke2-downstream2 cluster existed even before I tried the upgrade to v1.21.7+rke2r2 via the Rancher UI.
Same issue with an AKS cluster on 1.21.7 and Rancher 2.6.3. Other clusters deployed with RKE, also on 1.21.7, are reporting correctly.
https://rancher-addreess.domain.com/v1/management.cattle.io.cluster is returning all zeroes for the clusters in question, so it's not a display issue.
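For reference, the zeroed values can also be checked without the UI, directly on the v3 cluster objects in the local (Rancher) cluster. A rough sketch, assuming kubectl access to the local cluster; the .status.requested / .status.limits paths are my assumption based on what the API returns:
# list cluster IDs, then dump the aggregated requested/limits for one of them
kubectl get clusters.management.cattle.io
kubectl get clusters.management.cattle.io <cluster-id> -o jsonpath='{.status.requested}{"\n"}{.status.limits}{"\n"}'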
Deployed another RKE2 cluster and imported it... same result
Recreated the RKE2 cluster that was reporting correct values (via Terraform), and now it is not reporting any CPU/memory/pod values.
It could be that my two working clusters were imported when Rancher was still on 2.6.2, while all the other clusters were imported after the update to 2.6.3, but I'm not 100% sure.
Same issue with RKE1 for me. Working stats: built with 2.6.2. Non-working stats: built with 2.6.3.
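If the Rancher version at import time really is the deciding factor, one way to see what a given downstream cluster is actually running is to check the cluster agent image tag. A sketch, assuming the default cattle-system namespace and deployment name:
# the cattle-cluster-agent tag normally tracks the Rancher version managing the cluster
kubectl -n cattle-system get deploy cattle-cluster-agent -o jsonpath='{.spec.template.spec.containers[0].image}{"\n"}'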
Seems to be getting weirder... my recreated cluster now started to report values... that seem a little off :)

And I just compared the pod count of all clusters that are reporting values... none of them are correct.
Are those numbers filtered or averaged? Or do they exclude some "system" pods?
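For comparison, here is a quick way to count pods in the cluster itself; whether the home screen excludes Succeeded/Failed pods is exactly the open question, so both numbers are worth looking at (a sketch, nothing Rancher-specific assumed):
# total pods vs. pods currently in the Running phase
kubectl get pods -A --no-headers | wc -l
kubectl get pods -A --no-headers --field-selector=status.phase=Running | wc -l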
So these are my numbers on the home screen:
And this is in the cluster itself:


Hi,
We have the same problem:
Our version is Rancher 2.6.3; we don't have the problem with 2.6.2. Monitoring is installed on the downstream cluster and not on the local one.
Same issue.
We have the same problem. Reservations and limits are only shown for the local cluster. I suspect this change is the culprit: https://github.com/rancher/rancher/commit/3453a429bf4107dde095dfcf0256daf93ec6ffb3
It uses annotations on the v1/Node to detect current limits and reservations instead of calculating them based on the pods. These annotations do actually exist; however, they don't seem to get correctly synced to the "management.cattle.io/Node" resource.
apiVersion: v1
kind: Node
metadata:
  name: master-server1
  annotations:
    ...
    management.cattle.io/pod-limits: '{"cpu":"300m","memory":"178Mi"}'
    management.cattle.io/pod-requests: '{"cpu":"1725m","memory":"2211Mi","pods":"19"}'
    ...
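A quick way to confirm these annotations are actually populated on the downstream side (run against the downstream cluster; the node name is just the example from above, and the dots in the key need escaping in jsonpath):
kubectl get node master-server1 -o jsonpath='{.metadata.annotations.management\.cattle\.io/pod-requests}{"\n"}'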
In my case, the limits and requested values on this resource are only populated for nodes in the local cluster:
apiVersion: management.cattle.io/v3
kind: Node
metadata:
  name: machine-laqe1
  namespace: c-m-1r131swx
...
limits:
  cpu: 120m
  memory: 148Mi
requested:
  cpu: 745m
  memory: 243Mi
  pods: '21'
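And the corresponding check on the local (Rancher) cluster, to see whether those values made it into the management Node object at all; grepping the raw object avoids assuming where exactly limits/requested sit (namespace and machine name are the examples from above):
kubectl get nodes.management.cattle.io -n c-m-1r131swx machine-laqe1 -o yaml | grep -A 3 -E '^ *(limits|requested):'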
I have a similar issue, except it's not showing 0%, and the effect appears only in the cluster that was upgraded to 1.22 (versus the others on 1.21). Lens IDE shows all the values correctly.
Rancher:

vs Lens:

Might be same as or related to https://github.com/rancher/rancher/issues/36229
This also happens with the AKS provider. In the image, the clusters that show counts were imported a while ago, while all clusters newly imported with Kubernetes v1.21.7 show wrong counts.
This repository uses an automated workflow to automatically label issues which have not had any activity (commit/comment/label) for 60 days. This helps us manage the community issues better. If the issue is still relevant, please add a comment to the issue so the workflow can remove the label and we know it is still valid. If it is no longer relevant (or possibly fixed in the latest release), the workflow will automatically close the issue in 14 days. Thank you for your contributions.
I guess commit 3453a429bf4107dde095dfcf0256daf93ec6ffb3 introduced the sync issue. Removing the management.cattle.io/nodesyncer annotation will trigger a forced sync, per the nodesyncer logic:
CLUSTER_ID="the cluster id"
for machine in $(kubectl get nodes.management.cattle.io -n "$CLUSTER_ID" -o name); do
  kubectl annotate -n "$CLUSTER_ID" "$machine" management.cattle.io/nodesyncer-
done
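After the annotation is removed, the nodesyncer should repopulate the values within a short time; one way to verify without the UI (same CLUSTER_ID as above, again just grepping the raw objects):
kubectl get nodes.management.cattle.io -n "$CLUSTER_ID" -o yaml | grep -A 3 -E '^ *(limits|requested):'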
We see the same issue. Memory requests are not shown.
In our case right now the memory might be right, but the "max memory" is wrong. We have more than 23 GB of RAM.
Same here
Please reopen.
I get the same issue with k3s v1.27.7+k3s2 (575bce76), go version go1.20.10. Can we reopen the bug? And how do we fix it?