Rancher reporting reserved CPU/memory and pod count incorrectly

Open erSitzt opened this issue 3 years ago • 22 comments

Rancher Server Setup

  • Rancher version: 2.6.3
  • Installation option (Docker install/Helm Chart): Helm
    • If Helm Chart, Kubernetes Cluster and version (RKE1, RKE2, k3s, EKS, etc): RKE2
  • Proxy/Cert Details:

Information about the Cluster

  • Kubernetes version: v1.21.5+rke2r1 / v1.21.7+rke2r2
  • Cluster Type (Local/Downstream): Downstream
    • If downstream, what type of cluster? (Custom/Imported or specify provider for Hosted/Infrastructure Provider): Imported RKE2

Describe the bug
I have three more or less empty clusters deployed with RKE2; only one of them seems to report correct values for reserved CPU/memory. Pods also seem to be missing from the count, but only on the Rancher home screen...

image

As you can see, the rke2-downstream2 cluster reports no reservations at all, and test-rke2 seems to report too many reservations for an empty cluster.

This is the output of the resource-capacity krew plugin for rke2-downstream2:

❯ kubectl resource-capacity
NODE                        CPU REQUESTS   CPU LIMITS    MEMORY REQUESTS   MEMORY LIMITS
*                           8020m (44%)    4120m (22%)   2387Mi (3%)       5323Mi (7%)
rke2-downstream2-agent-1    900m (22%)     1300m (32%)   284Mi (1%)        630Mi (3%)
rke2-downstream2-agent-2    1850m (46%)    1700m (42%)   1341Mi (7%)       4119Mi (24%)
rke2-downstream2-agent-3    820m (20%)     420m (10%)    252Mi (1%)        284Mi (1%)
rke2-downstream2-server-1   1450m (72%)    200m (10%)    126Mi (1%)        53Mi (0%)
rke2-downstream2-server-2   1450m (72%)    200m (10%)    126Mi (1%)        53Mi (0%)
rke2-downstream2-server-3   1550m (77%)    300m (15%)    261Mi (3%)        187Mi (2%)

and test-rke2:

❯ kubectl resource-capacity
NODE                 CPU REQUESTS   CPU LIMITS   MEMORY REQUESTS   MEMORY LIMITS
*                    6070m (33%)    220m (1%)    856Mi (1%)        290Mi (0%)
rke2-agent-node-1    600m (15%)     0Mi (0%)     95Mi (0%)         0Mi (0%)
rke2-agent-node-2    600m (15%)     0Mi (0%)     95Mi (0%)         0Mi (0%)
rke2-agent-node-3    600m (15%)     0Mi (0%)     95Mi (0%)         0Mi (0%)
rke2-server-node-1   1570m (78%)    220m (11%)   384Mi (4%)        290Mi (3%)
rke2-server-node-2   1350m (67%)    0Mi (0%)     95Mi (1%)         0Mi (0%)
rke2-server-node-3   1350m (67%)    0Mi (0%)     95Mi (1%)         0Mi (0%)

All server nodes are 2-CPU VMs and all agent nodes are 4-CPU VMs, by the way.

rke2-downstream2 has monitoring installed; test-rke2 does not.
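
For reference, the home-screen numbers can be roughly cross-checked against the raw pod specs. The snippet below is just a sanity check, not how Rancher computes its values: it sums the container CPU requests of all non-terminated pods in the cluster (nothing beyond kubectl and awk is assumed).

# Rough cross-check: total CPU requests of all non-terminated pods.
# Plain values like "1" (cores) are converted to millicores; "250m" is used as-is.
kubectl get pods -A --field-selector=status.phase!=Succeeded,status.phase!=Failed \
  -o jsonpath='{.items[*].spec.containers[*].resources.requests.cpu}' \
  | tr ' ' '\n' \
  | awk 'NF { if ($1 ~ /m$/) { sub(/m$/, "", $1); s += $1 } else { s += $1 * 1000 } }
         END { printf "%.0fm total CPU requested\n", s }'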

image image

erSitzt avatar Dec 29 '21 09:12 erSitzt

By the way, the issue in the rke2-downstream2 cluster existed even before I tried the upgrade to v1.21.7+rke2r2 via the Rancher UI.

erSitzt avatar Dec 29 '21 10:12 erSitzt

image image

Same issue with an AKS cluster on 1.21.7 and Rancher 2.6.3. Other clusters deployed with RKE, also on 1.21.7, are reporting correctly.

Yannis100 avatar Jan 04 '22 10:01 Yannis100

https://rancher-addreess.domain.com/v1/management.cattle.io.cluster is returning all zeroes for the clusters in question, so it's not a display issue...

image
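
The same aggregated values can also be read with kubectl against the Rancher local cluster. A sketch (the column paths are assumed from the clusters.management.cattle.io status fields, i.e. status.requested):

# List the per-cluster totals Rancher has stored; zero/empty values here point
# at a broken sync rather than a UI problem.
kubectl get clusters.management.cattle.io \
  -o custom-columns='ID:.metadata.name,NAME:.spec.displayName,CPU_REQ:.status.requested.cpu,MEM_REQ:.status.requested.memory,PODS:.status.requested.pods'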

erSitzt avatar Jan 04 '22 11:01 erSitzt

Deployed another RKE2 cluster and imported it... same result

erSitzt avatar Jan 06 '22 09:01 erSitzt

I recreated the RKE2 cluster that was reporting correct values (via Terraform), and now it is not reporting any CPU/memory/pod values at all.

It could be that my two working clusters were imported while Rancher was still on 2.6.2, whereas all the other clusters were imported after the update to 2.6.3, but I'm not 100% sure.

erSitzt avatar Jan 06 '22 11:01 erSitzt

image

Same issue with RKE1 for me. Working stats: cluster built with 2.6.2. Non-working stats: cluster built with 2.6.3.

semaforce-sean avatar Jan 07 '22 07:01 semaforce-sean

Seems to be getting weirder... my recreated cluster now started to report values... that seem a little off :)

image

erSitzt avatar Jan 07 '22 13:01 erSitzt

And I just compared the pod count of all clusters that are reporting values... none of them are correct.

Are those numbers filtered or averaged? Or do they exclude some "system" pods?
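
As a rough cross-check (not necessarily how Rancher counts), the number of pods that still occupy a slot on a node can be taken by excluding completed pods:

# Count pods in all namespaces that are not Succeeded/Failed.
kubectl get pods -A --field-selector=status.phase!=Succeeded,status.phase!=Failed \
  --no-headers | wc -l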

So these are my numbers on the home screen: image

And this is in the cluster itself: image

image image

erSitzt avatar Jan 07 '22 13:01 erSitzt

Hi,

We have the same problem: image

Our version is Rancher 2.6.3; we don't have the problem with 2.6.2. Monitoring is installed on the downstream cluster and not on the local one.

dtrouillet avatar Jan 13 '22 06:01 dtrouillet

Same issue. image

silentdark avatar Jan 13 '22 19:01 silentdark

We have the same problem. Reservations and limits are only shown for the local cluster. I suspect this change is the culprit: https://github.com/rancher/rancher/commit/3453a429bf4107dde095dfcf0256daf93ec6ffb3

It uses annotations on the v1 Node to detect the current limits and reservations instead of calculating them from the pods. These annotations do actually exist; however, they don't seem to get correctly synced to the management.cattle.io Node resource.

apiVersion: v1
kind: Node
metadata:
  name: master-server1
  annotations:
    ...
    management.cattle.io/pod-limits: '{"cpu":"300m","memory":"178Mi"}'
    management.cattle.io/pod-requests: '{"cpu":"1725m","memory":"2211Mi","pods":"19"}'
    ....
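
To check whether those annotations are present on a downstream node, something like the following can be used (a sketch; master-server1 is just the example node name from above, and jq is assumed to be installed):

# Read the request/limit annotations straight from the downstream v1 Node.
kubectl get node master-server1 -o json \
  | jq '.metadata.annotations
        | { requests: .["management.cattle.io/pod-requests"],
            limits:   .["management.cattle.io/pod-limits"] }'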

In my case, the limits and requested values on this resource are only populated for the local cluster:

apiVersion: management.cattle.io/v3
kind: Node
metadata:
  name: machine-laqe1
  namespace: c-m-1r131swx
  ...
status:
  limits:
    cpu: 120m
    memory: 148Mi
  requested:
    cpu: 745m
    memory: 243Mi
    pods: '21'
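
The synced values can be listed per downstream cluster on the Rancher local cluster like this (a sketch; c-m-1r131swx is the example cluster namespace from above, the status.requested/status.limits field names are assumed, and jq is assumed to be installed):

# Compare what actually got synced into the management Nodes of one cluster.
kubectl get nodes.management.cattle.io -n c-m-1r131swx -o json \
  | jq '.items[] | { name: .metadata.name,
                     requested: .status.requested,
                     limits: .status.limits }'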

WolfspiritM avatar Jan 14 '22 11:01 WolfspiritM

I have a similar issue, except it's not showing 0%, and the effect only appears in the cluster that was upgraded to 1.22 (the others are on 1.21). The Lens IDE shows all the values correctly.

Rancher:

image

vs Lens:

image

siegenthalerroger avatar Mar 04 '22 09:03 siegenthalerroger

Might be same as or related to https://github.com/rancher/rancher/issues/36229

dnoland1 avatar Mar 10 '22 23:03 dnoland1

This also happens with the AKS provider. In the image, the clusters that show counts were imported a while ago, while all clusters newly imported with Kubernetes v1.21.7 show wrong counts.

shot_220330_114154

fgielow avatar Mar 30 '22 14:03 fgielow

This repository uses an automated workflow to automatically label issues which have not had any activity (commit/comment/label) for 60 days. This helps us manage the community issues better. If the issue is still relevant, please add a comment to the issue so the workflow can remove the label and we know it is still valid. If it is no longer relevant (or possibly fixed in the latest release), the workflow will automatically close the issue in 14 days. Thank you for your contributions.

github-actions[bot] avatar May 30 '22 02:05 github-actions[bot]

remove-stale

imriss avatar Jun 03 '22 16:06 imriss

I guess commit 3453a429bf4107dde095dfcf0256daf93ec6ffb3 introduced the sync issue. Removing the management.cattle.io/nodesyncer annotation will trigger a forced sync per the nodesyncer logic:

CLUSTER_ID="<the cluster id>"
for machine in $(kubectl get nodes.management.cattle.io -n "$CLUSTER_ID" -o name --no-headers); do
  kubectl annotate -n "$CLUSTER_ID" "$machine" management.cattle.io/nodesyncer-
done
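
After the annotation is removed, the nodesyncer should repopulate the values. A quick way to confirm, assuming jq is available (the cluster-level totals are what the home screen reads):

# Check that the cluster-wide totals were updated after the forced sync.
kubectl get clusters.management.cattle.io "$CLUSTER_ID" -o json \
  | jq '{ requested: .status.requested, limits: .status.limits }'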

fengxx avatar Jun 08 '22 04:06 fengxx


We see the same issue. Memory requests are not shown.

image

ronnyaa avatar Aug 18 '22 13:08 ronnyaa


In our case, the memory value might currently be right, but the "max memory" is wrong. We have more than 23 GB of RAM.

image
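
The gauge's maximum should correspond to the summed node memory reported by the API (presumably the allocatable value, though that is an assumption). A quick way to see what the nodes themselves report:

# Per-node memory capacity and allocatable, for comparison with the gauge maximum.
kubectl get nodes -o jsonpath='{range .items[*]}{.metadata.name}{"\t"}{.status.capacity.memory}{"\t"}{.status.allocatable.memory}{"\n"}{end}'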

libreo-abrettschneider avatar Oct 21 '22 15:10 libreo-abrettschneider

In our case, the memory value might currently be right, but the "max memory" is wrong. We have more than 23 GB of RAM.

image

Same here Screen Shot 2022-10-25 at 00 05 50

nilber avatar Oct 25 '22 03:10 nilber

This repository uses an automated workflow to automatically label issues which have not had any activity (commit/comment/label) for 60 days. This helps us manage the community issues better. If the issue is still relevant, please add a comment to the issue so the workflow can remove the label and we know it is still valid. If it is no longer relevant (or possibly fixed in the latest release), the workflow will automatically close the issue in 14 days. Thank you for your contributions.

github-actions[bot] avatar Dec 25 '22 01:12 github-actions[bot]

please reopen

erSitzt avatar Jan 09 '23 15:01 erSitzt

I get the same issue with k3s:

k3s --version
k3s version v1.27.7+k3s2 (575bce76)
go version go1.20.10

Can we reopen the bug? And how can it be fixed?

yodatak avatar Mar 17 '24 20:03 yodatak