Add Rancher Component Health checks in Dashboard

Open nwmac opened this issue 1 year ago • 1 comments

Internal Reference: SURE-5757

The cluster dashboard view for a cluster shows the states of some core components of k8s:

(screenshot: core component status boxes on the cluster dashboard)

We show the state of Etcd, Scheduler and Controller Manager.

We should add two new boxes next to these:

  • Cattle Agent (note, this is not present in the local cluster, only downstream clusters)
  • Fleet Agent

Both are deployments that run in the cluster. Not all users have access to see them, so if the user can't see a deployment, we should show nothing for it, and we need to handle this case.

We also need to ensure any requests we make are performant and don't cause problems on large clusters.

For each of the two agents:

  • Show the green tick, unless one of the following applies:
  • Show a red cross if the deployment has an error condition.
  • Show an orange warning if the deployment has issues with its replicas. spec.replicas on the deployment indicates how many replicas there should be. Several fields on the deployment's status (the number of active replicas, unavailableReplicas, etc.) can be used to determine whether there is a problem with replicas. Suggested check: readyReplicas matches the expected replica count and unavailableReplicas is 0.
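
The rules above could be sketched roughly as follows. This is only an illustration, not the dashboard's actual implementation: the `Deployment` interface is a trimmed-down, hypothetical shape containing just the fields mentioned here, and `agentHealth` is a hypothetical helper name. The issue does not define what counts as an "error condition", so the sketch assumes a `ReplicaFailure` condition with status `True` or a `Progressing` condition with status `False`. It also assumes a missing `spec.replicas` means 1 (the Kubernetes default) and that missing status counters mean 0.

```typescript
// Hypothetical, trimmed-down Deployment shape: only the fields named in this issue.
interface DeploymentCondition {
  type: string;
  status: string; // 'True' | 'False' | 'Unknown'
}

interface Deployment {
  spec?: { replicas?: number };
  status?: {
    readyReplicas?: number;
    unavailableReplicas?: number;
    conditions?: DeploymentCondition[];
  };
}

type AgentHealth = 'healthy' | 'warning' | 'error';

// Returns undefined when the deployment is not visible to the user,
// so the caller can render nothing for that agent.
function agentHealth(deployment?: Deployment): AgentHealth | undefined {
  if (!deployment) {
    return undefined;
  }

  // Assumption: these two conditions are what "error condition" means.
  const conditions = deployment.status?.conditions ?? [];
  const hasError = conditions.some((c) =>
    (c.type === 'ReplicaFailure' && c.status === 'True') ||
    (c.type === 'Progressing' && c.status === 'False')
  );

  if (hasError) {
    return 'error'; // red cross
  }

  // Assumption: absent fields fall back to Kubernetes defaults / zero.
  const expected = deployment.spec?.replicas ?? 1;
  const ready = deployment.status?.readyReplicas ?? 0;
  const unavailable = deployment.status?.unavailableReplicas ?? 0;

  if (ready !== expected || unavailable !== 0) {
    return 'warning'; // orange warning
  }

  return 'healthy'; // green tick
}
```

Checking the error condition first means a failing deployment shows the red cross even when its replica counts are also off. The function works on a single deployment object, so it does not itself address the requirement to keep requests cheap on large clusters.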

nwmac avatar Feb 27 '24 13:02 nwmac

@nwmac I pushed the unit tests in the PR, but I think a manual test that scales the fleet and cattle deployments up and down would be useful. For this reason, I'm not sure which test label this issue should get.

torchiaf avatar Mar 14 '24 16:03 torchiaf

If it needs some manual testing, then the manual test label is best. QA can look at the unit tests to help understand what needs manual coverage.

nwmac avatar Mar 22 '24 09:03 nwmac