
NetApp Detail: Cluster dashboard - general cluster health status

BrendonA667 opened this issue 3 years ago • 1 comment

Is your feature request related to a problem? Please describe.
When a node in a cluster is rebooted or unavailable, the cluster health status still shows ok.

Describe the solution you'd like
If a node is unreachable during a reboot or maintenance, the cluster health status should be degraded. It should be ok only when all nodes are online.

Additional context
[screenshot]

BrendonA667, Mar 15 '22 14:03

Thanks for reporting, @Falcon667.

ONTAP considers the cluster healthy since the healthy node has taken over for the unhealthy one.

This is similar to the discussion in #885. Harvest asks ONTAP for the health of the cluster by sending the diagnosis-status-get ZAPI, which returns the overall system health. @rahulguptajss reproduced this in the lab by halting a node, and the ZAPI still returned ok, as shown below.

bin/zapi -p u2 show data --api diagnosis-status-get | xml fo
connected to umeng-aff300-05-06 (NetApp Release 9.7P7: Thu Aug 27 20:57:05 UTC 2020)
<?xml version="1.0"?>
<root>
  <attributes>
    <diagnosis-status>
      <status>ok</status>
    </diagnosis-status>
  </attributes>
</root>
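The output above shows that diagnosis-status-get carries a single status field for the whole system. A minimal sketch, assuming the response has exactly the shape printed above, of how a client could read that field with Python's standard-library xml.etree.ElementTree. `cluster_status` is a hypothetical helper for illustration, not Harvest code (Harvest itself is written in Go).

```python
import xml.etree.ElementTree as ET

# Response body copied from the diagnosis-status-get output above
# (the <?xml ...?> declaration is omitted to keep the string simple).
DIAGNOSIS_RESPONSE = """
<root>
  <attributes>
    <diagnosis-status>
      <status>ok</status>
    </diagnosis-status>
  </attributes>
</root>
"""


def cluster_status(xml_text: str) -> str:
    """Return the overall system health reported by diagnosis-status-get.

    Hypothetical helper for illustration; it is not part of Harvest.
    """
    root = ET.fromstring(xml_text.strip())
    status = root.findtext("./attributes/diagnosis-status/status")
    if status is None:
        raise ValueError("no diagnosis-status/status element in response")
    return status.strip()


print(cluster_status(DIAGNOSIS_RESPONSE))  # ok
```

Because the response holds only this one field, a halted node cannot change what the sketch returns unless ONTAP itself reports a different status.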

Node health is determined by system-node-get-iter. For a down node, ONTAP correctly reports it as not healthy, as your screenshot shows.
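The behavior requested in this issue could instead be derived from per-node health. A minimal sketch, assuming a trimmed system-node-get-iter response in which each node-details-info record carries `node` and `is-node-healthy` elements. The sample XML, the node names, and the helpers `node_health` and `derived_cluster_status` are hypothetical and do not describe how Harvest builds its dashboards.

```python
import xml.etree.ElementTree as ET

# Hypothetical, trimmed system-node-get-iter response with one halted node.
# Real responses carry many more fields per node-details-info record.
NODE_RESPONSE = """
<root>
  <attributes-list>
    <node-details-info>
      <node>node-01</node>
      <is-node-healthy>true</is-node-healthy>
    </node-details-info>
    <node-details-info>
      <node>node-02</node>
      <is-node-healthy>false</is-node-healthy>
    </node-details-info>
  </attributes-list>
</root>
"""


def node_health(xml_text: str) -> dict[str, bool]:
    """Map each node name to its is-node-healthy flag (missing flag counts as unhealthy)."""
    root = ET.fromstring(xml_text.strip())
    health = {}
    for info in root.iterfind("./attributes-list/node-details-info"):
        name = info.findtext("node", default="")
        flag = info.findtext("is-node-healthy", default="false")
        health[name] = flag.strip() == "true"
    return health


def derived_cluster_status(health: dict[str, bool]) -> str:
    """Return 'ok' only when every node is healthy; otherwise 'degraded'.

    An empty node list is treated as 'degraded' rather than 'ok'.
    """
    if health and all(health.values()):
        return "ok"
    return "degraded"


health = node_health(NODE_RESPONSE)
print(health)  # {'node-01': True, 'node-02': False}
print(derived_cluster_status(health))  # degraded
```

Treating an empty node list as degraded is a design choice of the sketch: a response with no nodes is more likely a collection problem than a healthy cluster.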

That's why this happens. This use case is probably better handled by the yet-to-be-implemented EMS collector mentioned in #892.

cgrinds, Mar 16 '22 14:03

Closing. This will be handled in #1525.

cgrinds, Feb 23 '23 14:02