Creating dashboard for Headroom with Prometheus
Is your feature request related to a problem? Please describe. We would like to have a headroom dashboard. As far as I know, the metrics are already gathered with the current version, 22.05.
Describe the solution you'd like A dashboard for headroom that works with the Prometheus DB.
@faguayot I have created a headroom dashboard through this PR. Let us know your feedback as this will be part of our next official release.
Thanks
Verified on release/22.08.0
a05a0279
Hello @rahulguptajss, apologies, I forgot to check this. I went through it now and found some errors (I think).
- Firstly, the units for "Current throughput" and "Current utilization" (in the aggregate section, and in the CPU one too) are wrong. I saw an issue for correcting the current utilization but not the others (https://github.com/NetApp/harvest/issues/1288).
- Secondly, in the graphs that use a percentage, I think you need to define the minimum and the maximum, because the visual scale isn't right. An example of the view is in the attached screenshot.
- What are "Optimal Point" and "EWMA"? Could you share information on where both are explained, so we can understand them?
- Without being clear on what the Optimal Point is: we have a node (AFF300) with an SSD aggregate that is showing a higher optimal point throughput than a newer node (AFF400) with NVMe. Does that make sense to you? The capacity of both nodes is almost the same; the new node even has 3-4 TB more.
Thanks.
@faguayot Just today several improvements were made to this dashboard. You can try the new headroom dashboard from here.
1: Issue one, about units, should be fixed in the latest dashboard.
2: About scale, we'll check.
3: You can find more information about these counters by running the CLI commands below:
bin/zapi -p POLLERNAME show counters --object resource_headroom_aggr
bin/zapi -p POLLERNAME show counters --object resource_headroom_cpu
4: Let us know if counter information helps in answering this.
- Yes, the issue about the units seems to be solved, but I saw negative values in "Available Ops : Aggregate" and "Available Ops: CPU".
- Ok.
- I am using the commands provided to try to understand the metrics.
- I need time to check them and will follow up if I have doubts.
Thanks.
1: Yes, we see negative values in our system as well. We'll check and get back on this.
2: Scale is resolved: https://github.com/NetApp/harvest/blob/main/grafana/dashboards/cmode/harvest_dashboard_headroom.json (see the sketch below).
3: Ok.
4: We'll check as well.
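For reference on item 2, Grafana panels take an explicit range in their field configuration. A hypothetical excerpt is shown below; the field names follow the standard Grafana fieldConfig schema, and the actual values in harvest_dashboard_headroom.json may differ.
{
  "fieldConfig": {
    "defaults": {
      "unit": "percent",
      "min": 0,
      "max": 100
    }
  }
}
With min and max pinned to 0 and 100, the percentage panels keep a fixed scale instead of auto-fitting to the data.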
Spoke to Rusty, who helped create these. I'll update the information on these panels to include something like this:
This graph displays the difference between CPU Utilization and Peak Performance (Optimal Point) as Available Ops (aka Headroom). If the current Available utilization is very low or negative for an extended time, a performance remediation plan might be appropriate. A performance remediation plan might include setting QoS workload limits, moving volumes or LUNs to another storage controller, or expanding the storage cluster.
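Put another way, the relationship described above can be sketched as a simple formula; this is an illustration of the description, not the exact counter math:
Available Ops (Headroom) = Optimal Point Ops - Current Ops
As a purely hypothetical example, if a node's optimal point is 10,000 ops and it is currently serving 12,000 ops, Available Ops would be -2,000; a sustained negative value like that is the situation the remediation guidance above is aimed at.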
The doc update is missing. PR #1469 includes the doc updates.
I'll move this to status/done once that PR is merged.
Verified on 22.11