Creating dashboard for Headroom with Prometheus
Is your feature request related to a problem? Please describe. We would like to have a headroom dashboard. As far as I know, the metrics are already gathered with the current version, 22.05.
Describe the solution you'd like A dashboard for headroom that works with the Prometheus DB.
@faguayot I have created a headroom dashboard through this PR. Let us know your feedback as this will be part of our next official release.
Thanks
Verified on release/22.08.0
a05a0279
Hello @rahulguptajss, apologies, I forgot to check this. I went through it now and found some errors (I think).
- Firstly, the units for "Current throughput" and "Current utilization" (in the aggregate section, and in the CPU one too) are wrong. I saw an issue for correcting the current utilization but not the others (https://github.com/NetApp/harvest/issues/1288).
- Secondly, in the graphs that use a percentage, I think you need to define the minimum and the maximum, because the visual scale isn't right. An example of the view is in the attached screenshot.
- What are "Optimal Point" and "EWMA"? Could you share information on where both are explained, so we can understand them?
- Without being clear on what the Optimal Point is: we have a node (AFF300) with an SSD aggregate that is showing a higher optimal point throughput than a newer node (AFF400) with NVMe. Does that make sense to you? The capacity of both nodes is almost the same; the new node even has 3-4 TB more.
Thanks.
@faguayot Just today several improvements were made to this dashboard. You can try the new headroom dashboard from here.
1: Issue one, about units, should be fixed in the latest dashboard.
2: About scale, we'll check.
3: You can find more information about these counters by running the CLI commands below:
bin/zapi -p POLLERNAME show counters --object resource_headroom_aggr
bin/zapi -p POLLERNAME show counters --object resource_headroom_cpu
4: Let us know if counter information helps in answering this.
- Yes, the issue about the units seems to be solved, but I saw negative values in "Available Ops : Aggregate" and "Available Ops: CPU".
- Ok.
- I am using the commands provided to try to understand the metrics.
- I need time to check them and will follow up if I have doubts.
Thanks.
1: Yes, we see negative values in our system as well. We'll check and get back on this.
2: Scale is resolved: https://github.com/NetApp/harvest/blob/main/grafana/dashboards/cmode/harvest_dashboard_headroom.json (see the sketch below).
3: Ok.
4: We'll check as well.
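For reference on item 2, Grafana panels take an explicit range in their field configuration. A hypothetical excerpt is shown below; the field names follow the standard Grafana fieldConfig schema, and the actual values in harvest_dashboard_headroom.json may differ.
{
  "fieldConfig": {
    "defaults": {
      "unit": "percent",
      "min": 0,
      "max": 100
    }
  }
}
With min and max pinned to 0 and 100, the percentage panels keep a fixed scale instead of auto-fitting to the data.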
Spoke to Rusty, who helped create these. I'll update the information on these panels to include something like this:
This graph displays the difference between CPU Utilization and Peak Performance (Optimal Point) as Available Ops (aka Headroom). If the current Available utilization is very low or negative for an extended time, a performance remediation plan might be appropriate. A performance remediation plan might include setting QoS workload limits, moving volumes or LUNs to another storage controller, or expanding the storage cluster.
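Put another way, the relationship described above can be sketched as a simple formula; this is an illustration of the description, not the exact counter math:
Available Ops (Headroom) = Optimal Point Ops - Current Ops
As a purely hypothetical example, if a node's optimal point is 10,000 ops and it is currently serving 12,000 ops, Available Ops would be -2,000; a sustained negative value like that is the situation the remediation guidance above is aimed at.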
The doc update is missing. PR #1469 includes the doc updates.
I'll move this to status/done once that PR is merged.
Verified on 22.11