harvest
Information missing in Aggregate and Disk dashboards when compared with 1.6
Is your feature request related to a problem? Please describe.
We are missing the disk utilization graph and highlight panel in the Aggregate dashboard, which Harvest 1.6 had in the dashboard NetApp Detail: Disk and Cache Layers. Additionally, we would be interested in recovering the graph for User Read/Write IOP Size. This information could feed the Disk dashboard, as it did in the past when it was part of the dashboard NetApp Detail: Disk and Cache Layers. We are also missing other information that is relevant for us in the Cluster dashboard, namely the highlights.
Describe the solution you'd like
Add new information to the Aggregate, Disk, and Cluster dashboards.
For the Aggregate dashboard:
- Disk utilization
For the Disk dashboard:
- User read/write IOP Size
And lastly, for the Cluster dashboard:
- Highlights
Additional context
I can add the disk utilization information myself, but the problem is that every time a new release comes out, I lose the changes I made to my dashboard, because I upload the new panels again with whatever information or changes you have made. In the case of the User Read/Write IOP Size, I don't know whether that information is available in my current instance of Harvest. The Cluster highlights have the same issue as the disk utilization.
- Disk utilization and highlight of disk utilization too.
- User Read/Write IOP Size
- Cluster highlight
- For the Disk utilization panels: we already have them in the Disk dashboard. The table at the upper right has instant data, whereas the panel at the lower left has time-range data. Previously, topk wasn't supported in them; I have now made the appropriate changes to support topk.
- For the User read/write IOP Size panels: I added them to the Disk dashboard, just below the Disk utilization panel.
- For the Highlights panels: I added them to the Cluster dashboard.
Hello @Hardikl
I will try to respond to each point from my perspective.
- For the Disk utilization panels: the metric you mentioned is the MAX disk utilization, not the AVG or current disk utilization, which was the metric in the older 1.6 version. Additionally, it would be helpful to have a highlight like the one shown in the screenshot that I uploaded of this dashboard. For us, in addition to the Disk dashboard, this metric should be added to the Aggregate dashboard. Currently this dashboard has "Storage Efficiency" and "Storage Used" sections, so a new section for performance would be a plus.
- For the User read/write IOP Size panels: let me know when your merge request is published so I can download the dashboard with the new configuration and try it.
- Cluster Highlights panels: nice, I am looking forward to trying the new dashboard. One question: I don't see the other information about Nodes & Subsystems that was at the top of this panel. Do you still have it?
- SVM Performance Drilldown in ONTAP: Cluster. Maybe the unit for the latency is wrong?
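To illustrate the MAX versus AVG distinction raised in the Disk utilization point above, here is a minimal sketch. The sample values, the aggregate and disk names, and the `summarize` helper are all hypothetical; this is not how Harvest or its dashboards compute the panels.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical per-disk utilization samples (percent), keyed by (aggregate, disk).
samples = {
    ("aggr1", "disk1"): 20.0,
    ("aggr1", "disk2"): 80.0,
    ("aggr2", "disk3"): 40.0,
}


def summarize(samples):
    """Group per-disk utilization by aggregate and report both max and average."""
    by_aggr = defaultdict(list)
    for (aggr, _disk), util in samples.items():
        by_aggr[aggr].append(util)
    return {aggr: {"max": max(v), "avg": mean(v)} for aggr, v in by_aggr.items()}


summary = summarize(samples)
print(summary)
```

With these made-up numbers, aggr1 reports a maximum of 80% but an average of 50%, which is why a MAX panel and an AVG panel can tell quite different stories about the same aggregate.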
Hi @faguayot
- Understood your perspective. I will leave this panel in the Disk dashboard as-is, and I have added new panels to the Aggregate dashboard that aggregate at the aggr level rather than using MAX.
- This is the PR: https://github.com/NetApp/harvest/pull/1320/. It has dashboard changes along with a minor template change, so you would need to apply both in your system for local testing.
- It's the same Cluster dashboard. I just added these panels in Highlights; all the other panels are as-is.
- I wouldn't say they are wrong; the unit just changes based on the latency value. If the value is > 1000, Grafana shows it in s (sec) and not ms (millisec).
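The unit auto-scaling described in the last point can be sketched roughly as follows. This is only an approximation for illustration, not Grafana's actual code; the function name `format_ms` and the exact thresholds are assumptions.

```python
def format_ms(value_ms: float) -> str:
    """Hypothetical approximation of how a millisecond-unit panel picks a display unit."""
    if abs(value_ms) < 1000:
        return f"{value_ms:g} ms"
    seconds = value_ms / 1000
    if abs(seconds) < 60:
        return f"{seconds:g} s"
    # Assumed next step up: minutes.
    return f"{seconds / 60:g} min"


print(format_ms(250))
print(format_ms(1500))
print(format_ms(60000))
```

Under these assumptions, the displayed unit is purely a function of the magnitude of the value, so a panel fed values a thousand times too large would show seconds or minutes where milliseconds were expected.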
Hello @Hardikl
Checking the units for the graph again, I think they are incorrect. We don't have any performance issues in this cluster or in others, and 1 minute of latency is too much. We have never seen these metrics before.
Thanks for the follow-up @faguayot. I wanted to make sure: are you running nightly with Hardik's changes?
As reported by jf38800 on Discord, these units are wrong. Running
bin/zapi --poller u2 show counters --object volume:vserver | dasel -r xml -w json | less
shows that the units are in microsec, while the panel @faguayot pasted above is using milliseconds. Fix incoming.
{
"base-counter": "total_ops",
"desc": "Average latency in microseconds for the WAFL filesystem to process all the operations on the volume; not including request processing or network communication time",
"is-deprecated": "false",
"name": "avg_latency",
"privilege-level": "admin",
"properties": "average",
"unit": "microsec" <====================
},
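As a rough illustration of why the unit mismatch matters, the sketch below treats a counter reported in microseconds as if it were milliseconds. The sample value and the helper name `microsec_to_millisec` are hypothetical; this is not the actual fix applied to the dashboard.

```python
MICROSEC_PER_MILLISEC = 1000


def microsec_to_millisec(value_us: float) -> float:
    """Convert a latency reported in microseconds to milliseconds."""
    return value_us / MICROSEC_PER_MILLISEC


# Hypothetical avg_latency sample, in microseconds as the counter metadata states.
raw_avg_latency = 60_000

# A panel configured with a millisecond unit reads the raw number as milliseconds.
mislabelled_ms = raw_avg_latency

# The latency once the microsecond unit is taken into account.
actual_ms = microsec_to_millisec(raw_avg_latency)

inflation = mislabelled_ms / actual_ms
print(actual_ms, mislabelled_ms, inflation)
```

In this made-up case a 60 ms latency would be displayed as 60,000 ms, that is, one minute, which is a factor of 1000 too high.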
Hello @cgrinds
No, I am not running the nightly build, but now that I am sure the unit in this dashboard and in Top Latency was wrong in version 22.08, I corrected it myself.
Thanks.
Thanks again for bringing this to our attention.
Verified in 22.11. Some dashboard UX changes in #1496.