
Information missing in Aggregate and Disk dashboards when compared with 1.6

Open faguayot opened this issue 2 years ago • 8 comments

Is your feature request related to a problem? Please describe.

We are missing the disk utilization graph and highlight panel in the Aggregate dashboard, which Harvest 1.6 had in the dashboard NetApp Detail: Disk and Cache Layers. Additionally, we would be interested in recovering the graph for User Read/Write IOP Size. This information could feed the Disk dashboard, as it did in the past in the dashboard NetApp Detail: Disk and Cache Layers. We are also missing other information relevant to us in the Cluster dashboard, namely the highlights.

Describe the solution you'd like

Add new information to the Aggregate, Disk, and Cluster dashboards. For the Aggregate dashboard: Disk utilization. For the Disk dashboard: User Read/Write IOP Size. For the Cluster dashboard: Highlights.

Additional context

I can add the disk utilization information myself, but the problem is that every time a new release comes out, I lose the changes I made to my dashboard, because I upload the new panels again with your information and changes. In the case of the User Read/Write IOP Size, I don't know whether I have that information in my current instance of Harvest. The Cluster highlights have the same issue as the disk utilization.

  • Disk utilization and highlight of disk utilization too. image

  • User Read/Write IOP Size image

  • Cluster highlight image

faguayot avatar Sep 20 '22 12:09 faguayot

  • For the Disk utilization panels, we already have them in the Disk dashboard. The table at the top right shows instant data, whereas the panel at the bottom left shows time-range data. Earlier, topk wasn't supported in them; I have now made the appropriate changes to support topk.

image

  • For the User read/write IOP Size panels, I added them to the Disk dashboard just below the Disk utilization panel.

image

  • For the Highlights panels, I added them to the Cluster dashboard.

image

Hardikl avatar Sep 29 '22 13:09 Hardikl

Hello @Hardikl

I will try to respond to every point from my perspective.

  1. For the Disk utilization panels. The metric that you mentioned is the MAX Disk Utilization, not the AVG or Current Disk Utilization, which was the metric in the older 1.6 version. Additionally, it would be helpful to have a highlight like the one shown in the screenshot that I uploaded of this dashboard. For us, in addition to the Disk dashboard, this metric should be added to the Aggregate dashboard. Currently that dashboard has "Storage Efficiency" and "Storage Used", so a new section for Performance would be a plus.

  2. For the User read/write IOP Size panels. Let me know when your merge request is published so I can download the dashboard with the new configuration and try it.

  3. Cluster Highlights panels. Nice, I am looking forward to trying the new dashboard. One question: I don't see the other information about Nodes & Subsystems which was at the beginning of this panel. Is it still there?

  4. SVM Performance Drilldown in ONTAP: Cluster. Maybe the unit for the latency is wrong? image

faguayot avatar Sep 29 '22 14:09 faguayot

Hi @faguayot

  1. Understood your perspective. I would leave this panel in the Disk dashboard as-is, and I have added new panels to the Aggregate dashboard, which aggregate at the aggr level rather than using MAX. image

  2. This is the PR: https://github.com/NetApp/harvest/pull/1320/. It has dashboard changes along with a minor template change, so you would need to apply both in your system for local testing.

  3. It's the same Cluster dashboard. I just added these panels to Highlights; all the other panels are as-is. image

  4. I wouldn't say they are wrong; the unit just changes based on the latency value. If the value is > 1000, Grafana shows it in s (seconds) rather than ms (milliseconds). image image
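
To illustrate the auto-scaling described in point 4, here is a minimal sketch of how a value tagged as milliseconds might be displayed. The function name `format_ms` and the exact thresholds are assumptions for illustration only; this is not Grafana's actual code.

```python
def format_ms(value_ms: float) -> str:
    """Pick a display unit for a value that is tagged as milliseconds.

    Simplified, hypothetical approximation of a dashboard's unit scaling.
    """
    magnitude = abs(value_ms)
    if magnitude < 1_000:
        # Below one second: keep milliseconds.
        return f"{value_ms:g} ms"
    if magnitude < 60_000:
        # One second up to one minute: show seconds.
        return f"{value_ms / 1_000:g} s"
    # One minute or more: show minutes.
    return f"{value_ms / 60_000:g} min"


print(format_ms(850))     # 850 ms
print(format_ms(1_500))   # 1.5 s
print(format_ms(60_000))  # 1 min
```

The scaling only changes how the number is displayed. It cannot correct a panel whose configured unit does not match the unit of the underlying counter.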

Hardikl avatar Sep 30 '22 06:09 Hardikl

hello @Hardikl

Checking the units for the graph again, I think they are incorrect. We don't have any performance issues in this cluster or in others, and 1 minute of latency is too much. We have never seen these metrics before.

image

faguayot avatar Oct 24 '22 08:10 faguayot

Thanks for the follow-up @faguayot. I wanted to make sure: are you running nightly with Hardik's changes?

cgrinds avatar Oct 24 '22 12:10 cgrinds

As reported by jf38800 on Discord, these units are wrong.

bin/zapi --poller u2 show counters --object volume:vserver | dasel -r xml -w json | less

This shows the units are in microsec, while the panel @faguayot pasted above is using milliseconds. Fix incoming.

      {
        "base-counter": "total_ops",
        "desc": "Average latency in microseconds for the WAFL filesystem to process all the operations on the volume; not including request processing or network communication time",
        "is-deprecated": "false",
        "name": "avg_latency",
        "privilege-level": "admin",
        "properties": "average",
        "unit": "microsec"                 <====================
      },
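
As a rough illustration of why this mismatch matters, the sketch below treats the same raw sample first as microseconds (what the counter reports) and then as milliseconds (what the panel assumed). The sample value of 60,000 is hypothetical, chosen only to show how a modest latency can end up looking like a minute.

```python
US_PER_MS = 1_000  # microseconds per millisecond


def microsec_to_ms(value_us: float) -> float:
    """Convert a latency sample from microseconds to milliseconds."""
    return value_us / US_PER_MS


# Hypothetical avg_latency sample; the counter's unit is microsec.
raw_sample = 60_000

# Correct reading: 60,000 microsec is 60 ms.
correct_ms = microsec_to_ms(raw_sample)

# Mislabeled reading: the panel treats the raw number as milliseconds,
# so 60,000 "ms" is displayed as one minute.
mislabeled_ms = float(raw_sample)

print(f"correct:    {correct_ms:g} ms")
print(f"mislabeled: {mislabeled_ms:g} ms ({mislabeled_ms / 60_000:g} min)")
print(f"inflation factor: {mislabeled_ms / correct_ms:g}x")
```

The factor of 1000 between the two readings is consistent with a healthy cluster appearing to show latencies on the order of a minute.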

cgrinds avatar Oct 24 '22 13:10 cgrinds

Hello @cgrinds

No, I am not running the nightly build, but now that I am sure the unit in this dashboard and in Top Latency was wrong in the 22.08 version, I corrected it myself.

Thanks.

faguayot avatar Oct 24 '22 16:10 faguayot

Thanks again for bringing this to our attention.

cgrinds avatar Oct 24 '22 16:10 cgrinds

Verified in 22.11. Some dashboard UX changes in #1496.

rahulguptajss avatar Nov 17 '22 07:11 rahulguptajss