Harvest should create a `KeyPerf` collector for ONTAP REST performance counters

Open cgrinds opened this issue 1 year ago • 1 comments

This issue is about creating a collector for ONTAP objects whose REST response includes a statistics or metric field. This collector is distinct from the ZapiPerf and RestPerf collectors because the shape of the ONTAP response for statistics and metric differs from the responses that ZapiPerf and RestPerf consume.

In general, the statistics and metric fields include performance metrics for IOPS, latency, and throughput. The statistics metrics are raw performance counters, while the metric counters are samples over one of the predefined ranges (15 seconds, four minutes, five minutes, 30 minutes, two hours, one day).

The statistics field is more general and likely covers all of Harvest's use cases. If that's true, we may only support the statistics field and ignore the metric field. The metric field is difficult to make work with Prometheus since Prometheus controls the timestamp used to stamp the metrics, not Harvest.
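
Since the statistics counters are raw, a collector would need to difference two polls to get rates. Below is a minimal Python sketch of that idea, not Harvest's implementation. It assumes the `*_raw` counters are cumulative, that average latency is the `latency_raw` delta divided by the `iops_raw` delta, and that samples carry the `timestamp` shown in the examples in this issue. The names `parse_ts` and `compute_rates` are hypothetical.

```python
from datetime import datetime

KINDS = ("read", "write", "other", "total")


def parse_ts(ts: str) -> datetime:
    # Timestamps in the examples look like "2024-06-21T14:38:22Z".
    return datetime.strptime(ts, "%Y-%m-%dT%H:%M:%SZ")


def compute_rates(prev: dict, curr: dict) -> dict:
    """Derive per-second rates and average latency from two raw samples.

    Assumption: every *_raw counter is cumulative, so the difference
    between two polls is the activity during that interval.
    """
    elapsed = (parse_ts(curr["timestamp"]) - parse_ts(prev["timestamp"])).total_seconds()
    if elapsed <= 0:
        raise ValueError("samples must be in increasing time order")

    out = {}
    for kind in KINDS:
        d_iops = curr["iops_raw"][kind] - prev["iops_raw"][kind]
        d_tput = curr["throughput_raw"][kind] - prev["throughput_raw"][kind]
        d_lat = curr["latency_raw"][kind] - prev["latency_raw"][kind]
        if min(d_iops, d_tput, d_lat) < 0:
            # A counter went backwards (for example after a reset); skip it.
            continue
        out[f"iops.{kind}"] = d_iops / elapsed
        out[f"throughput.{kind}"] = d_tput / elapsed
        # Average latency per operation; 0.0 when there were no operations.
        out[f"latency.{kind}"] = d_lat / d_iops if d_iops else 0.0
    return out
```

Skipping negative deltas here is only one possible policy; a real collector may prefer to rely on the status field that ONTAP returns with each sample.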

Background

The statistics and metric counters are aggregated across all nodes in the cluster. These counters have existed since ONTAP 9.6, and as of ONTAP 9.15.1, the statistics field is available for the following 25 objects. The /application/applications endpoint differs from the others: its response includes statistics but no metric, includes fields beyond IOPS, latency, and throughput, and uses a different naming convention.

cat 10.193.48.154-swagger.yaml | dasel -r yaml -w json | gron | rg -F '.properties.statistics = {};' | rg -v '.xc_'
json.definitions.aggregate.properties.statistics = {};
json.definitions.application.properties.statistics = {};
json.definitions.cifs_service.properties.statistics = {};
json.definitions.cluster.properties.nodes.items.properties.statistics = {};
json.definitions.cluster.properties.statistics = {};
json.definitions.consistency_group.properties.statistics = {};
json.definitions.consistency_group_response.properties.records.items.properties.statistics = {};
json.definitions.fc_interface.properties.statistics = {};
json.definitions.fc_port.properties.statistics = {};
json.definitions.fcp_service.properties.statistics = {};
json.definitions.ip_interface.properties.statistics = {};
json.definitions.iscsi_service.properties.statistics = {};
json.definitions.lun.properties.statistics = {};
json.definitions.monitored_file.properties.statistics = {};
json.definitions.nfs_service.properties.statistics = {};
json.definitions.node.properties.statistics = {};
json.definitions.node_response.properties.records.items.properties.statistics = {};
json.definitions.nvme_namespace.properties.statistics = {};
json.definitions.nvme_service.properties.statistics = {};
json.definitions.port.properties.statistics = {};
json.definitions.qtree.properties.statistics = {};
json.definitions.s3_service.properties.statistics = {};
json.definitions.svm_ip_interface.properties.statistics = {};
json.definitions.switch_port.properties.statistics = {};
json.definitions.volume.properties.statistics = {};
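
For anyone without dasel, gron, and rg installed, the same search can be approximated in Python once the swagger file has been parsed into a dict (for example with a YAML library, which is not shown here). This is a sketch under that assumption. The function name `find_statistics_paths` is hypothetical, and the `.xc_` exclusion is a literal substring check rather than the regex used above.

```python
def find_statistics_paths(node, path="json"):
    """Yield gron-style paths of object-valued `properties.statistics` keys."""
    if isinstance(node, dict):
        for key, value in node.items():
            child = f"{path}.{key}"
            if (
                key == "statistics"
                and path.endswith(".properties")
                and isinstance(value, dict)   # gron prints "= {};" for objects
                and ".xc_" not in child       # mirrors the `rg -v` filter
            ):
                yield child
            # Keep walking: nested definitions (such as cluster.nodes.items)
            # can carry their own statistics field.
            yield from find_statistics_paths(value, child)
    elif isinstance(node, list):
        for index, value in enumerate(node):
            yield from find_statistics_paths(value, f"{path}[{index}]")
```

Calling `sorted(find_statistics_paths(spec))` on the parsed spec should give a list comparable to the one above.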

Status field

The collector needs to handle all status enums:

  • ok
  • error
  • partial_no_data
  • partial_no_response
  • partial_other_error
  • negative_delta
  • not_found
  • backfilled_data
  • inconsistent_delta_time
  • inconsistent_old_data
  • partial_no_uuid
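
One way to handle these enums is to validate the status before a sample is used. The sketch below is illustrative only. The policy of using only `ok` samples is an assumption, not something decided in this issue, and `should_use_sample` is a hypothetical name.

```python
KNOWN_STATUSES = frozenset({
    "ok",
    "error",
    "partial_no_data",
    "partial_no_response",
    "partial_other_error",
    "negative_delta",
    "not_found",
    "backfilled_data",
    "inconsistent_delta_time",
    "inconsistent_old_data",
    "partial_no_uuid",
})

# Assumed policy: only fully valid samples feed the metrics.
USABLE_STATUSES = frozenset({"ok"})


def should_use_sample(status: str) -> bool:
    """Return True if a sample with this status should be used.

    Raises ValueError for a status outside the documented enum, so that
    a new ONTAP status is noticed rather than silently dropped.
    """
    if status not in KNOWN_STATUSES:
        raise ValueError(f"unknown statistics status: {status!r}")
    return status in USABLE_STATUSES
```

Keeping the usable set separate from the known set makes it easy to relax the policy later, for example to accept `backfilled_data`.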

Examples

curl -k 'https://10.193.48.154/api/cluster?fields=statistics'
{
  "statistics": {
    "timestamp": "2024-06-21T14:38:22Z",
    "status": "ok",
    "latency_raw": {
      "other": 1516853741,
      "total": 3104533452181,
      "read": 2895738710563,
      "write": 207277887877
    },
    "iops_raw": {
      "read": 7660818902,
      "write": 263263046,
      "other": 4993299,
      "total": 7929075247
    },
    "throughput_raw": {
      "read": 453439274550417,
      "write": 1978439829907,
      "other": 2812081937,
      "total": 455420526462261
    }
  }
}
curl -k 'https://10.193.48.154/api/application/applications?fields=statistics'
 {
      "uuid": "dd2086bb-6289-11ee-868b-00a098d390f2",
      "name": "newvol",
      "statistics": {
        "shared_storage_pool": false,
        "space": {
          "provisioned": 22077440,
          "used": 2244608,
          "used_percent": 10,
          "used_excluding_reserves": 1142784,
          "logical_used": 2244608,
          "reserved_unused": 0,
          "available": 19832832,
          "savings": 0
        },
        "iops": {
          "total": 0,
          "per_tb": 0
        },
        "snapshot": {
          "reserve": 1101824,
          "used": 1945600
        },
        "latency": {
          "raw": 0,
          "average": 0
        },
        "components": [
          {
            "name": "newvol",
            "uuid": "dd30b964-6289-11ee-868b-00a098d390f2",
            "shared_storage_pool": false,
            "storage_service": {
              "name": "extreme",
              "uuid": "0743fa34-43b7-4a87-ba8f-96816a0590a0"
            },
            "space": {
              "provisioned": 22077440,
              "used": 2244608,
              "used_percent": 10,
              "used_excluding_reserves": 1142784,
              "logical_used": 2244608,
              "reserved_unused": 0,
              "available": 19832832,
              "savings": 0
            },
            "iops": {
              "total": 0,
              "per_tb": 0
            },
            "snapshot": {
              "reserve": 1101824,
              "used": 1945600
            },
            "latency": {
              "raw": 0,
              "average": 0
            }
          }
        ]
      }
 }
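
Because the two responses above have different shapes, a collector could flatten the statistics object into dotted metric names before further processing. This is a rough Python sketch of that idea, assuming only numeric leaves are metrics and that booleans, strings, and timestamps are handled elsewhere. The name `flatten_metrics` and the dotted naming scheme are hypothetical.

```python
def flatten_metrics(node, prefix=""):
    """Flatten nested dicts and lists into {dotted_name: number}."""
    out = {}
    if isinstance(node, dict):
        for key, value in node.items():
            name = f"{prefix}.{key}" if prefix else key
            out.update(flatten_metrics(value, name))
    elif isinstance(node, list):
        # List items (such as application components) are keyed by index.
        for index, value in enumerate(node):
            name = f"{prefix}.{index}" if prefix else str(index)
            out.update(flatten_metrics(value, name))
    elif isinstance(node, (int, float)) and not isinstance(node, bool):
        # bool is a subclass of int in Python, so exclude it explicitly.
        out[prefix] = node
    return out
```

For the cluster response this gives names such as `latency_raw.read` and `iops_raw.total`. For the application response it gives names such as `iops.per_tb` and `components.0.latency.raw`.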

Alternative names

  • AggregatedPerf
  • DataFlow
  • EfficiencyMetrics
  • IOFlow
  • IOPerf
  • IOProfiling
  • IOVelocityMetrics
  • KeyPerf
  • KeyPerfMetrics
  • KPM (key performance metrics)
  • MetricOps
  • ObjectMetrics
  • ObjectPerf
  • OperationalMetrics
  • OperationalPerformance
  • OpMetrics
  • PerfIO
  • PerfMetrics
  • PerformanceIndicators
  • PerformanceMetrics
  • PerformanceOverview
  • PerfStatistics
  • PerfStream
  • PerfTriad
  • RawPerf
  • SimpleStats
  • SummaryStats
  • SystemEfficiency
  • SystemMetrics
  • ThruLatIOPS

cgrinds avatar Jun 21 '24 17:06 cgrinds

  • [x] Infrastructure Development
  • [x] Unit Tests
  • [x] Asup
  • [x] Template development
  • [x] Dedup logic with RestPerf (if required)
  • [x] Documentation
  • [x] Metric Documentation
  • [x] Support filtering
  • [x] object plugins if any
    • [x] Add Top Client/File support in KeyPerf
  • [x] Enable any CI tests
  • [x] Dashboard changes if any
  • [x] Remove unused templates
  • [ ] Add tags to dashboards

rahulguptajss avatar Jul 22 '24 07:07 rahulguptajss

Moving the remaining work to the next release.

rahulguptajss avatar Nov 04 '24 05:11 rahulguptajss

The following dashboards can be excluded from KeyPerf tagging:

  • cmode/external_service_op.json
  • cmode/headroom.json
  • cmode/lun.json
  • cmode/mcc_cluster.json
  • cmode/namespace.json
  • cmode/nfs4storePool.json
  • cmode/workload.json
  • cmode/vscan.json
  • cmode/smb.json
  • cmode/s3ObjectStorage.json
  • cmode/nfsTroubleshooting.json (only a few panels will work)
  • cmode/network.json

rahulguptajss avatar Nov 19 '24 09:11 rahulguptajss

Dashboards will be handled in a separate PR. Closing

cgrinds avatar Dec 02 '24 13:12 cgrinds

Verified in main.

rahulguptajss avatar Feb 10 '25 14:02 rahulguptajss