Harvest should create a `KeyPerf` collector for ONTAP REST performance counters
This issue is about creating a collector for ONTAP objects whose REST responses include a `statistics` or `metric` field. This collector is distinct from the ZapiPerf and RestPerf collectors because the shape of the `statistics` and `metric` responses differs from the responses those collectors consume.
In general, the `statistics` and `metric` fields include performance metrics for IOPS, latency, and throughput. The `statistics` values are raw performance counters, while the `metric` values are samples over one of the predefined ranges (15 seconds, four minutes, five minutes, 30 minutes, two hours, one day).
The `statistics` field is more general and likely covers all of Harvest's use cases. If that's true, we may support only the `statistics` field and ignore the `metric` field. The `metric` field is also difficult to make work with Prometheus, because Prometheus, not Harvest, controls the timestamp attached to the scraped metrics.
Background
The `statistics` and `metric` counters are aggregated across all nodes in the cluster. These counters have existed since ONTAP 9.6. As of ONTAP 9.15.1, the `statistics` field appears in the following 25 schema definitions; a few of them, such as `node_response` and `consistency_group_response`, wrap the same object as another entry. The `/application/applications` endpoint differs from the others: its response includes `statistics` but no `metric`, it reports fields beyond IOPS, latency, and throughput, and it uses a different naming convention.
```sh
cat 10.193.48.154-swagger.yaml | dasel -r yaml -w json | gron | rg -F '.properties.statistics = {};' | rg -v '.xc_'
```

```
json.definitions.aggregate.properties.statistics = {};
json.definitions.application.properties.statistics = {};
json.definitions.cifs_service.properties.statistics = {};
json.definitions.cluster.properties.nodes.items.properties.statistics = {};
json.definitions.cluster.properties.statistics = {};
json.definitions.consistency_group.properties.statistics = {};
json.definitions.consistency_group_response.properties.records.items.properties.statistics = {};
json.definitions.fc_interface.properties.statistics = {};
json.definitions.fc_port.properties.statistics = {};
json.definitions.fcp_service.properties.statistics = {};
json.definitions.ip_interface.properties.statistics = {};
json.definitions.iscsi_service.properties.statistics = {};
json.definitions.lun.properties.statistics = {};
json.definitions.monitored_file.properties.statistics = {};
json.definitions.nfs_service.properties.statistics = {};
json.definitions.node.properties.statistics = {};
json.definitions.node_response.properties.records.items.properties.statistics = {};
json.definitions.nvme_namespace.properties.statistics = {};
json.definitions.nvme_service.properties.statistics = {};
json.definitions.port.properties.statistics = {};
json.definitions.qtree.properties.statistics = {};
json.definitions.s3_service.properties.statistics = {};
json.definitions.svm_ip_interface.properties.statistics = {};
json.definitions.switch_port.properties.statistics = {};
json.definitions.volume.properties.statistics = {};
```
Status field
The collector needs to handle every value of the `status` enum:
- ok
- error
- partial_no_data
- partial_no_response
- partial_other_error
- negative_delta
- not_found
- backfilled_data
- inconsistent_delta_time
- inconsistent_old_data
- partial_no_uuid
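One way to handle every value is to treat the documented enum as a closed set, so that an undocumented value from a future ONTAP release is reported instead of silently accepted. This is a minimal sketch; the policy it encodes (only `ok` feeds rate calculations, every other documented status skips the sample for this poll) is an assumption for illustration, not decided in this issue, and `Usable` is a hypothetical name.

```go
package main

import "fmt"

// knownStatuses lists every statistics.status value from the enum above.
var knownStatuses = map[string]bool{
	"ok": true, "error": true, "partial_no_data": true,
	"partial_no_response": true, "partial_other_error": true,
	"negative_delta": true, "not_found": true, "backfilled_data": true,
	"inconsistent_delta_time": true, "inconsistent_old_data": true,
	"partial_no_uuid": true,
}

// Usable reports whether a sample with this status should feed rate
// calculations. Assumed policy: only "ok" is trusted; other documented
// statuses skip the sample; an undocumented status is also skipped but
// returned as an error so it can be logged.
func Usable(status string) (bool, error) {
	if !knownStatuses[status] {
		return false, fmt.Errorf("unknown statistics status %q", status)
	}
	return status == "ok", nil
}

func main() {
	for _, s := range []string{"ok", "negative_delta", "bogus"} {
		ok, err := Usable(s)
		fmt.Println(s, ok, err)
	}
}
```

A finer-grained policy (for example accepting `backfilled_data`) would only change the return expression, not the closed-set check.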
Examples
```sh
curl -k 'https://10.193.48.154/api/cluster?fields=statistics'
```

```json
{
  "statistics": {
    "timestamp": "2024-06-21T14:38:22Z",
    "status": "ok",
    "latency_raw": {
      "other": 1516853741,
      "total": 3104533452181,
      "read": 2895738710563,
      "write": 207277887877
    },
    "iops_raw": {
      "read": 7660818902,
      "write": 263263046,
      "other": 4993299,
      "total": 7929075247
    },
    "throughput_raw": {
      "read": 453439274550417,
      "write": 1978439829907,
      "other": 2812081937,
      "total": 455420526462261
    }
  }
}
```
```sh
curl -k 'https://10.193.48.154/api/application/applications?fields=statistics'
```

```json
{
  "uuid": "dd2086bb-6289-11ee-868b-00a098d390f2",
  "name": "newvol",
  "statistics": {
    "shared_storage_pool": false,
    "space": {
      "provisioned": 22077440,
      "used": 2244608,
      "used_percent": 10,
      "used_excluding_reserves": 1142784,
      "logical_used": 2244608,
      "reserved_unused": 0,
      "available": 19832832,
      "savings": 0
    },
    "iops": {
      "total": 0,
      "per_tb": 0
    },
    "snapshot": {
      "reserve": 1101824,
      "used": 1945600
    },
    "latency": {
      "raw": 0,
      "average": 0
    },
    "components": [
      {
        "name": "newvol",
        "uuid": "dd30b964-6289-11ee-868b-00a098d390f2",
        "shared_storage_pool": false,
        "storage_service": {
          "name": "extreme",
          "uuid": "0743fa34-43b7-4a87-ba8f-96816a0590a0"
        },
        "space": {
          "provisioned": 22077440,
          "used": 2244608,
          "used_percent": 10,
          "used_excluding_reserves": 1142784,
          "logical_used": 2244608,
          "reserved_unused": 0,
          "available": 19832832,
          "savings": 0
        },
        "iops": {
          "total": 0,
          "per_tb": 0
        },
        "snapshot": {
          "reserve": 1101824,
          "used": 1945600
        },
        "latency": {
          "raw": 0,
          "average": 0
        }
      }
    ]
  }
}
```
Alternative names
- AggregatedPerf
- DataFlow
- EfficiencyMetrics
- IOFlow
- IOPerf
- IOProfiling
- IOVelocityMetrics
- KeyPerf
- KeyPerfMetrics
- KPM (key performance metrics)
- MetricOps
- ObjectMetrics
- ObjectPerf
- OperationalMetrics
- OperationalPerformance
- OpMetrics
- PerfIO
- PerfMetrics
- PerformanceIndicators
- PerformanceMetrics
- PerformanceOverview
- PerfStatistics
- PerfStream
- PerfTriad
- RawPerf
- SimpleStats
- SummaryStats
- SystemEfficiency
- SystemMetrics
- ThruLatIOPS
- [x] Infrastructure Development
- [x] Unit Tests
- [x] ASUP
- [x] Template development
- [x] Dedup logic with RestPerf (if required)
- [x] Documentation
- [x] Metric Documentation
- [x] Support filtering
- [x] Object plugins, if any
- [x] Add Top Client/File support in KeyPerf
- [x] Enable any CI tests
- [x] Dashboard changes, if any
- [x] Remove unused templates
- [ ] Add tags to dashboards
Moving the remaining work to the next release.
The following dashboards can be excluded from KeyPerf tagging:
- cmode/external_service_op.json
- cmode/headroom.json
- cmode/lun.json
- cmode/mcc_cluster.json
- cmode/namespace.json
- cmode/nfs4storePool.json
- cmode/workload.json
- cmode/vscan.json
- cmode/smb.json
- cmode/s3ObjectStorage.json
- cmode/nfsTroubleshooting.json (only a few panels will work)
- cmode/network.json
Dashboards will be handled in a separate PR. Closing.
Verified in main.