
More Prometheus volume & replica metrics

Open • absolutejam opened this issue 7 months ago • 1 comment

Is your feature request related to a problem? Please describe.

The Prometheus exporter should expose more volume & replica information, such as the info available in kubectl mayastor get volume-replica-topologies.

Describe the solution you'd like

Ideally, I'd like to see metrics along the lines of:

  • Volume replicas (with a status label, e.g. Failed, Degraded, Online)
  • Replica rebuild progress (this could potentially be a gauge of the current rebuild state of a replica)
  • A volume label on the replica metrics
  • And this is purely an opinion, but it feels like the metrics should be prefixed with mayastor_ for identification
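To make the list above concrete, here is a rough sketch of what such series could look like in the Prometheus text exposition format. Every metric name, label name, and sample value below (mayastor_volume_replica_status, mayastor_replica_rebuild_progress_ratio, volume, replica, status) is made up for illustration and is not something the exporter provides today.

```python
# Sketch only: all metric names, label names and values are hypothetical.

def escape_label_value(value):
    """Escape backslash, double quote and newline, as the text format requires."""
    return value.replace("\\", "\\\\").replace('"', '\\"').replace("\n", "\\n")


def render_sample(name, labels, value):
    """Render one sample line: name{label="value",...} value"""
    label_text = ",".join(
        f'{key}="{escape_label_value(val)}"' for key, val in sorted(labels.items())
    )
    return f"{name}{{{label_text}}} {value:g}"


def render_replica_metrics(replicas):
    """Render a status series and a rebuild-progress gauge for each replica."""
    lines = [
        "# TYPE mayastor_volume_replica_status gauge",
    ]
    for r in replicas:
        # One series per replica; the current state is carried in the status label.
        labels = {"volume": r["volume"], "replica": r["replica"], "status": r["status"]}
        lines.append(render_sample("mayastor_volume_replica_status", labels, 1))
    lines.append("# TYPE mayastor_replica_rebuild_progress_ratio gauge")
    for r in replicas:
        # Gauge between 0 and 1; the volume label ties the replica back to its volume.
        labels = {"volume": r["volume"], "replica": r["replica"]}
        lines.append(
            render_sample("mayastor_replica_rebuild_progress_ratio", labels, r["rebuild_progress"])
        )
    return "\n".join(lines)


example_replicas = [
    {"volume": "pvc-1111", "replica": "replica-a", "status": "Online", "rebuild_progress": 1.0},
    {"volume": "pvc-1111", "replica": "replica-b", "status": "Degraded", "rebuild_progress": 0.42},
]
exposition = render_replica_metrics(example_replicas)
print(exposition)
```

Keeping the state in a single status label per replica means one extra series per replica rather than one per possible state, which keeps cardinality low.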

Generally, the more information (without unnecessary cardinality) the better. This makes correlating service issues with volume issues much easier and unlocks better visualisations & alerts.

For example, this Grafana query joins replica write ops to PVC names via kube_persistentvolumeclaim_info:

label_replace(
    irate(replica_num_write_ops{name=~"$replica"}[$__rate_interval]),
    "volumename", "$1", "pv_name", "(.*)"
)
* on (volumename) group_left (persistentvolumeclaim)
kube_persistentvolumeclaim_info


absolutejam • May 23 '25 10:05

Adding to this, it would be really useful to get the underlying device name as a label. That would let us correlate node_exporter metrics such as node_disk_io_time_weighted_seconds_total directly with volumes & PVCs.
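As a rough illustration of that correlation, the sketch below matches replica samples to disk samples on shared labels, much like a one-to-one PromQL `on (...)` match. The sample data, the device and volume labels on the replica metric, and the assumption that both exporters share an instance value are all hypothetical. In practice the two exporters would likely need a common node label.

```python
# Hypothetical (labels, value) samples; the device and volume labels on the
# replica series are the proposal, not something the exporter emits today.
replica_samples = [
    ({"instance": "node-1", "device": "nvme0n1", "volume": "pvc-1111"}, 120.0),
    ({"instance": "node-2", "device": "nvme1n1", "volume": "pvc-2222"}, 35.0),
]
# Stand-in for a rate over node_disk_io_time_weighted_seconds_total.
disk_samples = [
    ({"instance": "node-1", "device": "nvme0n1"}, 0.8),
    ({"instance": "node-2", "device": "nvme1n1"}, 0.1),
    ({"instance": "node-2", "device": "sda"}, 0.3),
]


def join_on(left, right, keys):
    """Pair each left sample with the right sample that has equal values for keys."""
    index = {tuple(labels[k] for k in keys): value for labels, value in right}
    joined = []
    for labels, value in left:
        match = index.get(tuple(labels.get(k) for k in keys))
        if match is not None:
            joined.append((labels, value, match))
    return joined


rows = join_on(replica_samples, disk_samples, ("instance", "device"))
for labels, write_ops, io_time in rows:
    # Each row now ties a volume to the disk-level figure for its backing device.
    print(labels["volume"], write_ops, io_time)
```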

absolutejam • May 29 '25 07:05