Add 'store' label to metric pd_cluster_status.

Open SerjKol80 opened this issue 2 months ago • 2 comments

Enhancement Task

Currently the gauge metric pd_cluster_status is aggregated across all stores. This is inconvenient: during a long-running deployment across the TiKV component, the metric reports problematic stores for a long period of time, and it is not possible to tell which TiKV stores contributed to it.

Suggestion: instead of emitting counts across all stores here, emit the same gauge metrics with an additional 'store' label, with a value of either 0 or 1, in the observe() method. This gives a more detailed view of the metric, and aggregating across all values of the 'store' label yields the same metrics as today.
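
To illustrate the idea, here is a minimal sketch in plain Python rather than PD's actual code. The store IDs, the status names `up` and `down`, and the helper functions are all hypothetical. It only shows that a 0/1 sample per store, summed over the 'store' label, reproduces the cluster-wide counts emitted today.

```python
from collections import defaultdict

# Hypothetical input: store ID -> set of status types that currently apply to it.
store_statuses = {
    "1": {"up"},
    "2": {"down"},
    "3": {"up"},
}

# Hypothetical status types; the real metric may use different names.
STATUS_TYPES = ("up", "down")


def per_store_samples(statuses):
    """Return {(status_type, store_id): 0 or 1}, one sample per label pair."""
    return {
        (status, store_id): int(status in active)
        for store_id, active in statuses.items()
        for status in STATUS_TYPES
    }


def aggregate_over_stores(samples):
    """Sum over the 'store' label to recover the cluster-wide counts."""
    totals = defaultdict(int)
    for (status, _store_id), value in samples.items():
        totals[status] += value
    return dict(totals)


samples = per_store_samples(store_statuses)
totals = aggregate_over_stores(samples)
```

On the query side, assuming the label is named `store`, the same aggregation could be written in PromQL as `sum without (store) (pd_cluster_status)`.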

SerjKol80, Oct 22 '25 00:10

Welcome @SerjKol80! It looks like this is your first issue to tikv/pd 🎉

ti-chi-bot[bot], Oct 22 '25 00:10

You can find the store label details in the dashboard.

bufferflies, Oct 23 '25 00:10