
tracking: refactor metrics with `LabelGuarded`

fuyufjh opened this issue 1 year ago (status: open, 6 comments)

Background

`LabelGuardedMetricVec` was introduced in #13080. It extends `MetricVec` so that a set of labels is correctly removed from the Prometheus client once it is dropped. This is useful for metrics associated with objects that can be dropped, such as streaming jobs, fragments, actors, and batch tasks.

When a label set is dropped, it is recorded in the `uncollected_removed_labels` set. Once the metrics have been collected, the series for those labels are finally removed.
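To make the guard-and-deferred-removal behavior concrete, here is a minimal, self-contained Rust sketch. It does not use the `prometheus` crate or RisingWave's real `LabelGuardedMetricVec`: the types `GuardedCounterVec` and `GuardedCounter` and the methods `with_guarded_labels` and `collect` are hypothetical stand-ins. It also assumes at most one live guard per label set; a real implementation may need reference counting to handle duplicates.

```rust
use std::collections::{HashMap, HashSet};
use std::sync::{Arc, Mutex};

// Hypothetical, simplified stand-in for a Prometheus counter vec (NOT RisingWave's API).
#[derive(Default)]
struct Inner {
    // One counter value per label-value tuple.
    series: HashMap<Vec<String>, u64>,
    // Label sets whose guards were dropped but have not been collected yet.
    uncollected_removed_labels: HashSet<Vec<String>>,
}

#[derive(Default)]
struct GuardedCounterVec {
    inner: Arc<Mutex<Inner>>,
}

// RAII handle for one label set; dropping it schedules removal of that series.
// Assumption: at most one live guard per label set.
struct GuardedCounter {
    labels: Vec<String>,
    inner: Arc<Mutex<Inner>>,
}

impl GuardedCounterVec {
    fn with_guarded_labels(&self, labels: &[&str]) -> GuardedCounter {
        let labels: Vec<String> = labels.iter().map(|s| s.to_string()).collect();
        let mut state = self.inner.lock().unwrap();
        state.series.entry(labels.clone()).or_insert(0);
        // Re-registering the same labels before the next collection cancels the pending removal.
        state.uncollected_removed_labels.remove(&labels);
        GuardedCounter { labels, inner: Arc::clone(&self.inner) }
    }

    // Simulates one scrape: export every series, then delete the series
    // whose guards were dropped before this scrape.
    fn collect(&self) -> Vec<(Vec<String>, u64)> {
        let mut state = self.inner.lock().unwrap();
        let mut snapshot: Vec<(Vec<String>, u64)> =
            state.series.iter().map(|(k, v)| (k.clone(), *v)).collect();
        snapshot.sort();
        let removed: Vec<Vec<String>> = state.uncollected_removed_labels.drain().collect();
        for labels in removed {
            state.series.remove(&labels);
        }
        snapshot
    }
}

impl GuardedCounter {
    fn inc_by(&self, v: u64) {
        let mut state = self.inner.lock().unwrap();
        *state.series.entry(self.labels.clone()).or_insert(0) += v;
    }
}

impl Drop for GuardedCounter {
    fn drop(&mut self) {
        // Do not remove the series yet: only record the labels for the next collect().
        if let Ok(mut state) = self.inner.lock() {
            state.uncollected_removed_labels.insert(self.labels.clone());
        }
    }
}

fn main() {
    let row_count = GuardedCounterVec::default();
    {
        // The guard lives as long as a (hypothetical) actor owning it.
        let actor_1 = row_count.with_guarded_labels(&["1"]);
        actor_1.inc_by(10);
    } // actor_1 is dropped here; removal is only scheduled.
    println!("{:?}", row_count.collect()); // [(["1"], 10)]
    println!("{:?}", row_count.collect()); // []
}
```

In this sketch, the first scrape after the drop still exports the last value and the following scrape no longer contains the series. One plausible reason for deferring removal this way, rather than removing on drop, is that the final value of a dropped object can still be scraped once.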

To-dos

In principle, every usage of a plain `MetricVec` labeled by a droppable object (streaming jobs, fragments, actors, batch tasks, etc.) needs to be replaced with `LabelGuardedMetricVec`.

  • [ ] #16728
  • [ ] #16729
  • [x] Refactor SinkMetrics with LabelGuarded ones
  • [x] #15174
  • [ ] #15475

fuyufjh, Jan 29 '24 07:01

Could be related: #13086

BugenZhao, Feb 21 '24 09:02

Related: https://github.com/risingwavelabs/risingwave/issues/14821

fuyufjh, Mar 06 '24 10:03

So currently, when a streaming job is dropped, its metrics are leaked (i.e., Prometheus keeps collecting useless data that is always zero-valued), right?

xxchan, Mar 07 '24 10:03

> So currently, when a streaming job is dropped, its metrics are leaked (i.e., Prometheus keeps collecting useless data that is always zero-valued), right?

True. Some of them have been fixed (for example, see `StreamingMetrics`). Whoever picks up this issue, please check whether the remaining usages are correct.

fuyufjh, Mar 07 '24 10:03

> which is always zero valued

I have observed some non-zero constant values in Grafana, although I am not sure whether they have the same root cause.

lmatz, Mar 07 '24 12:03

Yes, it should be constant, but not necessarily zero.

You can check this at localhost:1222, e.g., by looking at `stream_mview_input_row_count` for a dropped actor.
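Leftover series for a dropped actor can also be spotted programmatically. The following Rust sketch filters Prometheus text-exposition output (for example, the body fetched from localhost:1222 with any HTTP client) for one metric and one actor. The function name `series_for_actor`, the `actor_id` label name, and the sample labels and values are assumptions for illustration, not taken from RisingWave.

```rust
// Hypothetical helper: returns the sample lines of `metric` whose label set
// contains `actor_id="<actor_id>"`. Assumes the Prometheus text exposition
// format, where a sample line looks like: name{label="v",...} value
fn series_for_actor<'a>(exposition: &'a str, metric: &str, actor_id: &str) -> Vec<&'a str> {
    let needle = format!("actor_id=\"{}\"", actor_id);
    // Match the label only at the start of the label set or after a comma,
    // so that e.g. `other_actor_id="1"` does not match.
    let first_label = format!("{{{}", needle);
    let later_label = format!(",{}", needle);
    exposition
        .lines()
        .filter(|line| !line.starts_with('#')) // skip HELP/TYPE comments
        .filter(|line| {
            line.strip_prefix(metric).map_or(false, |rest| {
                // Require `{` right after the name so longer metric names do not match.
                rest.starts_with('{')
                    && (rest.contains(&first_label) || rest.contains(&later_label))
            })
        })
        .collect()
}

fn main() {
    // Hypothetical scraped text; labels and values are made up.
    let scraped = "# TYPE stream_mview_input_row_count counter\n\
stream_mview_input_row_count{actor_id=\"1\",table_id=\"1001\"} 42\n\
stream_mview_input_row_count{actor_id=\"10\",table_id=\"1001\"} 7\n";
    let stale = series_for_actor(scraped, "stream_mview_input_row_count", "1");
    println!("{:?}", stale);
}
```

If the actor has already been dropped and its series keeps showing up across repeated scrapes, its labels were never removed.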

xxchan, Mar 08 '24 06:03