risingwave
tracking: refactor metrics with `LabelGuarded`
Background
LabelGuardedMetricVec was introduced in #13080. It extends MetricVec so that a set of labels is correctly removed from the Prometheus client once the labeled metric is dropped. This is useful for metrics associated with an object that can be dropped, such as streaming jobs, fragments, actors, and batch tasks.
When a set of labels is dropped, it is recorded in the uncollected_removed_labels set. Once the metrics have been collected, the metrics for those labels are finally removed.
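The drop-then-remove-after-collection behavior can be illustrated with a minimal, self-contained sketch. It does not use the `prometheus` crate or RisingWave's real types: `ToyCounterVec`, `GuardedCounterVec`, `GuardedCounter`, `with_guarded_label_values`, and `collect` are hypothetical names, and reference-counting the guards that share a label set is an assumption of this sketch, not a description of the actual implementation.

```rust
use std::collections::{HashMap, HashSet};
use std::sync::{Arc, Mutex};

// Hypothetical stand-in for a Prometheus counter vec: label values -> counter value.
#[derive(Default)]
struct ToyCounterVec {
    series: HashMap<Vec<String>, u64>,
}

#[derive(Default)]
struct Inner {
    vec: ToyCounterVec,
    // Number of live guards per label set (assumption: guards are reference-counted).
    live: HashMap<Vec<String>, usize>,
    // Label sets whose last guard was dropped but which have not been scraped yet.
    uncollected_removed_labels: HashSet<Vec<String>>,
}

#[derive(Clone, Default)]
struct GuardedCounterVec {
    inner: Arc<Mutex<Inner>>,
}

// Hypothetical handle for one label set; dropping the last one schedules removal.
struct GuardedCounter {
    labels: Vec<String>,
    inner: Arc<Mutex<Inner>>,
}

impl GuardedCounterVec {
    fn with_guarded_label_values(&self, labels: &[&str]) -> GuardedCounter {
        let labels: Vec<String> = labels.iter().map(|s| s.to_string()).collect();
        let mut inner = self.inner.lock().unwrap();
        *inner.live.entry(labels.clone()).or_insert(0) += 1;
        inner.vec.series.entry(labels.clone()).or_insert(0);
        // Re-created before the next scrape: the series must not be removed after all.
        inner.uncollected_removed_labels.remove(&labels);
        GuardedCounter { labels, inner: Arc::clone(&self.inner) }
    }

    // Simulated scrape: export every series first, then delete the pending ones,
    // so the final value of a dropped label set is still collected exactly once.
    fn collect(&self) -> Vec<(Vec<String>, u64)> {
        let mut inner = self.inner.lock().unwrap();
        let mut snapshot: Vec<(Vec<String>, u64)> =
            inner.vec.series.iter().map(|(k, v)| (k.clone(), *v)).collect();
        snapshot.sort();
        let pending: Vec<Vec<String>> = inner.uncollected_removed_labels.drain().collect();
        for labels in pending {
            inner.vec.series.remove(&labels);
        }
        snapshot
    }
}

impl GuardedCounter {
    fn inc(&self) {
        let mut inner = self.inner.lock().unwrap();
        // The series cannot be pending removal while a guard for it is alive.
        *inner.vec.series.get_mut(&self.labels).expect("series exists while guarded") += 1;
    }
}

impl Drop for GuardedCounter {
    fn drop(&mut self) {
        let mut inner = self.inner.lock().unwrap();
        let remaining = {
            let count = inner.live.get_mut(&self.labels).expect("guard was registered");
            *count -= 1;
            *count
        };
        if remaining == 0 {
            // Last guard gone: defer the actual removal until after the next collection.
            inner.live.remove(&self.labels);
            inner.uncollected_removed_labels.insert(self.labels.clone());
        }
    }
}

fn main() {
    let metrics = GuardedCounterVec::default();
    let actor = metrics.with_guarded_label_values(&["actor-1"]);
    actor.inc();
    drop(actor); // labels are now pending removal
    let first = metrics.collect(); // the final value is still exported once
    let second = metrics.collect(); // the series is gone
    println!("{:?} then {:?}", first, second);
}
```

Deferring the removal until after a scrape means the last value written before the drop is not lost, while the series still disappears from later scrapes instead of lingering as a constant.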
To-dos
Technically, all usages of a plain MetricVec for a droppable object (streaming jobs, fragments, actors, batch tasks, etc.) need to be replaced with LabelGuardedMetricVec.
- [ ] #16728
- [ ] #16729
- [x] Refactor `SinkMetrics` with `LabelGuarded` ones
- [x] #15174
- [ ] #15475
Could be related: #13086
Related: https://github.com/risingwavelabs/risingwave/issues/14821
So currently, when a streaming job is dropped, its metrics are leaked (i.e., Prometheus keeps collecting some useless data, which is always zero valued), right?
True. Some of them have already been fixed (see StreamingMetrics, for example). Whoever picks up this issue, please check whether the remaining usages are correct.
> which is always zero valued
I have observed some non-zero constant values in Grafana, although I am not sure whether they have the same root cause.
Yes, they should be constant, but not necessarily zero.
You can verify this by checking localhost:1222, e.g., stream_mview_input_row_count for a dropped actor.
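To make that check concrete, here is a rough Rust sketch (not part of RisingWave) that filters Prometheus text-format output for one metric and one actor. The helper `series_for_actor`, the label name `actor_id`, and the sample input are hypothetical; verify the real label names against the actual output.

```rust
/// Hypothetical helper: return the exposition lines for exactly `metric`
/// whose label set contains `actor_id="<id>"`. The label name is an assumption.
fn series_for_actor<'a>(exposition: &'a str, metric: &str, actor_id: &str) -> Vec<&'a str> {
    // Require the label to start right after `{` or `,` so that e.g.
    // `fragment_actor_id` does not match; the closing quote rules out "10" vs "1".
    let first = format!("{{actor_id=\"{}\"", actor_id);
    let later = format!(",actor_id=\"{}\"", actor_id);
    exposition
        .lines()
        // Skip `# HELP` / `# TYPE` comment lines.
        .filter(|line| !line.starts_with('#'))
        // Match the exact metric name: it must be followed by `{` (labels) or a space.
        .filter(|line| {
            line.strip_prefix(metric)
                .map_or(false, |rest| rest.starts_with('{') || rest.starts_with(' '))
        })
        .filter(|line| line.contains(&first) || line.contains(&later))
        .collect()
}

fn main() {
    // Illustrative input only; real output from localhost:1222 will differ.
    let sample = "# TYPE stream_mview_input_row_count counter\n\
        stream_mview_input_row_count{actor_id=\"7\",table_id=\"1001\"} 42\n\
        stream_mview_input_row_count{actor_id=\"8\",table_id=\"1001\"} 5\n";
    for line in series_for_actor(sample, "stream_mview_input_row_count", "7") {
        println!("{line}");
    }
}
```

In practice the input would be the text served on port 1222 (for example, fetched with curl). If a series for an actor that was dropped earlier keeps showing up across scrapes with an unchanging value, its labels were never removed.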