tracking: refactor metrics with `LabelGuarded`
Background
`LabelGuardedMetricVec` was introduced in #13080. It enhances `MetricVec` to ensure that a set of labels is correctly removed from the Prometheus client once it is dropped. This is useful for metrics that are associated with an object that can be dropped, such as streaming jobs, fragments, actors, batch tasks, etc.
When a set of labels is dropped, it is recorded in the `uncollected_removed_labels` set. Once the metrics have been collected, the metrics for those labels are finally removed.
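The drop-then-collect-then-remove flow described above can be sketched with a toy counter registry. This is only an illustration using the standard library, not the actual RisingWave implementation: `ToyGuardedCounterVec`, `ToyGuardedCounter`, and their methods are hypothetical names, and only `uncollected_removed_labels` is taken from the description. It also assumes at most one live guard per label set, so no reference counting is shown.

```rust
use std::collections::{BTreeMap, BTreeSet};
use std::sync::{Arc, Mutex};

type Labels = Vec<String>;

#[derive(Default)]
struct Inner {
    // Current counter value per label set.
    values: BTreeMap<Labels, u64>,
    // Label sets whose guard was dropped but which have not been collected yet.
    uncollected_removed_labels: BTreeSet<Labels>,
}

/// Hypothetical stand-in for a label-guarded metric vec.
#[derive(Clone, Default)]
pub struct ToyGuardedCounterVec {
    inner: Arc<Mutex<Inner>>,
}

/// Guard for one label set; dropping it schedules the labels for removal.
pub struct ToyGuardedCounter {
    labels: Labels,
    inner: Arc<Mutex<Inner>>,
}

impl ToyGuardedCounterVec {
    pub fn with_guarded_label_values(&self, labels: &[&str]) -> ToyGuardedCounter {
        let labels: Labels = labels.iter().map(|s| s.to_string()).collect();
        {
            let mut inner = self.inner.lock().unwrap();
            inner.values.entry(labels.clone()).or_insert(0);
            // If the labels are re-created before a collection, cancel the removal.
            inner.uncollected_removed_labels.remove(&labels);
        }
        ToyGuardedCounter { labels, inner: Arc::clone(&self.inner) }
    }

    /// Returns a snapshot of all series, then removes the series whose
    /// guards were dropped before this collection.
    pub fn collect(&self) -> Vec<(Labels, u64)> {
        let mut inner = self.inner.lock().unwrap();
        let snapshot: Vec<(Labels, u64)> =
            inner.values.iter().map(|(k, v)| (k.clone(), *v)).collect();
        let removed = std::mem::take(&mut inner.uncollected_removed_labels);
        for labels in removed {
            inner.values.remove(&labels);
        }
        snapshot
    }
}

impl ToyGuardedCounter {
    pub fn inc(&self) {
        let mut inner = self.inner.lock().unwrap();
        if let Some(v) = inner.values.get_mut(&self.labels) {
            *v += 1;
        }
    }
}

impl Drop for ToyGuardedCounter {
    fn drop(&mut self) {
        // Do not remove the series immediately: record it so that the final
        // value is still reported by the next collection.
        if let Ok(mut inner) = self.inner.lock() {
            inner.uncollected_removed_labels.insert(self.labels.clone());
        }
    }
}
```

In this sketch, a counter created for a label set such as `["actor-1"]`, incremented once, and then dropped still appears in the next `collect()` with its final value, and is absent from every collection after that.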
To-dos
Technically, all usages of plain `MetricVec` for a droppable object (streaming jobs, fragments, actors, batch tasks, etc.) need to be replaced with `LabelGuardedMetricVec`.
- [ ] #16728
- [ ] #16729
- [x] Refactor `SinkMetrics` with `LabelGuarded` ones
- [x] #15174
- [ ] #15475
Could be related: #13086
Related: https://github.com/risingwavelabs/risingwave/issues/14821
So currently, when a streaming job is dropped, its metrics will be leaked (i.e., Prometheus collects some useless data, which is always zero valued), right?
> So currently, when a streaming job is dropped, its metrics will be leaked (i.e., Prometheus collects some useless data, which is always zero valued), right?
True. Some of them have been fixed (for example, see `StreamingMetrics`). Anyone taking this issue, please help check whether the remaining usages are correct.
> which is always zero valued
I have observed some non-zero constant values in Grafana, although I am not sure whether they have the same root cause.
Yes, should be constant. Not necessarily zero.
You can verify this by checking `localhost:1222`, e.g., `stream_mview_input_row_count` for a dropped actor.