
ZIO 2.0 Metrics: An option to dispose of a whole metric or specific time series

Open sturmin opened this issue 2 years ago • 3 comments

It would be nice to have the option to dispose of metrics or time series. I use "time series" here in the same sense as the Prometheus docs.

For example, a service starts many short-lived "processes", each "process" starts a bunch of time series, and very quickly we have hundreds or thousands of garbage time series belonging to processes that have already been deleted.

It would be nice to make the service stop exposing garbage time series.

sturmin avatar Dec 21 '22 09:12 sturmin

@jdegoes this seems like quite a useful feature, doesn't it?

987Nabil avatar May 22 '24 20:05 987Nabil

Agreed, seems useful. Needs careful thought and consideration.

jdegoes avatar May 22 '24 22:05 jdegoes

Hey @jdegoes @987Nabil, we ran into something similar at work, where we keep track of certain characteristics of a vast number of short-lived entities (like customers and devices). We would like their metrics to be cleaned up after a certain duration, since they aren’t used again. As it stands, the metrics registry grows substantially and cannot be pruned, even though a large share of those entities remain entirely dormant.

A couple of people have suggested using events or some other mechanism like logs, since most observability platforms do not support high-cardinality metrics (other than Honeycomb, which seems to do it properly). However, metrics offer certain conveniences: if you chose events, you would have to almost re-implement things like gauges and histograms and then find ways to convert them into an appropriate format for the target observability platform.

That’s one of the reasons why I submitted #8900: to at least give something like metrics connectors a way to detect dormant metrics and prune them. I’m sure there’s a better way to handle disposable metrics, perhaps some kind of cache-like eviction mechanism.
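To make the cache-like eviction idea concrete, here is a rough sketch in plain Scala. It is not ZIO's metrics API and not what #8900 implements; `SeriesKey`, `ExpiringRegistry`, `set`, `evictDormant`, and `snapshot` are hypothetical names, and the sketch assumes a single-threaded caller that passes in the current time explicitly.

```scala
import scala.collection.mutable

// Hypothetical key: a metric name plus its tag set identifies one time series.
final case class SeriesKey(name: String, tags: Map[String, String])

// Hypothetical registry that remembers when each series was last updated.
// Not thread-safe; a real registry would need concurrent data structures.
final class ExpiringRegistry(ttlMillis: Long) {
  private val values     = mutable.Map.empty[SeriesKey, Double]
  private val lastUpdate = mutable.Map.empty[SeriesKey, Long]

  // Record a gauge-like value and refresh the series' last-update time.
  def set(key: SeriesKey, value: Double, nowMillis: Long): Unit = {
    values(key) = value
    lastUpdate(key) = nowMillis
  }

  // Drop every series not updated within the TTL and return the evicted keys.
  def evictDormant(nowMillis: Long): Set[SeriesKey] = {
    // Materialize the dormant keys first so we never mutate while iterating.
    val dormant = lastUpdate.iterator.collect {
      case (key, ts) if nowMillis - ts > ttlMillis => key
    }.toSet
    dormant.foreach { key =>
      values.remove(key)
      lastUpdate.remove(key)
    }
    dormant
  }

  // What an exporter would see after pruning.
  def snapshot: Map[SeriesKey, Double] = values.toMap
}
```

A connector could call `evictDormant` before each export so that series belonging to long-gone entities stop being exposed. Whether eviction should be time-based, size-based, or explicit is exactly the design question this issue leaves open.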

Also my 2c: another issue is that the current way of doing metrics gives no indication that you can run into trouble with the metrics registry. You can add as many metric tags as you like, which leads users like myself to perhaps abuse this and assume high-cardinality metrics come without consequence.

calvinlfer avatar May 25 '24 17:05 calvinlfer