telemetry_metrics_prometheus_core

No option to delete the value for a specific set of tags in the ETS table

Status: Open. FelonEkonom opened this issue 2 years ago · 6 comments

There is no way to delete an existing entry from the ETS table. For example, if I have a sum metric with some tags, I cannot remove the value for a specific set of tags. As a result, the reports generated during scrapes can only grow, and values that are no longer needed cannot be removed from them.

FelonEkonom · Apr 06 '22 12:04
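To make the missing operation concrete, here is a minimal Python sketch of a reporter-side store keyed by (metric name, tag set), with the ETS table modeled as a plain dict. `SumStore`, `delete_tags`, `scrape`, and the metric name are hypothetical. They are not part of telemetry_metrics_prometheus_core's API, and the sketch only illustrates what deleting the value for one specific set of tags would mean.

```python
from typing import Dict, FrozenSet, Tuple

# Hypothetical key: (metric name, frozen set of (tag, value) pairs).
Key = Tuple[str, FrozenSet[Tuple[str, str]]]


class SumStore:
    """Toy stand-in for the reporter's ETS table (not the library's API)."""

    def __init__(self) -> None:
        self._table: Dict[Key, float] = {}

    @staticmethod
    def _key(metric: str, tags: Dict[str, str]) -> Key:
        return (metric, frozenset(tags.items()))

    def add(self, metric: str, tags: Dict[str, str], value: float) -> None:
        # Sum metric: accumulate into the entry for this exact tag set.
        key = self._key(metric, tags)
        self._table[key] = self._table.get(key, 0.0) + value

    def delete_tags(self, metric: str, tags: Dict[str, str]) -> bool:
        # The requested operation: drop the entry for one specific tag set.
        # Returns True if an entry was removed.
        return self._table.pop(self._key(metric, tags), None) is not None

    def scrape(self) -> Dict[Key, float]:
        # Every remaining entry is reported on each scrape.
        return dict(self._table)


store = SumStore()
store.add("jobs.bytes.sum", {"job_id": "a"}, 10)
store.add("jobs.bytes.sum", {"job_id": "b"}, 5)
store.delete_tags("jobs.bytes.sum", {"job_id": "a"})  # job "a" has ended
print(len(store.scrape()))  # 1
```

Without something like `delete_tags`, the entry for job "a" would stay in every scrape for the lifetime of the process.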

That is expected behavior for Prometheus. If you're running into size issues, that would indicate that your tags have too much cardinality.

bryannaegele · Apr 13 '22 15:04

Let's assume I have a system with many jobs running inside it. Every job has its own lifetime, and I want a tool that helps me aggregate metrics about these jobs. In this case, the job ID would be a tag that I group metrics by. In systems like this, I think you don't want metrics about obsolete, finished jobs in the reports generated during scrapes. That is why I think an option to delete the metrics related to a job when it ends would be a great idea. Also, in this case the tag cardinality does not come from bad system design; it naturally increases over the lifetime of the whole system as new jobs start and end.

FelonEkonom · Apr 19 '22 17:04

Prometheus is simply not the right tool for the requirements you're describing. Prometheus creates a time series for every combination of metric, attributes, and attribute values, and those are stored in the Prometheus server for whatever duration the storage retention is set to.

I think the use case you're describing would be better served by tracing, where attribute cardinality is not a concern and you can get insights into multiple operations that share a common attribute and value, in your case a job ID.

https://github.com/open-telemetry/opentelemetry-erlang combined with Lightstep, Honeycomb, Zipkin, Grafana, etc., would better fit your requirements. If you want more help or opinions, you can get plenty in the #opentelemetry channel in the Elixir Slack.

bryannaegele · Apr 19 '22 19:04
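To put rough numbers on the point that Prometheus keeps one time series per combination of metric and attribute values: for a single metric, the series count is the number of distinct label sets actually observed, bounded above by the product of each label's distinct values. The Python sketch below is a back-of-the-envelope illustration; the function names and the `status`/`job_id` labels are hypothetical.

```python
from math import prod
from typing import Dict, List, Set


def series_upper_bound(label_values: Dict[str, Set[str]]) -> int:
    # Worst case: every combination of label values appears at least once.
    return prod(len(values) for values in label_values.values())


def observed_series(samples: List[Dict[str, str]]) -> int:
    # Actual series for one metric: the distinct label sets seen.
    return len({tuple(sorted(s.items())) for s in samples})


# A job_id label adds potential series for every job that ever ran.
labels = {
    "status": {"ok", "error"},
    "job_id": {f"job-{i}" for i in range(1000)},
}
print(series_upper_bound(labels))  # 2000
print(observed_series([{"job_id": "a"}, {"job_id": "b"}, {"job_id": "a"}]))  # 2
```

Because `job_id` is unbounded over time, this count keeps growing unless old label sets are dropped somewhere.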

It's true that Prometheus stores everything, but the server configuration still has a retention time: by default, old time series are removed after 15 days. This reporter implementation, however, has no such retention time and keeps reporting old time series on every scrape. This means that old time series, which Prometheus could already have removed, keep getting updated unnecessarily. Some sort of cleanup on the reporter side would be helpful, whether it's a delete function or a retention time.

hairyhum · Dec 15 '22 21:12
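The retention-time idea can be sketched on the reporter side like this, under loose assumptions: each entry records when it was last updated, and entries older than a configurable TTL are pruned before a scrape is rendered. `ExpiringStore`, `ttl_seconds`, and the prune-on-scrape policy are hypothetical and not an existing option of the library; the optional `now` argument only keeps the example deterministic.

```python
import time
from typing import Dict, FrozenSet, Optional, Tuple

# Hypothetical key: (metric name, frozen set of (tag, value) pairs).
Key = Tuple[str, FrozenSet[Tuple[str, str]]]


class ExpiringStore:
    """Hypothetical sum store with a per-entry TTL (not the library's API)."""

    def __init__(self, ttl_seconds: float) -> None:
        self.ttl_seconds = ttl_seconds
        # key -> (accumulated value, time of last update)
        self._table: Dict[Key, Tuple[float, float]] = {}

    def add(self, metric: str, tags: Dict[str, str], value: float,
            now: Optional[float] = None) -> None:
        now = time.monotonic() if now is None else now
        key = (metric, frozenset(tags.items()))
        total, _ = self._table.get(key, (0.0, now))
        # Accumulate and refresh the last-updated timestamp.
        self._table[key] = (total + value, now)

    def scrape(self, now: Optional[float] = None) -> Dict[Key, float]:
        now = time.monotonic() if now is None else now
        # Prune entries not updated within the TTL, then report the rest.
        expired = [k for k, (_, ts) in self._table.items()
                   if now - ts > self.ttl_seconds]
        for k in expired:
            del self._table[k]
        return {k: v for k, (v, _) in self._table.items()}


store = ExpiringStore(ttl_seconds=60)
store.add("jobs.bytes.sum", {"job_id": "a"}, 1, now=0)
store.add("jobs.bytes.sum", {"job_id": "b"}, 2, now=50)
print(store.scrape(now=100))  # only job "b" remains
```

Pruning during the scrape avoids a separate sweeper process; a periodic sweep would also work. Either way, an expired sum that receives a late update starts again from zero.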

Hi @bryannaegele, what do you think about a suggestion from @hairyhum?

Rados13 · Feb 05 '24 09:02

I'm fine with that if someone wants to submit a PR for an expiration setting, but I am not personally adding features to this library at this time.

bryannaegele · Feb 05 '24 17:02