mimir
Support analyzing dpm in the cardinality API
Is your feature request related to a problem? Please describe.
There are two dimensions to ingestion in Mimir: space (i.e., unique series) and time (samples per minute). The current cardinality API only helps with the space dimension; it has no support for the time dimension.
Describe the solution you'd like
We should find an efficient way to count the dpm of active time series. I think it may be possible to extend the active series tracker with a dpm measurement based on two rotating buckets.
Essentially, we track two numbers for each series (`openDpmBucket` and `closedDpmBucket`). Each time a series is updated in the tracker, we increment `openDpmBucket`. Each time the tracker is purged (`ingester.active-series-metrics-update-period`, default = 1m), we swap the values of `openDpmBucket` and `closedDpmBucket`, then reset `openDpmBucket` to `0`.
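A minimal sketch of the two-bucket bookkeeping described above, in Go since that is what Mimir is written in. The `seriesDPM` type and its methods are hypothetical (they are not part of Mimir's active series tracker), and using atomics for concurrent updates is an assumption; `atomic.Uint64` requires Go 1.19 or newer.

```go
package main

import "sync/atomic"

// seriesDPM holds the two rotating buckets for a single series.
// Hypothetical type: these names do not exist in Mimir.
type seriesDPM struct {
	open   atomic.Uint64 // openDpmBucket: samples seen since the last purge
	closed atomic.Uint64 // closedDpmBucket: samples seen in the previous full window
}

// Update is called each time a sample for this series hits the tracker.
func (s *seriesDPM) Update() {
	s.open.Add(1)
}

// Purge is called once per update period. "Swap, then reset open to 0"
// is equivalent to moving the open count into closed and starting a
// fresh window, which a single atomic Swap on the open bucket does.
func (s *seriesDPM) Purge() {
	s.closed.Store(s.open.Swap(0))
}

// LastWindow returns the sample count of the most recent complete window.
func (s *seriesDPM) LastWindow() uint64 {
	return s.closed.Load()
}
```

Samples that arrive while `Purge` runs are counted toward the new window rather than lost, which is acceptable for an estimate.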
Then we could estimate the dpm of any series as `closedDpmBucket / UpdatePeriod`, with `UpdatePeriod` expressed in minutes. This works as long as the series receives at least one sample per `UpdatePeriod`; below that rate, some windows contain no samples and we may misreport the dpm as `0`.
Alternatively, we could do the same thing but use the `IdleTimeout` as the bucket window. That would lower the smallest non-zero rate we can detect to `0.1` dpm by default, or `0.05` dpm in Grafana Cloud.
Describe alternatives you've considered
We've seen that Grafana Cloud customers resort to expensive `count_over_time` queries to find the source of high dpm. One popular workaround is to run the query `sum by (job) (scrape_samples_scraped)`. This works well when the data comes from a Prometheus instance, but in practice there are many ways time series data can reach Mimir, so there's still a gap for some users.
This is going to sound a bit crazy, but I'd prefer an alternative that solves the same problem:
- Limit the DPM per series
- Remove DPM as a factor in our billing: since it is now capped, we'd simply set the limit to whatever the customer paid for
- Adaptive Metrics then no longer needs to worry about DPM at all, since it has no impact on the bill
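A per-series DPM limit could reuse the same kind of per-series counter: reject a sample once the current window has already accepted the limit. This is a hypothetical sketch (`seriesLimiter` and `maxPerWindow` are not existing Mimir names), assuming the counter is reset once per minute by the same purge loop.

```go
package main

import "sync/atomic"

// seriesLimiter is a hypothetical per-series DPM cap over a fixed 1m window.
type seriesLimiter struct {
	open         atomic.Uint64 // samples accepted in the current window
	maxPerWindow uint64        // hypothetical limit, e.g. the DPM the customer paid for
}

// Accept reports whether a new sample may be ingested and counts it if so.
// The CAS loop keeps the count from exceeding the limit under concurrency.
func (l *seriesLimiter) Accept() bool {
	for {
		cur := l.open.Load()
		if cur >= l.maxPerWindow {
			return false
		}
		if l.open.CompareAndSwap(cur, cur+1) {
			return true
		}
	}
}

// Reset starts a new window; the purge loop would call it once per minute.
func (l *seriesLimiter) Reset() {
	l.open.Store(0)
}
```

A fixed window like this lets a series burst up to twice the limit across a window boundary; a token bucket would smooth that out at the cost of a little more state per series.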