Limit per-series data points per minute
Is your feature request related to a problem? Please describe.
Once monitoring agents are authorised to send metrics to Mimir via the push API, a Mimir admin has no way to control the sampling interval of the received data. Suppose a Mimir cluster was sized on the assumption that clients scrape their targets once a minute, but some (or all) of the clients run with scrape intervals set (intentionally or not) to 15s. The cluster then comes under heavier load than expected, clients may be subject to rate limiting, etc.
Describe the solution you'd like
To prevent the problem described above, there could be a setting, either in the distributor or the limits config block, to set the minimum allowed sampling interval.
Nice to have: more settings to specify how to handle samples that arrive more often than the allowed interval permits, e.g. keep only the first or last value received in a given time interval, calculate the average, etc.
Nice to have too: per-tenant overrides for the smallest sampling interval allowed.
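To make the request more concrete, here is a minimal sketch of the behaviour I have in mind. It is not Mimir code: `MinIntervalFilter`, `allow`, and `tenant_overrides` are hypothetical names, and it only implements the "keep the first value" policy. Keeping the last value or an average would need buffering per series, which is left out here.

```python
from dataclasses import dataclass, field


@dataclass
class MinIntervalFilter:
    """Hypothetical per-series limiter: keeps at most one sample per minimum interval."""

    default_min_interval_s: float = 60.0
    # Hypothetical per-tenant overrides: tenant ID -> minimum interval in seconds.
    tenant_overrides: dict = field(default_factory=dict)
    # (tenant, series) -> timestamp of the last sample that was kept.
    _last_kept: dict = field(default_factory=dict)

    def allow(self, tenant: str, series: str, ts: float) -> bool:
        """Return True if the sample should be kept ("keep first" policy)."""
        min_interval = self.tenant_overrides.get(tenant, self.default_min_interval_s)
        key = (tenant, series)
        last = self._last_kept.get(key)
        if last is not None and ts - last < min_interval:
            return False  # arrived too soon after the last kept sample
        self._last_kept[key] = ts
        return True


# A client scraping every 15s against a 60s minimum interval.
limiter = MinIntervalFilter(default_min_interval_s=60.0, tenant_overrides={"fast": 15.0})
timestamps = [0, 15, 30, 45, 60, 75]
kept_default = [t for t in timestamps if limiter.allow("slow", "up", t)]
kept_override = [t for t in timestamps if limiter.allow("fast", "up", t)]
```

With these inputs the default tenant keeps only the samples at 0 and 60, while the tenant with the 15s override keeps all six. A real implementation would probably need some tolerance for scrape jitter, since a sample arriving at 59.9s would be dropped by this strict comparison.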
Describe alternatives you've considered
n/a
Additional context
Ask me.
To prevent the problem described above, there could be a setting, either in the distributor or the limits config block, to set the minimum allowed sampling interval.
Have you considered setting the ingestion rate limit per tenant via -distributor.ingestion-rate-limit or limits.ingestion_rate? It controls the rate of samples. The rate of samples is a function of the sampling interval (aka scrape interval, which determines DPM, the data points per minute for each series) and the number of series.
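As a rough illustration of that relationship, assuming every series is scraped at the same fixed interval (the function names below are made up for this example):

```python
def dpm(scrape_interval_s: float) -> float:
    """Data points per minute for one series at a fixed scrape interval."""
    return 60.0 / scrape_interval_s


def samples_per_second(series_count: int, scrape_interval_s: float) -> float:
    """Aggregate sample rate produced by `series_count` series."""
    return series_count / scrape_interval_s


# The same 6000 series, scraped once a minute versus every 15 seconds.
rate_60s = samples_per_second(6000, 60.0)  # 100.0 samples/s
rate_15s = samples_per_second(6000, 15.0)  # 400.0 samples/s
```

Moving from a 60s to a 15s interval takes each series from 1 DPM to 4 DPM, so the aggregate rate goes up fourfold for the same number of series.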
Nice to have: more settings to specify how to handle samples that arrive more often than the allowed interval permits, e.g. keep only the first or last value received in a given time interval, calculate the average, etc.
This looks a lot like https://github.com/grafana/mimir/pull/5028 and https://github.com/grafana/mimir/discussions/1834
The ingestion rate limit does not solve the problem, because it cannot tell the two cases apart. For example, if distributors receive 400 samples per minute from a client, that may be 400 series sampled at a 1-minute interval or 100 series sampled at a 15-second interval.
https://github.com/grafana/mimir/pull/5028 and https://github.com/grafana/mimir/discussions/1834 would solve the problem if downsampling were performed on distributors before ingestion.
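A small sketch of why the aggregate number is not enough. The sample data is synthetic and `per_series_dpm` is a made-up helper, not something Mimir exposes:

```python
from collections import Counter


def per_series_dpm(samples, window_s=60.0):
    """Count samples per series in a window and scale the count to one minute."""
    counts = Counter(series for series, _ts in samples)
    return {series: n * 60.0 / window_s for series, n in counts.items()}


# One minute of traffic: 400 series scraped once each ...
workload_a = [(f"series_{i}", 0) for i in range(400)]
# ... versus 100 series scraped every 15 seconds.
workload_b = [(f"series_{i}", ts) for i in range(100) for ts in (0, 15, 30, 45)]

total_a, total_b = len(workload_a), len(workload_b)
max_dpm_a = max(per_series_dpm(workload_a).values())
max_dpm_b = max(per_series_dpm(workload_b).values())
```

Both workloads add up to 400 samples per minute, so a rate limit treats them the same, yet the per-series figure is 1 DPM in the first case and 4 DPM in the second.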