
Limit per-series data points per minute

Open mac133k opened this issue 1 year ago • 2 comments

Is your feature request related to a problem? Please describe.

Once monitoring agents are authorised to send metrics to Mimir via the push API, a Mimir admin has no way to control the sampling interval of the data they send. Suppose a Mimir cluster was sized on the assumption that clients scrape their targets once a minute, but some (or all) of the clients run with a scrape interval of 15s, intentionally or not. Each of their series then delivers four times as many samples as planned, so the cluster comes under heavier load than expected and clients may be subject to rate limiting.

Describe the solution you'd like

To prevent the problem described above, there could be a setting in either the distributor or the limits config block to set the minimum allowed sampling interval.
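A minimal sketch of what such a check could do, assuming a hypothetical minimum-interval setting (Mimir has no such option today; `IntervalLimiter`, `Sample` and `NewIntervalLimiter` are illustrative names): remember the timestamp of the last accepted sample for each series and drop any sample that arrives sooner than the minimum interval after it.

```go
package main

import (
	"fmt"
	"time"
)

// Sample is a simplified stand-in for a pushed sample: a series identity
// (e.g. its label set rendered as a string) plus a millisecond timestamp.
type Sample struct {
	Series      string
	TimestampMs int64
	Value       float64
}

// IntervalLimiter drops samples that arrive sooner than minInterval after the
// last accepted sample of the same series. Hypothetical, not a Mimir API.
type IntervalLimiter struct {
	minInterval time.Duration
	lastMs      map[string]int64 // last accepted timestamp per series
}

func NewIntervalLimiter(minInterval time.Duration) *IntervalLimiter {
	return &IntervalLimiter{minInterval: minInterval, lastMs: make(map[string]int64)}
}

// Allow reports whether s should be ingested and records it if so.
// Out-of-order samples (older than the last accepted one) are also dropped.
func (l *IntervalLimiter) Allow(s Sample) bool {
	last, seen := l.lastMs[s.Series]
	if seen && s.TimestampMs-last < l.minInterval.Milliseconds() {
		return false // too soon after the previous accepted sample
	}
	l.lastMs[s.Series] = s.TimestampMs
	return true
}

func main() {
	lim := NewIntervalLimiter(time.Minute)
	// A client scraping every 15s for two minutes: timestamps 0s, 15s, ..., 120s.
	accepted := 0
	for ts := int64(0); ts <= 120_000; ts += 15_000 {
		if lim.Allow(Sample{Series: `up{job="node"}`, TimestampMs: ts, Value: 1}) {
			accepted++
		}
	}
	fmt.Println("accepted:", accepted) // prints: accepted: 3 (t=0s, 60s, 120s)
}
```

The sketch keeps per-series state in memory. Mimir distributors are stateless and a tenant's writes can reach any of them, so a real implementation would either live where per-series state already exists (the ingesters) or accept a per-distributor approximation.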

Nice to have: additional settings that specify how to handle sampling rates higher than allowed (i.e. intervals shorter than the minimum), e.g. keep only the first or last value received in a given time interval, calculate an average, etc.
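The three strategies could look roughly like this for a single series, assuming time-ordered samples with non-negative millisecond timestamps and fixed windows aligned to multiples of the window width (`Downsample`, `Strategy` and the strategy names are hypothetical):

```go
package main

import "fmt"

// Point is one sample of a single series (timestamp in ms).
type Point struct {
	TimestampMs int64
	Value       float64
}

// Strategy selects how samples falling in the same window are collapsed.
type Strategy int

const (
	KeepFirst Strategy = iota
	KeepLast
	Average
)

// Downsample collapses time-ordered points into at most one point per
// windowMs-wide window (windowMs must be positive). Averaged points take the
// timestamp of the first sample in their window.
func Downsample(points []Point, windowMs int64, strategy Strategy) []Point {
	var out []Point
	for i := 0; i < len(points); {
		window := points[i].TimestampMs / windowMs
		// Advance j to the end of the run of points in the same window.
		j, sum := i, 0.0
		for j < len(points) && points[j].TimestampMs/windowMs == window {
			sum += points[j].Value
			j++
		}
		bucket := points[i:j]
		switch strategy {
		case KeepFirst:
			out = append(out, bucket[0])
		case KeepLast:
			out = append(out, bucket[len(bucket)-1])
		case Average:
			out = append(out, Point{TimestampMs: bucket[0].TimestampMs, Value: sum / float64(len(bucket))})
		}
		i = j
	}
	return out
}

func main() {
	// A gauge scraped every 15s, downsampled to 1-minute windows.
	pts := []Point{{0, 1}, {15_000, 2}, {30_000, 3}, {45_000, 4}, {60_000, 10}, {75_000, 20}}
	fmt.Println(Downsample(pts, 60_000, KeepFirst)) // [{0 1} {60000 10}]
	fmt.Println(Downsample(pts, 60_000, KeepLast))  // [{45000 4} {75000 20}]
	fmt.Println(Downsample(pts, 60_000, Average))   // [{0 2.5} {60000 15}]
}
```

Averaging suits gauges; for counters, keeping the last value is usually the safer choice, since averaging raw counter values can blur a counter reset that happens inside a window.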

Nice to have too: per-tenant overrides for the smallest sampling interval allowed.
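Per-tenant overrides would follow the usual pattern of a cluster-wide default with optional tenant-specific values. A small sketch, assuming a hypothetical `MinSampleInterval` limit (not an existing Mimir field) where zero means "no minimum":

```go
package main

import (
	"fmt"
	"time"
)

// Limits holds the hypothetical per-tenant setting. Zero disables the check.
type Limits struct {
	MinSampleInterval time.Duration
}

// Overrides resolves a tenant's limits, falling back to the cluster default
// when the tenant has no override of its own.
type Overrides struct {
	Defaults  Limits
	PerTenant map[string]Limits
}

func (o Overrides) MinSampleInterval(tenant string) time.Duration {
	if l, ok := o.PerTenant[tenant]; ok {
		return l.MinSampleInterval
	}
	return o.Defaults.MinSampleInterval
}

func main() {
	o := Overrides{
		Defaults: Limits{MinSampleInterval: time.Minute},
		PerTenant: map[string]Limits{
			"team-a": {MinSampleInterval: 15 * time.Second}, // allowed to scrape every 15s
		},
	}
	fmt.Println(o.MinSampleInterval("team-a"), o.MinSampleInterval("team-b")) // 15s 1m0s
}
```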

Describe alternatives you've considered

n/a

Additional context

Ask me.

mac133k, Feb 27 '24 14:02

> To prevent the problem described above, there could be a setting in either the distributor or the limits config block to set the minimum allowed sampling interval.

Have you considered setting per-tenant ingestion rate limits via -distributor.ingestion-rate-limit or limits.ingestion_rate? It limits the rate of samples per second, and that rate is a function of both the sampling interval (a.k.a. scrape interval, or DPM, data points per minute) and the number of series.
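As a worked example of that relationship: a tenant with N active series, each sampled once per interval T, produces N / T samples per second. The Go sketch below uses an illustrative limit of 5000 samples/s (not a recommended value) and a made-up series count; the function name is hypothetical.

```go
package main

import (
	"fmt"
	"time"
)

// SamplesPerSecond is the steady-state sample rate a tenant produces when
// `series` active series are each sampled once per `interval`.
func SamplesPerSecond(series int, interval time.Duration) float64 {
	return float64(series) / interval.Seconds()
}

func main() {
	const ingestionRate = 5000.0 // illustrative per-tenant limit, samples/s
	for _, interval := range []time.Duration{time.Minute, 15 * time.Second} {
		rate := SamplesPerSecond(120_000, interval)
		fmt.Printf("interval=%v rate=%.0f samples/s within limit=%v\n", interval, rate, rate <= ingestionRate)
	}
	// interval=1m0s rate=2000 samples/s within limit=true
	// interval=15s rate=8000 samples/s within limit=false
}
```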

> Nice to have: additional settings that specify how to handle sampling rates higher than allowed (i.e. intervals shorter than the minimum), e.g. keep only the first or last value received in a given time interval, calculate an average, etc.

This looks a lot like https://github.com/grafana/mimir/pull/5028 and https://github.com/grafana/mimir/discussions/1834.

dimitarvdimitrov, Feb 28 '24 11:02

The ingestion rate limit does not solve the problem: if distributors receive 400 samples per minute from a client, that may be 400 metric streams sampled at a 1-minute interval or 100 metric streams sampled at a 15-second interval, and the limit cannot tell the two apart.
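To make the ambiguity concrete, this Go sketch (all names hypothetical) generates one minute of pushes for both cases in the example: the totals are identical, but counting samples per series within the minute (DPM) tells them apart, which is the signal a per-series limit would need.

```go
package main

import "fmt"

// Sample is a simplified pushed sample: series identity and timestamp in ms.
type Sample struct {
	Series      string
	TimestampMs int64
}

// genMinute simulates one minute of pushes from `series` series,
// each sampled every intervalMs.
func genMinute(series int, intervalMs int64) []Sample {
	var out []Sample
	for s := 0; s < series; s++ {
		for ts := int64(0); ts < 60_000; ts += intervalMs {
			out = append(out, Sample{Series: fmt.Sprintf("series_%d", s), TimestampMs: ts})
		}
	}
	return out
}

// MaxDPM returns the highest sample count of any single series in the batch;
// for a one-minute batch this is the worst per-series data points per minute.
func MaxDPM(batch []Sample) int {
	counts := make(map[string]int)
	peak := 0
	for _, s := range batch {
		counts[s.Series]++
		if counts[s.Series] > peak {
			peak = counts[s.Series]
		}
	}
	return peak
}

func main() {
	a := genMinute(400, 60_000) // 400 series at a 1-minute interval
	b := genMinute(100, 15_000) // 100 series at a 15-second interval
	fmt.Println(len(a), len(b))       // same total samples: 400 400
	fmt.Println(MaxDPM(a), MaxDPM(b)) // per-series DPM: 1 4
}
```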

https://github.com/grafana/mimir/pull/5028 and https://github.com/grafana/mimir/discussions/1834 would solve the problem if the downsampling were performed in the distributors, before ingestion.

mac133k, Feb 28 '24 12:02