rust-prometheus

Tracking peak/max value

Open kornelski opened this issue 4 years ago • 6 comments

I'd like to have a gauge that precisely tracks a peak value of another gauge (I have a gauge that goes up and down, and its temporary peak value is more interesting than current value at any time). I'm not sure what's the best way to implement it.

Currently gauges have only inc/get/set, so I have something like this:

val.inc();
// Not atomic: another thread can change `peak` between get() and set().
if val.get() > peak.get() {
    peak.set(val.get());
}
// later
val.dec();

but the get+set doesn't seem elegant, and isn't atomic. Maybe you could expose fetch_max?

If inc() returned current value I could even do:

peak.max(val.inc());
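For illustration, the idea can be sketched outside the library with standard-library atomics, where `fetch_add` returns the previous value and `fetch_max` updates the peak in one atomic step. `PeakTracker` and its methods are hypothetical names, not part of rust-prometheus, and the sketch assumes integer values; mirroring the results into real gauges would be a separate step.

```rust
use std::sync::atomic::{AtomicI64, Ordering};

/// Hypothetical helper (not a rust-prometheus type): tracks a value that
/// goes up and down, plus the highest value it has ever reached.
pub struct PeakTracker {
    current: AtomicI64,
    peak: AtomicI64,
}

impl PeakTracker {
    pub const fn new() -> Self {
        Self {
            current: AtomicI64::new(0),
            peak: AtomicI64::new(0),
        }
    }

    /// Increments the current value and folds the result into the peak.
    /// Returns the value after the increment.
    pub fn inc(&self) -> i64 {
        // fetch_add returns the previous value, so add 1 to get the new one.
        let now = self.current.fetch_add(1, Ordering::Relaxed) + 1;
        // fetch_max atomically stores max(peak, now); no get+set race.
        self.peak.fetch_max(now, Ordering::Relaxed);
        now
    }

    /// Decrements the current value and returns the value after the decrement.
    pub fn dec(&self) -> i64 {
        self.current.fetch_sub(1, Ordering::Relaxed) - 1
    }

    pub fn current(&self) -> i64 {
        self.current.load(Ordering::Relaxed)
    }

    pub fn peak(&self) -> i64 {
        self.peak.load(Ordering::Relaxed)
    }
}
```

Because every increment reports its own post-increment value to `fetch_max`, the stored peak ends up equal to the highest value `current` ever held, regardless of how threads interleave.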

kornelski, Oct 27 '20 10:10

I'd like to have a gauge that precisely tracks a peak value of another gauge (I have a gauge that goes up and down, and its temporary peak value is more interesting than current value at any time).

@kornelski could you expand on your use-case a bit more? Which Prometheus queries are you planning to run on this gauge?

If you want to make sure you don't miss spikes between Prometheus scrapes, try modelling your use-case with two counters instead of one gauge. E.g. for a queue, instead of one gauge tracking the size of the queue, have two counters: one incremented on enqueue and one incremented on dequeue.
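A minimal sketch of the two-counter model, using plain standard-library atomics rather than rust-prometheus counters. `QueueCounters`, `snapshot`, and `in_flight` are hypothetical names chosen for the example.

```rust
use std::sync::atomic::{AtomicU64, Ordering};

/// Hypothetical stand-in for two monotonically increasing counters.
pub struct QueueCounters {
    enqueued: AtomicU64,
    dequeued: AtomicU64,
}

impl QueueCounters {
    pub const fn new() -> Self {
        Self {
            enqueued: AtomicU64::new(0),
            dequeued: AtomicU64::new(0),
        }
    }

    /// Call when an item enters the queue.
    pub fn on_enqueue(&self) {
        self.enqueued.fetch_add(1, Ordering::Relaxed);
    }

    /// Call when an item leaves the queue.
    pub fn on_dequeue(&self) {
        self.dequeued.fetch_add(1, Ordering::Relaxed);
    }

    /// Returns (enqueued, dequeued) as a scrape would observe them.
    pub fn snapshot(&self) -> (u64, u64) {
        (
            self.enqueued.load(Ordering::Relaxed),
            self.dequeued.load(Ordering::Relaxed),
        )
    }
}

/// Queue length implied by one snapshot. saturating_sub guards against a
/// snapshot in which the two loads were not taken at the same instant.
pub fn in_flight(enqueued: u64, dequeued: u64) -> u64 {
    enqueued.saturating_sub(dequeued)
}
```

Neither counter ever decreases, so their rates stay meaningful across scrapes, but the difference computed from a single snapshot is still only the queue length at scrape time.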

mxinden, Oct 27 '20 10:10

In my case it's the number of concurrent server requests being processed. I increment a gauge when a request comes in, and decrement it when the request is done. The problem is that when the gauge is scraped, it's close to 0 most of the time, because requests are processed pretty quickly. But I have some very sudden traffic spikes, and I'd like to know exactly how many requests hit my server at the same time.

The solution with two counters is interesting, but I think they'd also be equal most of the time when they're scraped, so I need to add extra instrumentation that catches momentary peaks between scrapes.

kornelski, Oct 27 '20 10:10

How about simply incrementing a counter when a request comes in? Then you can find the concurrency of the requests by using irate.

breezewish, Oct 27 '20 12:10

No, that gives the rate at which requests arrive, but it can't show how many of them are being actively processed in parallel.

In terms of queueing theory, I have a steady state where arrivals equal departures. I can easily measure the rate of arrivals and departures, but I want to know the queue length, and not the typical/average/sampled length, but the maximum queue length reached.

kornelski, Oct 27 '20 13:10

Thanks for the details @kornelski.

I am not directly opposed to exposing some of the atomic operations to the user. I would like to suggest another alternative to the two Gauges approach though:

Suppose GenericGauge::inc returned the previous value. Also assume that you have a Histogram that tracks the queue length on each new arrival. In that case you could do the following on each new item arrival:

queue_length_histogram.observe(num_concurrent_requests.inc() + 1);

Depending on your bucket distribution, you can get a good approximation of the max queue length by subtracting the second-highest accumulating bucket count from the highest accumulating bucket count.

Compared to the two-gauge approach, you (a) don't have a race condition and (b) get not only the maximum queue length between scrapes, but also the approximate queue length distribution across the scrape interval, e.g. via quantiles.
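As a rough sketch of this proposal, here is the same pattern written with standard-library atomics, since `GenericGauge::inc` returning a value is only a suggestion in this thread. `ConcurrencyHistogram`, `BOUNDS`, and the bucket layout are assumptions made up for the example, and the buckets here hold per-bucket rather than cumulative counts.

```rust
use std::sync::atomic::{AtomicI64, AtomicU64, Ordering};

const BUCKETS: usize = 5;

/// Hypothetical inclusive upper bounds; the last bucket catches everything else.
pub const BOUNDS: [i64; BUCKETS] = [1, 2, 4, 8, i64::MAX];

/// Hypothetical type: an in-flight counter plus a histogram of the queue
/// length observed at each arrival.
pub struct ConcurrencyHistogram {
    in_flight: AtomicI64,
    buckets: [AtomicU64; BUCKETS],
}

impl ConcurrencyHistogram {
    pub fn new() -> Self {
        Self {
            in_flight: AtomicI64::new(0),
            buckets: std::array::from_fn(|_| AtomicU64::new(0)),
        }
    }

    /// Records an arrival and returns the queue length including it.
    pub fn on_arrival(&self) -> i64 {
        // Previous value + 1, mirroring `num_concurrent_requests.inc() + 1`.
        let depth = self.in_flight.fetch_add(1, Ordering::Relaxed) + 1;
        // First bucket whose upper bound covers the observed depth.
        let idx = BOUNDS
            .iter()
            .position(|&bound| depth <= bound)
            .unwrap_or(BUCKETS - 1);
        self.buckets[idx].fetch_add(1, Ordering::Relaxed);
        depth
    }

    /// Records that a request finished.
    pub fn on_completion(&self) {
        self.in_flight.fetch_sub(1, Ordering::Relaxed);
    }

    /// Per-bucket (non-cumulative) observation counts.
    pub fn bucket_counts(&self) -> [u64; BUCKETS] {
        std::array::from_fn(|i| self.buckets[i].load(Ordering::Relaxed))
    }
}
```

Each arrival observes a depth derived from its own atomic increment, so no separate get+set step is needed, at the cost of one extra atomic update per request.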

Let me know what you think.

mxinden, Nov 03 '20 14:11

I don't quite follow why you'd subtract bucket counts. I think the approximate maximum could be found by looking for the bucket that represents the highest value and has a non-zero hit count.
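One way the two views fit together: Prometheus exposes histogram buckets as cumulative counts, so the difference between adjacent buckets is the hit count of a single bucket, and the highest bucket with a non-zero difference bounds the maximum. A minimal sketch, where `approx_max` and the sample data are made up for illustration:

```rust
/// Returns the upper bound of the highest bucket that received at least one
/// observation, or None if nothing was observed.
///
/// `buckets` holds (upper_bound, cumulative_count) pairs sorted by ascending
/// upper bound, the way cumulative `le` buckets are laid out.
pub fn approx_max(buckets: &[(f64, u64)]) -> Option<f64> {
    let mut previous_cumulative = 0u64;
    let mut highest_hit = None;
    for &(upper_bound, cumulative) in buckets {
        // Subtracting adjacent cumulative counts yields this bucket's own
        // hit count; a positive difference means the bucket was hit.
        if cumulative > previous_cumulative {
            highest_hit = Some(upper_bound);
        }
        previous_cumulative = cumulative;
    }
    highest_hit
}
```

For example, cumulative counts of 5, 7, 8, 8, 8 for bounds 1, 2, 4, 8, +Inf give an approximate maximum of 4. Histogram counts accumulate over the life of the process, so a per-scrape-interval answer would need the same calculation applied to the increase between two scrapes.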

So a Histogram can provide this information, but it has a higher tracking cost (it has to maintain all the buckets and the counts within them) and only gives a quantized value.

kornelski, Nov 03 '20 21:11