rust-prometheus icon indicating copy to clipboard operation
rust-prometheus copied to clipboard

Additional Gauge methods that return a type that reverts on drop

Open nikclayton-dfinity opened this issue 4 years ago • 5 comments

Is your feature request related to a problem? Please describe.

I have an HTTP server built using Hyper.

One of the metrics I track is the number of active connections from clients. This is a gauge.

The handler function looks like (lots omitted):

async fn handle(metrics: Arc<metrics::Metrics>, request: Request<Body>) -> Result<Response<Body>, Infallible> {
    // metrics is a struct where the fields are different metrics registered with
    // a prometheus::Registry. In an Arc so a clone of it can be given to each call
    // to the handle function.

    metrics.http_server_requests_active.inc();   // Increase count of requests in flight

    // Code that processes the request and generates the response
    let response = ...

    metrics.http_server_requests_active.dec();

    Ok(response) 
}

After launching this, I discovered that sometimes the call to metrics.http_server_requests_active.dec() was not happening, resulting in bogus metrics.

It turns out that if the client disconnects from the server before the response is sent then the serving tokio task is torn down before the call to ...dec() can happen.

A fix is to do this:

struct DropSafeGauge<'a> {
    gauge: &'a IntGauge,
}

impl<'a> DropSafeGauge<'a> {
    fn new(gauge: &'a IntGauge) -> Self {
        gauge.inc();

        Self { gauge }
    }
}

impl<'a> Drop for DropSafeGauge<'a> {
    fn drop(&mut self) {
        self.gauge.dec()
    }
}

/// Then, in handle()...
... {
    let _http_server_requests_active = DropSafeGauge(&metrics.http_server_requests_active;

    // Generate the response as before
    response = ...

    // No need to decrement, it will happen when _http_server_requests_active
    // is dropped
}

This works, but having to introduce a struct and two new functions feels excessive.

Describe the solution you'd like

Companion functions to inc, dec, add, and sub that return a type that implements Drop, and when dropped, undo whatever the original action was. inc would call dec, add would call sub, etc.

Basically moving the boiler plate from my code to yours :-)

Describe alternatives you've considered

The DropSafeGauge struct above.

nikclayton-dfinity avatar Oct 20 '20 19:10 nikclayton-dfinity

I find this use-case to be very specific and thus out of scope for this library. Of course my opinion should not be considered the final one.

Instead of your suggested solution above, would it not make sense to encapsulate this logic in a hyper middleware, thus being available to all your endpoints. I am sure others would find a Prometheus hyper middleware useful as well, thus an open-source library likely finds a lot of interest.

Not directly related, instead of using a gauge to track the current connection count, I would use two counters, one increased on connection establishment and one increased on connection tear down. This way, given that Prometheus only scrapes periodically, you won't miss connection count spikes.

mxinden avatar Oct 22 '20 15:10 mxinden

I find this use-case to be very specific and thus out of scope for this library.

This is just one example. Another:

Suppose we start with a function like this.

fn do_something(metrics: Arc<metrics::Metrics>, ...) -> Result<(), SomeErrorEnum> {
    metrics.my_gauge.inc();

    // Do stuff

    metrics.my_gauge.dec();

    Ok(())
}

And all the stuff in "Do stuff" can return errors. SomeErrorEnum has a From implementation for all of those errors, so you'd like to be able to use ? for ergonomics and write:

fn do_something(metrics: Arc<metrics::Metrics>, ...) -> Result<(), SomeErrorEnum> {
    metrics.my_gauge.inc();

    do_a_thing()?;

    do_another_thing()?;

    do_something_else()?;

    metrics.my_gauge.dec();

    Ok(())
}

But you can't, because if you do that then any error will result in the gauge not decrementing.

Instead you need to write:

fn do_something(metrics: Arc<metrics::Metrics>, ...) -> Result<(), SomeErrorEnum> {
    metrics.my_gauge.inc();

    if let Err(e) = do_a_thing() {
        metrics.my_gauge.dec();
        return Err(e);
    }

    if let Err(e) = do_another_thing() {
        metrics.my_gauge.inc();
        return Err(e);

    if let Err(e) = do_something_else() {
        metrics.my_gauge.dec();
        return Err(e);
    }

    metrics.my_gauge.dec();

    Ok(())
}

which is much less ergonomic, and prone to errors (e.g., the "bug" in this example where if do_another_thing returns an error the coder has inadvertently called inc instead of dec...)

This gets worse if e.g., you're managing multiple gauges, and there are conditionals that control whether or not they get adjusted.

nikclayton-dfinity avatar Oct 25 '20 21:10 nikclayton-dfinity

@nikclayton-dfinity I share the same feelings as @mxinden here.

The underlying problem is that there is an intermixing of logic that must be always executed (final metrics decrement) and early-exit codepaths (Try on fallible results).

In general, custom Drop implementations are just trying to hide the symptoms in such cases. They may be effective in some basic scenarios, but there are still plenty of accidental codepaths where things will go wrong (e.g. a std::mem::forget which will bypass that).

In my experience, a few alternatives approaches could be:

  1. refactor the inner fallible logic and group it separately from the rest of logic that must be always executed.
  2. wrap the relevant gauges with private types of your own, with custom Drop logic. And make sure the destructors are always run, as the language does not guarantee that.
  3. add some closure-running methods to the gauges types in this library, like my_gauge.inc_dec_with(|| do_something())

You are kind of implicitly asking for option 3, but I fear that approach would be un-ergonomic to use and pretty limited in how you can use it (e.g. additional conditionals, in & out parameters, etc). The other two options are IMHO better as the consumer keeps full flexibility on the usage and logic arrangement.

lucab avatar Oct 26 '20 10:10 lucab

Maybe you can extend the gauge by yourself in this case, like:

trait GaugeExt {
    fn inc_with_restore(&self) -> IncGuard;
}

impl GaugeExt for Gauge {
    fn inc_with_restore(&self) -> IncGuard {
        self.inc()
        IncGuard(self.clone())
    }
}

breezewish avatar Oct 27 '20 11:10 breezewish

@breeswish That's pretty much what I've done in my code. My thinking in posting the issue was:

a) This felt general enough that it would be worth including in the crate instead of people needing to reinvent this wheel

b) Provide something for search engines to index so that the next person who has this problem doesn't have to spend the time that I did scratching my head over the cause.

nikclayton-dfinity avatar Oct 28 '20 09:10 nikclayton-dfinity