foundations icon indicating copy to clipboard operation
foundations copied to clipboard

Macro to easily create rate-limited logs

Open fisherdarling opened this issue 4 months ago • 0 comments

We added an error log line for an internal lib that we wanted to make sure was only called a few times. It could have been potentially called on a per-connection basis, so we were concerned with log volume. The initial implementation used a simple atomic counter that limited invocations of the line 1000 times. @Noah-Kennedy contributed a patch for a token-bucket based approach which limited the number of invocations that can be sent at once (a burst), but allowed for an indefinite amount (no cap).

Feature request: Can the foundations logging macros add the ability to setup a rate-limiter for a log line?

Something like:

log::warn_ratelimited!("failed to process QUIC initial packet"; "error" => e, MAX_BURST=20, RESET_PERIOD=60, TOKENS_PER_PERIOD=3);

The macro would allow one to configure how quickly the bucket fills back up and the maximum burst rate. It then expand into something like the below redacted implementation:

/// Use a token bucket to log QUIC initials `Unexpected` errors, but not too quic(k)ly, so that we
/// can get an idea of the verbosity and types of errors we see.
///
/// This token bucket gets additional tokens once per minute, and is allowed to have up to 20
/// tokens in reserve to burst above its normal rate limits if logging is slower than the maximum
/// normal rate.
///
/// Some log investigation shows log rates between 200 lines per minute down to just
/// 1 line in 24h. Hence, we choose a conservative rate of up to 3 lines per minute to strike
/// a balance between added log load and useful results. Based on existing metrics, we expect
/// these types of errors to be very rare.
///
/// The max burst threshold will allow us to potentially burst a bit above if we get a sudden spike
/// of these logs, but not for too long, and only if we already aren't logging too much.
fn log_unexpected_initial_error(e: &io::Error) {
    use foundations::telemetry::log;
    use std::sync::Once;
    use std::sync::atomic::{AtomicU64, Ordering};
    use std::cmp::min;

    const TOKEN_BUCKET_MAX_BURST: u64 = 20;
    const TOKEN_BUCKET_RESET_PERIOD: Duration = Duration::from_secs(60);
    const TOKENS_PER_PERIOD: u64 = 3;

    // give a full burst initially, just so we can see things better on startup
    static LOG_COUNT: AtomicU64 = AtomicU64::new(TOKEN_BUCKET_MAX_BURST);
    static FILLER_INIT: Once = Once::new();

    async fn bucket_filler() {
        // by default missed tick behavior is burst
        let mut interval = tokio::time::interval(TOKEN_BUCKET_RESET_PERIOD);
        loop {
            interval.tick().await;

            // Skip update if v == TOKEN_BUCKET_MAX_BURST by returning None
            let _ = LOG_COUNT.fetch_update(
                Ordering::AcqRel,
                Ordering::Relaxed,
                |v| (v < TOKEN_BUCKET_MAX_BURST)
                    .then_some(min(v + TOKENS_PER_PERIOD, TOKEN_BUCKET_MAX_BURST))
            );
        }
    }
    FILLER_INIT.call_once(|| _ = tokio::spawn(bucket_filler()));

    let token_acquired = LOG_COUNT.fetch_update(
        Ordering::AcqRel,
        Ordering::Relaxed,
        |v| (v > 0).then_some(v - 1),
    ).is_ok();
    if token_acquired {
        log::warn!("failed to process QUIC initial packet"; "error" => e);
    }
}

fisherdarling avatar Oct 09 '24 11:10 fisherdarling