
Add metrics for tracking total disconnected time and reconnection attempts

Open • ggivo opened this issue 9 months ago • 1 comment

Description: Introduces two new metrics: one tracks the total time a connection remains disconnected until it is successfully reconnected, and the other counts the reconnection attempts made in the meantime. The changes include:

  • New Metrics:
    • lettuce.reconnection.inactive.duration
      • Description: Measures the time taken for a successful reconnection after a disconnection.
      • Type: Timer
    • lettuce.reconnection.attempts
      • Description: Tracks the number of reconnection attempts made during a disconnection.
      • Type: Counter
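
A minimal sketch of how these two values could be captured, using only the JDK. `ReconnectionTracker` and its method names are hypothetical and not part of Lettuce; the comments only indicate which of the proposed metrics each field would feed. The clock is injected so the behavior can be checked deterministically.

```java
import java.time.Duration;
import java.util.concurrent.atomic.AtomicReference;
import java.util.concurrent.atomic.LongAdder;
import java.util.function.LongSupplier;

/** Hypothetical tracker for illustration only; not a Lettuce class. */
final class ReconnectionTracker {

    private final LongSupplier nanoClock; // e.g. System::nanoTime in production
    // Timestamp of the disconnect, or null while the connection is active.
    private final AtomicReference<Long> disconnectedAt = new AtomicReference<>();
    private final LongAdder attempts = new LongAdder();           // would feed lettuce.reconnection.attempts
    private final LongAdder totalInactiveNanos = new LongAdder(); // would feed lettuce.reconnection.inactive.duration
    private final LongAdder reconnections = new LongAdder();

    ReconnectionTracker(LongSupplier nanoClock) {
        this.nanoClock = nanoClock;
    }

    /** Marks the start of a disconnected period; repeated calls keep the first timestamp. */
    void onDisconnected() {
        disconnectedAt.compareAndSet(null, nanoClock.getAsLong());
    }

    /** Counts one reconnection attempt, successful or not. */
    void onReconnectAttempt() {
        attempts.increment();
    }

    /** Closes the disconnected period and records how long it lasted. */
    void onReconnected() {
        Long start = disconnectedAt.getAndSet(null);
        if (start != null) {
            totalInactiveNanos.add(nanoClock.getAsLong() - start);
            reconnections.increment();
        }
    }

    long attempts() {
        return attempts.sum();
    }

    long reconnections() {
        return reconnections.sum();
    }

    Duration totalInactive() {
        return Duration.ofNanos(totalInactiveNanos.sum());
    }
}
```

In a real integration, the timer and counter would presumably be provided by the metrics backend rather than by hand-rolled adders; the sketch only shows which events start, extend, and close a disconnected period.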

Impact:

  • Provides better insights into connection stability and reconnection performance.

ggivo, Mar 18 '25 08:03

I'm considering adding a few more metrics, e.g.:

  • endpoint.command.queue - Gauge - tracks the number of in-flight commands (commands written to the Netty queue but not yet completed)
  • endpoint.command.buffer. - Gauge - tracks the number of buffered commands (when autoflush=false)
  • endpoint.disconnected.buffer - Gauge - tracks the number of commands buffered while disconnected

Those are implementation-specific and relevant to DefaultEndpoint.
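
To illustrate the gauge idea: a gauge samples the current size on demand instead of accumulating values. The sketch below uses only the JDK; `GaugeRegistrySketch` and `EndpointSketch` are hypothetical stand-ins, not Lettuce or metrics-library types, and the queues hold plain strings in place of real commands.

```java
import java.util.ArrayDeque;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Queue;
import java.util.function.IntSupplier;

/** Hypothetical registry: each gauge is a named supplier read at snapshot time. */
final class GaugeRegistrySketch {

    private final Map<String, IntSupplier> gauges = new LinkedHashMap<>();

    void gauge(String name, IntSupplier supplier) {
        gauges.put(name, supplier);
    }

    /** Reads every gauge once and returns the sampled values by name. */
    Map<String, Integer> snapshot() {
        Map<String, Integer> values = new LinkedHashMap<>();
        gauges.forEach((name, supplier) -> values.put(name, supplier.getAsInt()));
        return values;
    }
}

/** Hypothetical stand-in for an endpoint that owns the internal queues. */
final class EndpointSketch {

    final Queue<String> inflight = new ArrayDeque<>();           // written but not yet completed
    final Queue<String> disconnectedBuffer = new ArrayDeque<>(); // held while disconnected

    /** Registers one gauge per queue; the metric names are the ones proposed above. */
    void bindTo(GaugeRegistrySketch registry) {
        registry.gauge("endpoint.command.queue", inflight::size);
        registry.gauge("endpoint.disconnected.buffer", disconnectedBuffer::size);
    }
}
```

Because the gauges only read `size()` when sampled, the endpoint's hot path carries no bookkeeping of its own. A real implementation would still have to consider the cost and thread-safety of `size()` on whatever queue type the endpoint actually uses.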

This raises some open questions. As of now, we have CommandLatencyRecorder, which is responsible for gathering command latency metrics.

Do we continue with separate metrics recorders, for example one for ConnectionMonitoring (used by ConnectionWatchdog to track inactive connection time) and another for DefaultEndpoint (tracking the size of internal queues), or do we have a single MetricsRecorder covering both (ConnectionMonitoring, DefaultEndpoint)?

Do we want to be able to enable/disable connection-related and endpoint-related metrics separately?
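
To make the trade-off concrete, here is a rough sketch of the "separate recorders" option with independent on/off switches. All names (`ConnectionMetricsRecorder`, `EndpointMetricsRecorder`, `MetricsSelection`) are hypothetical and only illustrate the shape of the design; none of them exist in Lettuce today.

```java
import java.time.Duration;

/** Hypothetical: connection-level events, e.g. reported by a reconnect watchdog. */
interface ConnectionMetricsRecorder {

    void recordInactiveDuration(Duration duration);

    void recordReconnectAttempt();

    /** Used when connection metrics are disabled. */
    ConnectionMetricsRecorder NOOP = new ConnectionMetricsRecorder() {
        @Override
        public void recordInactiveDuration(Duration duration) {
        }

        @Override
        public void recordReconnectAttempt() {
        }
    };
}

/** Hypothetical: endpoint-level observations such as internal queue sizes. */
interface EndpointMetricsRecorder {

    void recordQueueSize(String queueName, int size);

    /** Used when endpoint metrics are disabled. */
    EndpointMetricsRecorder NOOP = (queueName, size) -> {
    };
}

/** Resolves each group to the real recorder or to a no-op, independently of the other. */
final class MetricsSelection {

    final ConnectionMetricsRecorder connection;
    final EndpointMetricsRecorder endpoint;

    MetricsSelection(boolean connectionEnabled, ConnectionMetricsRecorder connection,
            boolean endpointEnabled, EndpointMetricsRecorder endpoint) {
        this.connection = connectionEnabled ? connection : ConnectionMetricsRecorder.NOOP;
        this.endpoint = endpointEnabled ? endpoint : EndpointMetricsRecorder.NOOP;
    }
}
```

With a single recorder the two groups would share one interface and one switch; splitting them costs an extra type but lets callers always invoke a recorder without checking whether that group is enabled.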

@tishun any opinion?

ggivo, Mar 20 '25 05:03