
Expose Command Error Metrics via the CommandLatencyMetrics

Open phyok opened this issue 4 years ago • 6 comments

Feature Request

Is your feature request related to a problem? Please describe

N/A

Describe the solution you'd like

Would it be possible to expose error count metrics in the Command Latency Metrics?

Describe alternatives you've considered

An alternative would be for the caller of the client to collect the error count while handling the exceptions thrown by the client.
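A minimal sketch of what that caller-side alternative might look like. `CommandErrorCounter` and its methods are hypothetical names for illustration, not part of Lettuce; the wrapper only assumes that the client call throws an exception on failure.

```java
import java.util.Map;
import java.util.concurrent.Callable;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.LongAdder;

// Hypothetical helper, not part of Lettuce: counts exceptions thrown by wrapped calls.
class CommandErrorCounter {

    // Error counts keyed by the simple class name of the exception.
    private final Map<String, LongAdder> errors = new ConcurrentHashMap<>();

    // Runs the call, records any exception by type, then rethrows it unchanged.
    <T> T call(Callable<T> command) throws Exception {
        try {
            return command.call();
        } catch (Exception e) {
            errors.computeIfAbsent(e.getClass().getSimpleName(), k -> new LongAdder()).increment();
            throw e;
        }
    }

    // Returns how many exceptions of the given simple class name were seen.
    long count(String exceptionName) {
        LongAdder adder = errors.get(exceptionName);
        return adder == null ? 0 : adder.sum();
    }

    public static void main(String[] args) {
        CommandErrorCounter counter = new CommandErrorCounter();
        try {
            // Stand-in for a Redis call made through the client.
            counter.call(() -> { throw new IllegalStateException("simulated failure"); });
        } catch (Exception expected) {
            // The caller still sees the original exception.
        }
        System.out.println(counter.count("IllegalStateException")); // prints 1
    }
}
```

Every call site has to go through the wrapper, which is the drawback of this approach: the counting lives in application code rather than in the client.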

Teachability, Documentation, Adoption, Migration Strategy

N/A

phyok avatar Feb 26 '20 00:02 phyok

Stats about failed commands aren't collected. When it comes to errors, what exactly qualifies as an error? There are various categories, such as I/O (transmission failure), enqueueing failures (command queue is full), and Redis responses indicating an error.

Right now, we collect latencies after the command response has been decoded.

mp911de avatar Feb 27 '20 11:02 mp911de

Well, granted those categories are broad, how would you recommend gathering them through a centralized Lettuce configuration?

mikebell90 avatar Mar 03 '20 01:03 mikebell90

By which I mean, we'd love to provide automatic metrics for our Lettuce users. Latency and connection events are easy. Command errors are currently left to the service owner, with predictable results. There are of course things like proxies (Java, extrinsic) we could do, but it would be nice to have these simply emitted in some form. What is your recommendation for this, @mp911de?

mikebell90 avatar Mar 03 '20 16:03 mikebell90

Right now, the Tracing API receives callbacks for successful and failed commands, and these propagate Redis error responses.

Have you tried forking the lib and playing around with real use-cases to get hold of these metrics? Probably, if we reduce the problem statement to just Redis error responses, we might come up with a simplified variant.

mp911de avatar Mar 03 '20 17:03 mp911de

Thanks, I will play around with the library.

How about grouping errors by categories:

  • Number of IO/connection errors
  • Number of errors from Redis (i.e. ERR xxxx responses from Redis)
  • Number of errors in client processing ... etc
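As a rough sketch of that grouping, the three categories could be modelled as an enum with one counter each. Everything here is hypothetical and not Lettuce API: `ErrorCategoryMetrics`, the `Category` names, and `RedisErrorResponse`, which is a stand-in for whatever exception type carries an `ERR xxxx` response. The mapping from exception to category is an assumption too.

```java
import java.io.IOException;
import java.util.EnumMap;
import java.util.Map;
import java.util.concurrent.atomic.LongAdder;

// Hypothetical sketch, not Lettuce API: groups failures into the categories listed above.
class ErrorCategoryMetrics {

    enum Category { IO, REDIS_ERROR, CLIENT }

    // Stand-in for whatever exception type carries a Redis "ERR ..." response.
    static class RedisErrorResponse extends RuntimeException {
        RedisErrorResponse(String message) { super(message); }
    }

    // One counter per category, created up front so the map is never modified afterwards.
    private final Map<Category, LongAdder> counts = new EnumMap<>(Category.class);

    ErrorCategoryMetrics() {
        for (Category c : Category.values()) {
            counts.put(c, new LongAdder());
        }
    }

    // Assumed mapping: I/O failures, Redis error responses, everything else as client-side.
    static Category classify(Throwable t) {
        if (t instanceof IOException) return Category.IO;
        if (t instanceof RedisErrorResponse) return Category.REDIS_ERROR;
        return Category.CLIENT;
    }

    void record(Throwable t) {
        counts.get(classify(t)).increment();
    }

    long count(Category c) {
        return counts.get(c).sum();
    }
}
```

Calling `record(new IOException("connection reset"))` would increment only the `IO` counter. The catch-all `CLIENT` bucket is the weakest part of this sketch, since it lumps together anything the classifier does not recognise.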

phyok avatar Mar 04 '20 22:03 phyok

Connection errors can happen without an active command. Upon a disconnect, we auto-reconnect: does this count as an error for a command that was sent to Redis, or not?

Errors in client processing are not propagated to the command, as downstream errors are invisible to commands.

mp911de avatar Mar 05 '20 20:03 mp911de