lettuce icon indicating copy to clipboard operation
lettuce copied to clipboard

Feature Request: Metrics -or- EventBus Event for EventTimeoutExceptions

Open ShibaBandit opened this issue 6 years ago • 7 comments

Current Version:

        <dependency>
            <groupId>io.lettuce</groupId>
            <artifactId>lettuce-core</artifactId>
            <version>5.0.4.RELEASE</version>
        </dependency>

Wanted to discuss possibly getting support for timeout events on the RedisClient's event bus or a metric for the CommandMetrics object that stores a count of timeouts (preferring the latter). I've looked for this functionality in the lettuce API but have not found it. I would be open to helping collaborate on this feature.

ShibaBandit avatar May 29 '18 20:05 ShibaBandit

Thanks for raising this issue. Right now, Lettuce commands do not timeout. The only thing that times out is the RedisFuture synchronization. We introduced with #435 global command timeouts that can be enabled.

What are you trying to achieve?

mp911de avatar May 30 '18 19:05 mp911de

Thank you for the clarification. Yes I'm using RedisFutures currently as my main interaction with redis via the lettuce library.

I'm essentially wanting to obtain statistics for command timeouts in a common way within my application. I already gather the p50...p99.9 stats and put them into a dashboard, but I was looking to also add errors/timeouts.

I could solve this application side, but thought it might be a good library feature. However, with the information you provided above it may not make sense to add something like this. I'm used to other libraries with default timeouts in their conn settings like database APIs and the datastax Cassandra driver that track timeouts automatically so that's where the idea came from.

ShibaBandit avatar May 30 '18 19:05 ShibaBandit

I also think this would be a nice addition to our metrics. We can record the outcome of the command (success/failure/reason of failure). One more detail: A command may time out and complete successfully later. This is because commands are sent immediately to the transport channel and we can't undo sending.

I'll keep this ticket open so we can elaborate the design of this feature at some point in time.

mp911de avatar May 30 '18 21:05 mp911de

Hi, curious if anyone's done work on this since the last update. I'm looking for a way to collect overall success/failure metrics for redis commands - primarily statistics for timeouts and failures, but also knowing total number of commands would be useful as well. Being able to report the types of commands would be fantastic as well, so zinterstore failures vs get failures.

I also tried implementing this on the application side, it worked but it was a bit messy and seems like something that would be beneficial to have in the library.

Perhaps metrics could be recorded directly in the command class? I'm open to work on this and help get a PR up, but I'd need a bit of direction on the best way to implement this.

gtschurwald avatar Jul 23 '18 20:07 gtschurwald

@gtschurwald Take a lot at https://github.com/lettuce-io/lettuce-core/wiki/Command-Latency-Metrics. Timeouts do not apply to commands but rather to the synchronization/blocking facade.

mp911de avatar Jul 24 '18 08:07 mp911de

Thanks for the tip and quick response. We've added tracking of all currently available latency metrics to our client. I'm going to look more into adding tracking of timeouts and try to get a prototype working, maybe via RedisFuture or the LettuceFutures class where the timeout exceptions are actually created. Still some investigating to be done there. We're just upgrading to Lettuce 5 now, it's nice to see that there's global timeout support now for the async and reactive apis as well as sync.

gtschurwald avatar Jul 25 '18 20:07 gtschurwald

I've had to put this aside for now in order to tackle some more urgent work, if someone else works on adding this that would be fantastic.

For some quick ideas on how to implement this, it seems like the method awaitOrCancel in LettuceFutures might be a good point to increment some kind of failure counter, or the TimeoutException it throws could be caught and trigger a metric increment. Alternatively, perhaps the set timeout could just be compared against the latency metric for completion time when that gets reported to determine a "timeout" event, but I'm not sure if latency metrics get reported on commands that timeout.

gtschurwald avatar Jul 26 '18 20:07 gtschurwald