dd-trace-rb icon indicating copy to clipboard operation
dd-trace-rb copied to clipboard

The Redis integration can't distinguish final errors from retried errors

Open byroot opened this issue 1 year ago • 13 comments

Ref: https://github.com/redis-rb/redis-client/issues/119

There is a bit of an ongoing problem with users of dd-trace-rb, the instrumentation of the Redis gem has a fairly different behavior whether you are using the 4.x version or 5.x version.

More detail at https://github.com/redis-rb/redis-client/issues/119#issuecomment-1829703792, but in short with the 4.x version the instrumentation is hooked above the code in charge of retrying on error, while with 5.x it's below.

Because of this users upgrading from 4.x to 5.x see a stream of errors that aren't new at all, just weren't reported, and think the 5.x version is bugged.

This is no fault of dd-trace-rb, it just use the official hook points declared by redis-client. As the maintainer of redis-client I'd like to find a solution to this, and totally willing to extend or evolve the hook points to allow dd-trace-rb and similar projects to be able to ignore retried errors.

But to help with that I'd first like to know if there is a precedent to this, is this done for other integrations? Would dd-trace-rb need to just not see errors, or need some kind of boolean telling whether the error is final, etc. As this would help me drive what the API change would look like.

@ivoanjo @TonyCTHsu please let me know your thougths.

byroot avatar Aug 01 '24 07:08 byroot

Thanks for letting us know, definitely sucks that we're breaking folks and adding more work for you :sweat_smile:

Let me sync with the other folks and we'll get back to you asap.

ivoanjo avatar Aug 01 '24 08:08 ivoanjo

definitely sucks that we're breaking folks and adding more work for you

It really isn't that bad, it's mostly confusion I think.

byroot avatar Aug 01 '24 08:08 byroot

Just an FYW that we are still looking into it; just trying to get the right people together with all the summer PTOs.

marcotc avatar Aug 07 '24 20:08 marcotc

No worries. That's much appreciated.

byroot avatar Aug 07 '24 20:08 byroot

Hey @byroot, I had a chat internally in Hughes what we came up with:

  1. We think that visibility at a level that abstracts away any automatic retries is the desirable default. We had this behavior when we instrumented version 4.x, but we made a mistake of changing that behavior for 5.X.
  2. Inspecting the middleware call site today, we think that it makes sense that it is invoked on each individual request, even on retried request, because arbitrary request modifications can be necessary on an individual request level. For observability, we are not interested in that signal, but we see the value in theory.

What we re thinking of doing is going back to the original monkey-patching for 5.x, that same that we do for 4.x.

We don't think that is necessarily required for the redis-client gem to support callbacks at the level we desire which today would translate to "one high-level Redis API call to one instrumentation signal".

We are happy to expand a conversation as well and hear any feedback.

marcotc avatar Aug 21 '24 21:08 marcotc

What we re thinking of doing is going back to the original monkey-patching for 5.x, that same that we do for 4.x.

So you wouldn't instrument raw redis-client usage at all?

byroot avatar Aug 21 '24 21:08 byroot

So you wouldn't instrument raw redis-client usage at all?

This is correct. Given we rather report Redis requests at a high-level, looking at the code in redis and redis-client, we wouldn't instrument at the redis-client level.

marcotc avatar Oct 29 '24 18:10 marcotc

Note that redis-client can be used standalone, it's not exclusively a dependency of redis.

byroot avatar Oct 29 '24 18:10 byroot

@marcotc any updates?

skcc321 avatar Apr 07 '25 12:04 skcc321

@marcotc @ivoanjo Any word on this? These errors are super-loud for us.

sdrioux avatar Jul 12 '25 00:07 sdrioux

👋 thanks for the ping, let me sync with the other folks on the team to try to get this prioritized :)

ivoanjo avatar Jul 14 '25 07:07 ivoanjo

I have an open PR that adds a predicate on errors to know whether they are final or not: https://github.com/redis-rb/redis-client/pull/255

I'd love to know if this would help dd-trace-rb

byroot avatar Sep 19 '25 12:09 byroot

We have the same loud noise in our monitor that listens to error.type:Redis* error occurrances (RedisClient::ConnectionError: EOFError and RedisClient::ConnectionError: Broken pipe to be specific - https://github.com/redis-rb/redis-client/issues/119) after upgrading from sidekiq 6.5.x to Sidekiq 7.3.x gem in our rails project. SADD LPUSH fails the first time but returns HELLO SELECT. On retry SADD LPUSH passes.

cegprakash avatar Nov 27 '25 07:11 cegprakash