redis-rb icon indicating copy to clipboard operation
redis-rb copied to clipboard

disconnect on certain types of connection errors

Open goosetav opened this issue 8 years ago • 14 comments

When certain kinds of connection errors happen, disconnect to allow reconnection on next operation.

We ran into an issue where AWS ElastiCache switched the current master into a slave but did not actually go down hard. Existing connections were left intact but all write operations would fail:

Redis::CommandError READONLY You can't write against a read only slave.

If the connection were to automatically disconnect upon this type of error, the subsequent connection would reconnect to the correct node.

One, potentially flexible solution would be to allow a callback upon certain kinds of errors so that the client code (in this case it is Sidekiq, but it could also have been our main app code) could gracefully handle the problem.

This also might be specific to how AWS has implemented clustering with ElastiCache -- I don't know enough about their implementation to comment on that.

goosetav avatar Aug 07 '15 01:08 goosetav

/cc @mperham

goosetav avatar Aug 07 '15 01:08 goosetav

:+1: Just ran into this today...

jaygen avatar Sep 10 '15 03:09 jaygen

:+1:

blakehilscher avatar Sep 10 '15 03:09 blakehilscher

IMO there might be valid reasons to not disconnect here. Users might very well catch that and decide to use it as a read-only client after that. I think this should be handled in client code.

badboy avatar Sep 10 '15 09:09 badboy

:+1: I also encountered this issue after an AWS failover today. Since their endpoint doesn't change during a failover, this seems like something redis-rb could/should handle transparently.

It looks like em-redis added a reconnect_on_error option with a similar effect; this issue on sensu suggests that it handles the AWS failover case. Maybe a similar option would be appropriate here?

islemaster avatar Sep 10 '15 18:09 islemaster

I like the idea of a reconnect_on_error option and/or a callback so client code can handle the problem gracefully. The change https://github.com/portertech/em-redis/pull/3 seems like the right config options to me

goosetav avatar Sep 10 '15 20:09 goosetav

There are several classes of errors for which triggering a connection drop would have no positive effect, and would only serve to introduce connection churn; if reconnect_on_error were to be implemented in this library, I would suggest implementing it as the ability to register a block that would be called with the error in question as an argument, allowing consumers to define the behaviour, whether or not to disconnect.

-WRONGTYPE Operation against a key holding the wrong kind of value
-NOSCRIPT No matching script. Please use EVAL.
-EXECABORT Transaction discarded because of previous errors.

yaauie avatar Sep 10 '15 20:09 yaauie

that makes a lot of sense to me @yaauie

goosetav avatar Sep 10 '15 21:09 goosetav

Implemented it in ioredis (a Node.js redis client): https://github.com/luin/ioredis#reconnect-on-error. Hope that helps.

luin avatar Sep 22 '15 02:09 luin

This gem implements a patch for elasticache https://github.com/craigmcnamara/redis-elasticache

craigmcnamara avatar Oct 10 '15 08:10 craigmcnamara

:+1:

josephecombs avatar Mar 31 '16 17:03 josephecombs

:+1:

jareks avatar Apr 05 '16 05:04 jareks

Is the following parameter related? It seems that we can handle the issue if we use version 2.8.23 or later.

https://docs.aws.amazon.com/AmazonElastiCache/latest/red-ug/ParameterGroups.Redis.html

For Redis 2.8.23 the following additional parameter is supported. close-on-slave-write Default: yes If enabled, clients who attempt to write to a read-only replica will be disconnected.

Also, the above parameter is renamed close-on-replica-write since 5.0.0.

https://github.com/redis/redis-rb#reconnections

https://github.com/redis/redis-rb/blob/9fd381b18da20248a111cd9551354907c772a00a/lib/redis/client.rb#L365-L401

https://github.com/redis/redis-rb/blob/9fd381b18da20248a111cd9551354907c772a00a/lib/redis/errors.rb#L23-L41

supercaracal avatar Sep 03 '20 03:09 supercaracal

Hi,

Had an issue with my redis instances that I had a failover and I was connected to replica and got "READONLY" errors (since master become slave) and needed to reconnect to the new master.

We had the same issue with our nodejs client and we did the exact-thing wrote here: https://github.com/luin/ioredis#reconnect-on-error - when we get "READONLY" error we do reconnect.

Is it possible to consider such feature for redis-rb?

yosiat avatar Oct 25 '21 11:10 yosiat

I managed to get reconnection on "READONLY" error in the past, while doing monkey-patching to redis client method.

::Redis::Client.class_eval do
  alias_method :old_read, :read

  def read
    reply = old_read
    if reply.is_a?(Redis::CommandError) && reply.message.include?("READONLY")
      Rails.logger.info "Redis::Client#read [#{host}] got a READONLY error on reply #{reply}, throwing  Redis::ConnectionError"
      raise ::Redis::ConnectionError, "Connection lost (FAILOVER)"
    end

    reply
  end
end

@byroot Is is possible to add some kind of support for such behavior in the new version of redis client? or do you have any other suggestion for implementing such logic with monkey-patching until there is official option?

yosiat avatar Nov 05 '22 16:11 yosiat

@yosiat yes that's something I want to make easy in redis-client.

It's already easy as you can register custom errors: https://github.com/redis-rb/redis-client/blob/06eac53e2deff215e4b906f950cb081918bbb174/lib/redis_client.rb#L122-L128, but it's not public API, and then there's the question of translating these errors in redis-rb.

As for reconnecting on ReadOnly, that definitely makes sense if you use sentinel, so we should probably be doing by default (if we aren't already I need to double check).

But even if you don't use sentinel, it might make sense to treat this as a connection error and reconnect.

casperisfine avatar Nov 06 '22 21:11 casperisfine

@casperisfine thanks for looking at it and implementing it :)

We are actually not using sentinel and it's reconnecting on readonly errors helped us a lot during failovers!

yosiat avatar Nov 07 '22 09:11 yosiat

Fixed in https://github.com/redis-rb/redis-client/pull/65

byroot avatar Nov 24 '22 19:11 byroot