rails icon indicating copy to clipboard operation
rails copied to clipboard

ActionCable's SubscriptionAdpater for Redis does not reconnect after a connection loss.

Open wpp opened this issue 7 years ago • 21 comments

Steps to reproduce

  • Start rails server and let ActionCable establish a connection to Redis.
  • Restart/Stop the Redis instance.

Expected behavior

If the connection between ActionCable and Redis is interrupted or lost the server process tries to re-establish the connection, informs web socket connections about the disconnection (or re-establishes subscriptions) and does not abort.

Actual behavior

If the connection between ActionCable and Redis is interrupted or lost the server process is aborted, without trying to re-establish the connection.

RoomChannel is transmitting the subscription confirmation
RoomChannel is streaming from room:123
Exiting
2.3.3/gems/redis-3.3.2/lib/redis/client.rb:257:in `rescue in io': Connection lost (ECONNRESET) (Redis::ConnectionError)
  gems/redis-3.3.2/lib/redis/client.rb:250:in `io'
  gems/redis-3.3.2/lib/redis/client.rb:261:in `read'
  gems/redis-3.3.2/lib/redis/client.rb:136:in `block (3 levels) in call_loop'
  gems/redis-3.3.2/lib/redis/client.rb:135:in `loop'
  gems/redis-3.3.2/lib/redis/client.rb:135:in `block (2 levels) in call_loop'
  gems/redis-3.3.2/lib/redis/client.rb:231:in `block (2 levels) in process'
  gems/redis-3.3.2/lib/redis/client.rb:367:in `ensure_connected'
  gems/redis-3.3.2/lib/redis/client.rb:221:in `block in process'
  gems/redis-3.3.2/lib/redis/client.rb:306:in `logging'
  gems/redis-3.3.2/lib/redis/client.rb:220:in `process'
  gems/redis-3.3.2/lib/redis/client.rb:134:in `block in call_loop'
  gems/redis-3.3.2/lib/redis/client.rb:280:in `with_socket_timeout'
  gems/redis-3.3.2/lib/redis/client.rb:133:in `call_loop'
  gems/redis-3.3.2/lib/redis/subscribe.rb:43:in `subscription'
  gems/redis-3.3.2/lib/redis/subscribe.rb:12:in `subscribe'
  gems/redis-3.3.2/lib/redis.rb:2765:in `_subscription'
  gems/redis-3.3.2/lib/redis.rb:2143:in `block in subscribe'
  gems/redis-3.3.2/lib/redis.rb:58:in `block in synchronize'
  from .rubies/ruby-2.3.3/lib/ruby/2.3.0/monitor.rb:214:in `mon_synchronize'
  gems/redis-3.3.2/lib/redis.rb:58:in `synchronize'
  gems/redis-3.3.2/lib/redis.rb:2142:in `subscribe'
  gems/actioncable-5.0.1/lib/action_cable/subscription_adapter/redis.rb:75:in `block in listen'
  gems/redis-3.3.2/lib/redis/client.rb:293:in `with_reconnect'
  gems/redis-3.3.2/lib/redis.rb:64:in `block in with_reconnect'
  gems/redis-3.3.2/lib/redis.rb:58:in `block in synchronize'
  from .rubies/ruby-2.3.3/lib/ruby/2.3.0/monitor.rb:214:in `mon_synchronize'
  gems/redis-3.3.2/lib/redis.rb:58:in `synchronize'
  gems/redis-3.3.2/lib/redis.rb:63:in `with_reconnect'
  gems/redis-3.3.2/lib/redis.rb:70:in `without_reconnect'
  gems/actioncable-5.0.1/lib/action_cable/subscription_adapter/redis.rb:72:in `listen'
  gems/actioncable-5.0.1/lib/action_cable/subscription_adapter/redis.rb:146:in `block in ensure_listener_running'

Additional information

As soon as the redis instance is unavailable and redis-rb receives the EOF from reading the socket the Listener aborts as intended? here.

The redis-rb client offers the option to specify reconnect_attempts during initialisation, but in case of a disconnection, after a successful one, the client doesn't attempt a reconnection. Even if it did, it might be too aggressive to allow a re-establishment since there is no delay.

A work-around is something like:

    def listen_with_retry(conn)
      listen conn
    rescue ::Redis::ConnectionError, ::Redis::CannotConnectError => e
      ActionCable.server.connections.each(&:close)
      sleep 1
      retry
    end

System configuration

Rails version: Rails 5.0.1

Ruby version: ruby 2.3.3p222

wpp avatar Jan 12 '17 18:01 wpp

cc @matthewd, @dhh, what should the expected behavior be here?

maclover7 avatar Jan 12 '17 20:01 maclover7

This issue has been automatically marked as stale because it has not been commented on for at least three months. The resources of the Rails team are limited, and so we are asking for your help. If you can still reproduce this error on the 5-1-stable branch or on master, please reply with all of the information you have about it in order to keep the issue open. Thank you for all your contributions.

rails-bot[bot] avatar May 05 '17 00:05 rails-bot[bot]

Hello, it's been a while and I've since submitted a PR to the Redis library which might "fix" the current situation. But I'm still wondering if there is anything else I can do and would really appreciate some feedback on the desired behavior. Cheers, have a great week!

wpp avatar Oct 04 '17 09:10 wpp

Hello, it's been over a year since I submitted this issue and I haven't gotten any feedback yet. I'm wondering if there is anything else I can do and would really appreciate some feedback on the desired behavior.

wpp avatar Jan 26 '18 13:01 wpp

Reconnecting automatically sounds good to me. But sleep 1 seems crude though.

dhh avatar Jan 26 '18 16:01 dhh

How do you feel about a (exponential) back off mechanism where you'd have an option to specify the number of reconnect_attempts and the max_reconnection_timeout?

I.e the client would attempt 10 reconnections, exponentially increasing the (sleep) duration between each attempt until it reaches the specified connection limit.

If that sounds ok, I'd be happy to start work on a PR.

Cheers.

wpp avatar Jan 29 '18 10:01 wpp

We have an API for this in ActiveJob that would be good to consider extracting into something generic for timeouts/retries. See https://github.com/rails/rails/blob/master/activejob/lib/active_job/exceptions.rb#L116

On Mon, Jan 29, 2018 at 2:24 AM, Philipp Weissensteiner < [email protected]> wrote:

How do you feel about a (exponential) back off mechanism where you'd have an option to specify the number of reconnect_attempts and the max_reconnection_timeout?

I.e the client would attempt 10 reconnections, exponentially increasing the (sleep) duration between each attempt until it reaches the specified connection limit.

If that sounds ok, I'd be happy to start work on a PR.

Cheers.

— You are receiving this because you were mentioned. Reply to this email directly, view it on GitHub https://github.com/rails/rails/issues/27659#issuecomment-361201566, or mute the thread https://github.com/notifications/unsubscribe-auth/AAAKtV46H-cpEg1IP-vGLqnxvY92G-JTks5tPZxhgaJpZM4LiDqN .

dhh avatar Jan 30 '18 00:01 dhh

@wpp I tried your patch and there seems to be issues with subscriptions being lost (though I don't persist redis data), even if the ActionCable server manages to reconnect.

My solution was to add your patch and also to close all connections when unable to send any command to redis in the send_command method. If the clients have to reconnect, then they can resubscribe. If they stay connected, they won't know that their subscriptions are gone. I might be lost here since I'm new to ActionCable.

fooki avatar Feb 01 '18 13:02 fooki

Thanks for the info and yep that does sound familiar. The patch was a very simple first take at the problem, so I'm hoping I'll have time to submit a better solution, including David's feedback this weekend or the one after. If that would be too late for you feel free to submit a PR yourself and cc me. I'd be happy to help with testing and give my 2 cents.

wpp avatar Feb 01 '18 13:02 wpp

I am seeing similar issue which might be related to previous redis connection loss, and action cable is not doing reconnect. Issue resolved after restarting the server.

E, [2018-08-07T20:27:02.472920 #21950] ERROR -- : There was an exception - NoMethodError(undefined method `write' for nil:NilClass)
E, [2018-08-07T20:27:02.473167 #21950] ERROR -- : /shared/bundle/ruby/2.3.0/gems/redis-3.3.5/lib/redis/connection/ruby.rb:376:in `write'
/shared/bundle/ruby/2.3.0/gems/redis-3.3.5/lib/redis/client.rb:271:in `block in write'
/shared/bundle/ruby/2.3.0/gems/redis-3.3.5/lib/redis/client.rb:250:in `io'
/shared/bundle/ruby/2.3.0/gems/redis-3.3.5/lib/redis/client.rb:269:in `write'
/shared/bundle/ruby/2.3.0/gems/actioncable-5.1.4/lib/action_cable/subscription_adapter/redis.rb:161:in `send_command'

dechaoqiu avatar Aug 07 '18 21:08 dechaoqiu

Can this be a reason? https://redis.io/topics/clients#output-buffers-limits

When I send a large amount of data to a client constantly (such as 5-10 times every second), redis eventually cuts the connection between its ruby client

redis-4.1.0/lib/redis/client.rb:260:in `rescue in io': Connection lost (ECONNRESET) (Redis::ConnectionError)

github0013 avatar May 17 '19 06:05 github0013

I'm getting the same issue on Rails 6.0

I pushed an app to staging on Heroku and sometimes the Action Cable -> Redis connection times out and this crashes the entire application. Here are the error messages from logs.

2020-07-27T17:54:54.846056+00:00 app[web.1]: #<Thread:0x0000555c5dc5cd08@/app/vendor/bundle/ruby/2.5.0/gems/actioncable-5.2.3/lib/action_cable/subscription_adapter/redis.rb:148 run> terminated with exception (report_on_exception is true): 2020-07-27T17:54:54.850688+00:00 app[web.1]: /app/vendor/bundle/ruby/2.5.0/gems/redis-4.1.3/lib/redis/client.rb:362:in `rescue in establish_connection': Error connecting to Redis on ec2-52-70-215-224.compute-1.amazonaws.com:31729 (Redis::TimeoutError) (Redis::CannotConnectError)

Wondering how others worked around this. Is there a place in my code I could catch this error myself?

randall-coding avatar Jul 29 '20 13:07 randall-coding

Hello, i have the same issue. With AWS Elasticache redis the app crashes 5 minutes after the last connection.

rails: v6.1.1 ruby: v3.0.0 redis gem: v4.2.5 AWS Elasticache redis: v6.x and v5.0.6

I tried lowering the version of rails, ruby, redis gem but it is the same. However it works fine when set up in a docker container redis v6.0.9 and v5.0.6. It may depend on the redis settings.

#<Thread:0x00007f8908028d28 /usr/local/bundle/gems/actioncable-6.1.1/lib/action_cable/subscription_adapter/redis.rb:150 run> terminated with exception (report_on_exception is true):
/usr/local/bundle/gems/redis-4.2.5/lib/redis/client.rb:275:in `rescue in io': Connection lost (ECONNRESET) (Redis::ConnectionError)
	from /usr/local/bundle/gems/redis-4.2.5/lib/redis/client.rb:267:in `io'
	from /usr/local/bundle/gems/redis-4.2.5/lib/redis/client.rb:279:in `read'
	from /usr/local/bundle/gems/redis-4.2.5/lib/redis/client.rb:147:in `block (3 levels) in call_loop'
	from /usr/local/bundle/gems/redis-4.2.5/lib/redis/client.rb:146:in `loop'
	from /usr/local/bundle/gems/redis-4.2.5/lib/redis/client.rb:146:in `block (2 levels) in call_loop'
	from /usr/local/bundle/gems/redis-4.2.5/lib/redis/client.rb:248:in `block (2 levels) in process'
	from /usr/local/bundle/gems/redis-4.2.5/lib/redis/client.rb:389:in `ensure_connected'
	from /usr/local/bundle/gems/redis-4.2.5/lib/redis/client.rb:238:in `block in process'
	from /usr/local/bundle/gems/redis-4.2.5/lib/redis/client.rb:325:in `logging'
	from /usr/local/bundle/gems/redis-4.2.5/lib/redis/client.rb:237:in `process'
	from /usr/local/bundle/gems/redis-4.2.5/lib/redis/client.rb:145:in `block in call_loop'
	from /usr/local/bundle/gems/redis-4.2.5/lib/redis/client.rb:300:in `with_socket_timeout'
	from /usr/local/bundle/gems/redis-4.2.5/lib/redis/client.rb:144:in `call_loop'
	from /usr/local/bundle/gems/redis-4.2.5/lib/redis/subscribe.rb:44:in `subscription'
	from /usr/local/bundle/gems/redis-4.2.5/lib/redis/subscribe.rb:14:in `subscribe'
	from /usr/local/bundle/gems/redis-4.2.5/lib/redis.rb:3507:in `_subscription'
	from /usr/local/bundle/gems/redis-4.2.5/lib/redis.rb:2326:in `block in subscribe'
	from /usr/local/bundle/gems/redis-4.2.5/lib/redis.rb:69:in `block in synchronize'
	from /usr/local/lib/ruby/3.0.0/monitor.rb:202:in `synchronize'
	from /usr/local/lib/ruby/3.0.0/monitor.rb:202:in `mon_synchronize'
	from /usr/local/bundle/gems/redis-4.2.5/lib/redis.rb:69:in `synchronize'
	from /usr/local/bundle/gems/redis-4.2.5/lib/redis.rb:2325:in `subscribe'
	from /usr/local/bundle/gems/actioncable-6.1.1/lib/action_cable/subscription_adapter/redis.rb:83:in `block in listen'
	from /usr/local/bundle/gems/redis-4.2.5/lib/redis/client.rb:313:in `with_reconnect'
	from /usr/local/bundle/gems/redis-4.2.5/lib/redis.rb:75:in `block in with_reconnect'
	from /usr/local/bundle/gems/redis-4.2.5/lib/redis.rb:69:in `block in synchronize'
	from /usr/local/lib/ruby/3.0.0/monitor.rb:202:in `synchronize'
	from /usr/local/lib/ruby/3.0.0/monitor.rb:202:in `mon_synchronize'
	from /usr/local/bundle/gems/redis-4.2.5/lib/redis.rb:69:in `synchronize'
	from /usr/local/bundle/gems/redis-4.2.5/lib/redis.rb:74:in `with_reconnect'
	from /usr/local/bundle/gems/redis-4.2.5/lib/redis.rb:81:in `without_reconnect'
	from /usr/local/bundle/gems/actioncable-6.1.1/lib/action_cable/subscription_adapter/redis.rb:80:in `listen'
	from /usr/local/bundle/gems/actioncable-6.1.1/lib/action_cable/subscription_adapter/redis.rb:154:in `block in ensure_listener_running'
/usr/local/bundle/gems/redis-4.2.5/lib/redis/connection/ruby.rb:63:in `block in _read_from_socket': Connection reset by peer (Errno::ECONNRESET)
	from /usr/local/bundle/gems/redis-4.2.5/lib/redis/connection/ruby.rb:52:in `loop'
	from /usr/local/bundle/gems/redis-4.2.5/lib/redis/connection/ruby.rb:52:in `_read_from_socket'
	from /usr/local/bundle/gems/redis-4.2.5/lib/redis/connection/ruby.rb:45:in `gets'
	from /usr/local/bundle/gems/redis-4.2.5/lib/redis/connection/ruby.rb:378:in `read'
	from /usr/local/bundle/gems/redis-4.2.5/lib/redis/client.rb:280:in `block in read'
	from /usr/local/bundle/gems/redis-4.2.5/lib/redis/client.rb:268:in `io'
	from /usr/local/bundle/gems/redis-4.2.5/lib/redis/client.rb:279:in `read'
	from /usr/local/bundle/gems/redis-4.2.5/lib/redis/client.rb:147:in `block (3 levels) in call_loop'
	from /usr/local/bundle/gems/redis-4.2.5/lib/redis/client.rb:146:in `loop'
	from /usr/local/bundle/gems/redis-4.2.5/lib/redis/client.rb:146:in `block (2 levels) in call_loop'
	from /usr/local/bundle/gems/redis-4.2.5/lib/redis/client.rb:248:in `block (2 levels) in process'
	from /usr/local/bundle/gems/redis-4.2.5/lib/redis/client.rb:389:in `ensure_connected'
	from /usr/local/bundle/gems/redis-4.2.5/lib/redis/client.rb:238:in `block in process'
	from /usr/local/bundle/gems/redis-4.2.5/lib/redis/client.rb:325:in `logging'
	from /usr/local/bundle/gems/redis-4.2.5/lib/redis/client.rb:237:in `process'
	from /usr/local/bundle/gems/redis-4.2.5/lib/redis/client.rb:145:in `block in call_loop'
	from /usr/local/bundle/gems/redis-4.2.5/lib/redis/client.rb:300:in `with_socket_timeout'
	from /usr/local/bundle/gems/redis-4.2.5/lib/redis/client.rb:144:in `call_loop'
	from /usr/local/bundle/gems/redis-4.2.5/lib/redis/subscribe.rb:44:in `subscription'
	from /usr/local/bundle/gems/redis-4.2.5/lib/redis/subscribe.rb:14:in `subscribe'
	from /usr/local/bundle/gems/redis-4.2.5/lib/redis.rb:3507:in `_subscription'
	from /usr/local/bundle/gems/redis-4.2.5/lib/redis.rb:2326:in `block in subscribe'
	from /usr/local/bundle/gems/redis-4.2.5/lib/redis.rb:69:in `block in synchronize'
	from /usr/local/lib/ruby/3.0.0/monitor.rb:202:in `synchronize'
	from /usr/local/lib/ruby/3.0.0/monitor.rb:202:in `mon_synchronize'
	from /usr/local/bundle/gems/redis-4.2.5/lib/redis.rb:69:in `synchronize'
	from /usr/local/bundle/gems/redis-4.2.5/lib/redis.rb:2325:in `subscribe'
	from /usr/local/bundle/gems/actioncable-6.1.1/lib/action_cable/subscription_adapter/redis.rb:83:in `block in listen'
	from /usr/local/bundle/gems/redis-4.2.5/lib/redis/client.rb:313:in `with_reconnect'
	from /usr/local/bundle/gems/redis-4.2.5/lib/redis.rb:75:in `block in with_reconnect'
	from /usr/local/bundle/gems/redis-4.2.5/lib/redis.rb:69:in `block in synchronize'
	from /usr/local/lib/ruby/3.0.0/monitor.rb:202:in `synchronize'
	from /usr/local/lib/ruby/3.0.0/monitor.rb:202:in `mon_synchronize'
	from /usr/local/bundle/gems/redis-4.2.5/lib/redis.rb:69:in `synchronize'
	from /usr/local/bundle/gems/redis-4.2.5/lib/redis.rb:74:in `with_reconnect'
	from /usr/local/bundle/gems/redis-4.2.5/lib/redis.rb:81:in `without_reconnect'
	from /usr/local/bundle/gems/actioncable-6.1.1/lib/action_cable/subscription_adapter/redis.rb:80:in `listen'
	from /usr/local/bundle/gems/actioncable-6.1.1/lib/action_cable/subscription_adapter/redis.rb:154:in `block in ensure_listener_running'

arakawamoriyuki avatar Jan 20 '21 02:01 arakawamoriyuki

Out of the sudden getting this when I have a Dockerized Rails 6 using a "host.docker.internal"'s Redis.

It work for the first connection. Thereafter, it crashed and unable to recover.

mech avatar Apr 16 '21 10:04 mech

@mech I'm having the same issue. I had upgraded to the latest Rails 5- my websockets all keep trying to reconnect in a loop after it says there is a successful connection. I went nuclear and updated to Rails 6 and have the same issue... When my adapter is set to async it succeeds.

shadiramadan avatar Apr 19 '21 00:04 shadiramadan

Hey @shadiramadan, do you mean in production environment? I thought the production environment the adapter should be:

production:
  adapter: redis
  url: <%= ENV["REDIS_URL"] %>
  channel_prefix: channel_name

mech avatar Apr 19 '21 01:04 mech

@fooki ActionCable uses Redis pubsub, which does not intersect with keyspace data. What this means is that if the Redis server restarts (not "just" some kind of netsplit or other temporary connectivity issue) those subscriptions are gone, even if the ActionCable subscriptions hash is still intact.

Even if someone was to implement falloff reconnection logic, the server should absolutely close all current ActionCable Connections and refuse new ones until the subscription adapter is back online. This gives people who rely on ActionCable the opportunity to cleanly handle a known exception state instead of having the server just shit the bed without even saying "you're welcome".

Hint: How does Sidekiq handle server reconnects? That is how ActionCable should handle server reconnects.

leastbad avatar Aug 03 '21 18:08 leastbad

This is such a big vulnerability for us, that we decided to monkey patch ActionCable::SubscriptionAdapter::Redis::Listener. It works by catching the disconnect exception and restart Puma using a hot restart. This recovers full functionality, provided that Redis is available. Since our primary use case is restoring connection in case of a Redis primary/replica failover, this is a safe assumption.

Perhaps this could be of use to anyone. Note: I'm not proud of this, but it works ;)

# Increase robustness against Redis disconnects, restarts or master failover.
# With the default functionality of ActionCable, the entire server crashes.
#
# This is a monkey patch of
# https://github.com/rails/rails/blob/723375147b4110ad7260962851ca4e3a7a951b47/actioncable/lib/action_cable/subscription_adapter/redis.rb#L79
module ActionCable::SubscriptionAdapter::Redis::ListenerExtensions
  def listen(conn)
    super
  rescue StandardError => e
    Sentry.capture_exception(e)

    # sleep a random amount of time to avoid all processes restarting at once.
    # During this period, websockets will be unavailable, but the Rails process
    # stays alive
    time = 60 * rand + 5
    puts "Critical ActionCable error. Restarting in #{time}"
    sleep time
    # Restart Puma using 'hot restart'
    # https://github.com/puma/puma/blob/master/docs/restart.md#hot-restart
    Process.kill('USR2', Process.pid)
  end
end

class ActionCable::SubscriptionAdapter::Redis::Listener
  prepend ActionCable::SubscriptionAdapter::Redis::ListenerExtensions
end

rhomeister avatar Sep 22 '21 01:09 rhomeister

It doesn't happen often, but when it does happen, it's catastrophic. A Redis::ConnectionError when sending a message in Action Cable to an end user will cause an entire worker (on heroku) to go down, and the worker won't restart itself. Does anyone recommend the patch suggested by @rhomeister for this issue, or some other strategy?

kingpalethe avatar Oct 10 '21 07:10 kingpalethe

@rhomeister Where's a good place to put that code?

NameError: uninitialized constant ActionCable::SubscriptionAdapter::Redis

I can't seem to get it to work. I tried adding it to initializers/ and lib/

[13] pry(main)> ActionCable::SubscriptionAdapter.constants
=> [:Test, :Base, :SubscriberMap, :ChannelPrefix]

Overload119 avatar Apr 26 '22 04:04 Overload119

@rhomeister Where's a good place to put that code?

NameError: uninitialized constant ActionCable::SubscriptionAdapter::Redis

I can't seem to get it to work. I tried adding it to initializers/ and lib/

[13] pry(main)> ActionCable::SubscriptionAdapter.constants
=> [:Test, :Base, :SubscriberMap, :ChannelPrefix]

@Overload119 Initialising the file inside initializers/ is fine but you need to add the following line also to your monkey_patch.rb file

require 'action_cable/subscription_adapter/redis'

sniperline047 avatar Aug 02 '22 13:08 sniperline047

The fix is coming in Rails 7.1. For older versions (5+), there is a backport gem: https://github.com/anycable/action-cable-redis-backport

palkan avatar Dec 03 '22 02:12 palkan