redis-rb
Pub/sub being used by ActionCable always ends up in a timeout
I don't know exactly how to explain this, but ActionCable uses Redis's pub/sub scheme. I'm having an issue similar to #598, where Redis would time out after a while and the whole Rails development server would go down.
I dug through the redis-rb source code and found that, just after a consumer is subscribed in ActionCable, redis-rb starts a new read (calling Redis::Connection::SocketMixin#_read_from_socket) that never finishes, because there's nothing left to read. I assume it's just waiting for some data to become available. However, this operation times out after 50 seconds, raising a Redis::TimeoutError, and then the code inside Redis::Client#ensure_connected tries to reconnect.
This reconnection never succeeds, because during the reconnection it starts the same read as before, resulting in another timeout error. This starts a loop that ends after reconnect_attempts retries, crashing the server.
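The loop described above can be sketched in plain Ruby without Redis at all. This is only a simulation under stated assumptions, not redis-rb's actual code: SimulatedTimeoutError, read_with_timeout and read_with_reconnects are hypothetical names, an idle pipe stands in for the subscribed connection, and the 50-second timeout is shortened so the script finishes quickly.

```ruby
# Hypothetical stand-in for Redis::TimeoutError; not the redis-rb class.
class SimulatedTimeoutError < StandardError; end

# Wait up to `timeout` seconds for data on `io`, then read it.
# Raises SimulatedTimeoutError if nothing becomes readable in time.
def read_with_timeout(io, timeout)
  ready = IO.select([io], nil, nil, timeout)
  raise SimulatedTimeoutError, "no data after #{timeout}s" unless ready

  io.read_nonblock(1024)
end

# Retry the read up to `reconnect_attempts` more times after a timeout.
# A real client would reconnect before retrying; this sketch only counts.
def read_with_reconnects(io, timeout:, reconnect_attempts:)
  attempts = 0
  begin
    read_with_timeout(io, timeout)
  rescue SimulatedTimeoutError
    attempts += 1
    retry if attempts <= reconnect_attempts
    raise
  end
end

# An idle connection: the writer stays open but never sends anything.
idle_reader, idle_writer = IO.pipe

begin
  read_with_reconnects(idle_reader, timeout: 0.05, reconnect_attempts: 3)
rescue SimulatedTimeoutError => e
  puts "gave up: #{e.message}"
end
```

Because the retried read waits on the same idle connection, every attempt fails the same way, and the error finally escapes once the retry budget is used up.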
How can I make this work? I'm using:
Ruby 2.3.1, Rails (and ActionCable) 5.2.0, Redis 4.0.1
And in application.rb:
config.cache_store = :redis_cache_store, { url: 'a correct redis url', reconnect_attempts: 10 }
PS: It's worth mentioning that I'm running the Rails server locally, but my Redis is running in a local installation of Docker.
Have you tried to reduce the bug to a minimal repro script?
"This reconnection never succeeds"
At first sight I'd say some state ends up corrupted in the client object, but without being able to repro it myself and attach a debugger like you did, it's like looking for a needle in a haystack.
It will be challenging to make a reproducible case for this, but I'll try it tomorrow.
I'll monkey patch it with @wpp's solution. For me it fails in both cases from the same cause; with the back-off algorithm it takes longer to fail, but it's easier to see what's happening.
Upon further investigation I eliminated a lot of possible causes and found out that the most likely culprit is Docker itself. As I said, I'm running the redis-server on Docker+Rancher (this is what we use for production, can't change that).
I created a simple test app with the bare minimum to integrate ActionCable + Redis and tested it first with a redis-server running directly on my local machine, and then with a redis-server running in a Docker container (Docker is also running on my machine). I didn't have any issues during the first test, but the timeout happened in the second, using Docker.
I checked the Redis configuration for the Docker version and it had the Unix socket "timeout" config correctly set to 0 (which means timeout is disabled), so I can only assume it's a socket timeout on the Docker container itself.
I also tried setting the "TCP keepalive" config in Redis to "ping" the client every 30 seconds (I get timeouts after 50s of inactivity) to try to mitigate the issue, but that didn't work either.
If possible, I'd like to keep this issue open until I can find a solution to the problem, because I still think there's something that can be configured or changed within Redis to fix this.
@roooodcastro, not sure if it relates to you, but I've encountered a similar problem. I was using Redis like this:
$redis ||= Redis.new(
  :host => "redis",
  :port => 6379,
  :timeout => 5
)
But in one place there was a line
in_redis = Redis.new.get(CHANNEL_NAME)
I've changed it to
in_redis = Redis.new(:host => "redis", :port => 6379, :timeout => 5).get(CHANNEL_NAME)
And it worked...
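A likely reason this helped: Redis.new without arguments falls back to redis-rb's defaults (127.0.0.1:6379), so that one call was not talking to the "redis" host at all. One way to avoid repeating the options is to keep them in a single place. The sketch below is only an illustration; REDIS_OPTIONS and redis_options are hypothetical names, not part of redis-rb.

```ruby
# Shared connection settings, taken from the snippet above.
REDIS_OPTIONS = { host: "redis", port: 6379, timeout: 5 }.freeze

# Hypothetical helper: returns the shared options, optionally overridden.
def redis_options(overrides = {})
  REDIS_OPTIONS.merge(overrides)
end

p redis_options
p redis_options(timeout: 10)
```

With the redis gem loaded, every call site could then build its client the same way, for example Redis.new(redis_options).get(CHANNEL_NAME).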