redis-rb Redis::ProtocolError: Got 'Protocol error, got "\r" as reply type byte' as initial reply byte. If you're in a forking environment, such as Unicorn, you need to connect to Redis after forking.

I recently migrated my Rails app from a single Redis instance to a Redis cluster. To take advantage of cluster mode in redis-rb, I changed my connection setup to:

redis = Redis.new(:cluster => %W[redis://redis-url:redis-port])

But I am getting intermittent connection failures with this error:

"Redis::ProtocolError: Got 'Protocol error, got "\r" as reply type byte' as initial reply byte. If you're in a forking environment, such as Unicorn, you need to connect to Redis after forking"

And the stacktrace is as follows:

redis-4.0.3/lib/redis/connection/hiredis.rb:60 in rescue in read
redis-4.0.3/lib/redis/connection/hiredis.rb:53 in read
redis-4.0.3/lib/redis/client.rb:265 in block in read
redis-4.0.3/lib/redis/client.rb:253 in io
redis-4.0.3/lib/redis/client.rb:264 in read
redis-4.0.3/lib/redis/client.rb:123 in block in call
redis-4.0.3/lib/redis/client.rb:234 in block (2 levels) in process
redis-4.0.3/lib/redis/client.rb:372 in ensure_connected
redis-4.0.3/lib/redis/client.rb:224 in block in process
redis-4.0.3/lib/redis/client.rb:309 in logging
redis-4.0.3/lib/redis/client.rb:223 in process
redis-4.0.3/lib/redis/client.rb:123 in call
redis-4.0.3/lib/redis/cluster.rb:215 in public_send
redis-4.0.3/lib/redis/cluster.rb:215 in try_send
redis-4.0.3/lib/redis/cluster.rb:151 in send_command
redis-4.0.3/lib/redis/cluster.rb:72 in call
redis-4.0.3/lib/redis.rb:1343 in block in sadd
redis-4.0.3/lib/redis.rb:50 in block in synchronize

I can understand why this was not an issue when I was using a single Redis instance with the connection string like:

redis = Redis.new(url: "redis://redis-url:redis-port/db")

As far as I can tell, that is because lib/redis/client.rb has an ensure_connected method to reconnect when any errors crop up during forking of new child processes that use the redis client. However, I could not find any code that handles reconnection in lib/redis/cluster.rb or lib/redis/cluster/node.rb, which are used in cluster mode.

How can I fix this issue when I am connecting via cluster mode?

Jun 30 '21 04:06 Subramanian-ERS

> when any errors crop up during forking of new child processes that use the redis client.

Do you disconnect in the parent before fork?

Jun 30 '21 07:06 byroot

No, I never did. redis-rb always took care of reconnecting, so I never had to handle disconnection manually in my code around forking.

Jun 30 '21 07:06 Subramanian-ERS

Hmm, we might be missing this check in the cluster client. I'd say try that (disconnecting in the parent before forking).

Jun 30 '21 07:06 byroot

Yes, I think so too. But instead of disconnecting in the parent, I would prefer to reconnect in the forked child processes. Can you point me to the right piece of code in the cluster client, so that I can override it to include the ensure_connected function and reconnect when any errors occur?
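
One way to get a fresh connection in each child without patching the library is to build the client lazily per process. The following is only a sketch under stated assumptions: PerProcessClient is a hypothetical helper, not part of redis-rb, and a plain Object stands in for the real Redis.new(cluster: ...) call so the example runs without a server.

```ruby
# Hypothetical helper (not redis-rb API): hands out one client per process.
class PerProcessClient
  def initialize(&factory)
    @factory = factory # block that builds a fresh client
    @mutex = Mutex.new
    @pid = nil
    @client = nil
  end

  # Returns the client owned by the current process. If the pid has changed
  # since the client was built (i.e. we are in a forked child), build a new one.
  def client
    @mutex.synchronize do
      if @pid != Process.pid
        @client = @factory.call
        @pid = Process.pid
      end
      @client
    end
  end
end

# Stand-in factory; in an application the block would build the real client.
clients = PerProcessClient.new { Object.new }
first = clients.client
puts first.equal?(clients.client) # same process, so the same object is reused
```

Note that this sketch simply drops the client inherited from the parent rather than closing it, so it avoids sharing a socket between processes but does not release the parent's connection in the child.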

Jun 30 '21 07:06 Subramanian-ERS

Never mind, I see ensure_connected in your backtrace, but it explicitly raises if you reuse the connection from a child process: https://github.com/redis/redis-rb/blob/af0b66cf99cc8e9f427452f8eb9da6366dc6257c/lib/redis/client.rb#L394-L399, so I doubt it's that.
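
The kind of guard described here can be pictured roughly as follows. This is a simplified illustration, not the library's code: GuardedConnection and InheritedConnectionError are made-up names, and no real socket is opened.

```ruby
# Hypothetical error class for this sketch.
class InheritedConnectionError < StandardError; end

# Hypothetical connection that remembers which process opened it.
class GuardedConnection
  def initialize(inherit_socket: false)
    @inherit_socket = inherit_socket
    @pid = nil
    @connected = false
  end

  def connect
    @pid = Process.pid # record the owning process
    @connected = true
  end

  def connected?
    @connected
  end

  # Connects lazily, and refuses to reuse a connection opened by another
  # process unless inherit_socket was requested.
  def ensure_connected
    if connected?
      unless @inherit_socket || Process.pid == @pid
        raise InheritedConnectionError,
              "connection opened in pid #{@pid}, used from pid #{Process.pid}"
      end
    else
      connect
    end
    yield
  end
end

conn = GuardedConnection.new
conn.ensure_connected { puts "connected in pid #{Process.pid}" }
```

With a guard like this, a child that reuses the parent's connection fails loudly with a dedicated error instead of reading garbage from a shared socket, which is why a protocol error points away from this code path.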

Jun 30 '21 07:06 byroot

Unless you configured your client with inherit_socket: true?

Jun 30 '21 07:06 byroot

No, @byroot, I did not configure that option. It is strange that I get this issue only with the Redis cluster connection string.

Jun 30 '21 07:06 Subramanian-ERS

@byroot How and where do you suggest I reconnect to Redis in my Rails application? I initialise a Redis client in one of the initializer files. Is there a Redis.reconnect that I can use?

Jul 01 '21 09:07 Subramanian-ERS

Most forking servers or job runners provide some "before_fork" and "after_fork" callbacks. That's where you should handle the disconnect and reconnect.
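
A minimal sketch of that disconnect/reconnect pattern, assuming a stand-in client so it runs without a Redis server: FakeClient, before_fork_hook, after_fork_hook, and fork_worker are all hypothetical names for illustration, and a real setup would call the actual client's disconnect and connect logic from the server's own callbacks.

```ruby
# Stand-in for a Redis client; it only tracks which process owns the connection.
class FakeClient
  attr_reader :owner_pid

  def initialize
    connect
  end

  def connect
    @owner_pid = Process.pid
  end

  def disconnect!
    @owner_pid = nil
  end
end

# Runs in the parent just before forking: drop the connection so the
# child cannot inherit a live socket.
def before_fork_hook(client)
  client.disconnect!
end

# Runs in the child right after forking: open a connection of its own.
def after_fork_hook(client)
  client.connect
end

# Forks one worker around the two hooks and reports whether the child
# ended up owning its own connection.
def fork_worker(client)
  before_fork_hook(client)
  reader, writer = IO.pipe
  pid = fork do
    reader.close
    after_fork_hook(client)
    writer.puts(client.owner_pid == Process.pid)
    writer.close
    exit!(0)
  end
  writer.close
  result = reader.read.strip
  reader.close
  Process.wait(pid)
  client.connect # the parent reconnects for its own use
  result == "true"
end

puts fork_worker(FakeClient.new)
```

In Unicorn, for example, the two hooks would live in the before_fork and after_fork blocks of the server config; where they go for other servers or job runners depends on the callbacks each one exposes.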

Jul 01 '21 15:07 byroot

@byroot I found that the issue is related to Delayed Jobs in my application. For some reason, the Redis cluster connection has issues when the Delayed Job is a long running one. One additional thing to consider is that I am using Redis namespace on top of my Redis client. Is this a known issue? If so, how can I go about fixing it?

Jul 06 '21 12:07 Subramanian-ERS

> For some reason, the Redis cluster connection has issues when the Delayed Job is a long running one

Hmm, doesn't ring a bell. Could be the cluster code not handling reconnection properly, but that sounds weird.

> I am using Redis namespace on top of my Redis client

That should have zero impact on the protocol.

> Is this a known issue?

Well, even with that extra information, it's still very unclear what's going on.

Jul 06 '21 12:07 byroot

Is it possible that the error message is misleading? Maybe it is a data issue? Also, would this issue go away if we stopped using hiredis?

Jul 06 '21 13:07 Subramanian-ERS

Update: @byroot I have not been able to reproduce the issue since I removed hiredis from my application. Could the problem be a combination of using Redis cluster and hiredis?

Jul 08 '21 05:07 Subramanian-ERS

> I have not been able to reproduce the issue since I removed hiredis from my application. Could the problem be a combination of using Redis cluster and hiredis?

Interesting. The hiredis driver is barely maintained, so it's possible it has some fork safety issues.

Regardless, all this code was just entirely rewritten, so I'll close this, assuming it's likely solved.

Aug 17 '22 19:08 byroot
