ioredis
ioredis copied to clipboard
commandTimeout has unexpected behavior for cluster
The command timeout is only set on the request to an individual Redis instance: https://github.com/luin/ioredis/blob/v5.0.6/lib/Redis.ts#L422
But, if there's a network partition, in cluster mode commands will be enqueued in the offline queue indefinitely, without ever setting the timeout: https://github.com/luin/ioredis/blob/v5.0.6/lib/cluster/index.ts
For certain workloads, this could result in a total denial of service as commands will enqueue indefinitely until the partition is resolved. I think expected behavior would be to honor the command timeout in cluster mode as well.
We just encountered this issue as well. I was under the impression that commandTimeout is set regardless of whether the command was actually sent to any node, but apperatenly not.
We need a way to timeout "offline" commands
I worked around it by turning off the offline queue and implementing retries w/ circuit breaking around my operations against the client. Was able to solve my problem but it's far from a comprehensive solution.
I came across this and have checked through the implementation
The command timeout is only set on the request to an individual Redis instance
in cluster mode commands will be enqueued in the offline queue indefinitely, without ever setting the timeout
This is not true as the implementation of cluster client is also based on the Redis
instance. The redis option with commandTimeout
will be set for each redis instances in the connection pool. Ideally the commandTimeout
is made to provide timeout on actual redis command execution. When the redis cluster isn't ready the command will just be queued up in the offlineQueue
without executing the command. Once the readyHandler
is triggered the commands queued up will be executed and since it's calling redis.sendCommand
the timeout countdown will be started in the Redis
instance.
I think this feature was designed for using redis as message queue to minimize data lost when redis is not available. It will be confusing when we just want a fast-failing call on caching. I guess we might want something like callTimeout
which setting the timeout within the command.setTimeout
function in Cluster that handles most of the cases.
In case you want a hackaround, this might work
const originalSendCommand = Cluster.prototype.sendCommand
Cluster.prototype.sendCommand = function (command, stream, args) {
command.setTimeout(1000)
return originalSendCommand.call(this, command, stream, args)
}