Failed requests on all shards while only a single shard failed

Open • khangly opened this issue 1 year ago • 1 comment

Our service runs on AWS ECS and uses AWS ElastiCache for Redis (cluster mode enabled) with 3 shards. Each shard has one primary and one replica node, so 2 nodes per shard. At 2023-08-08 07:00:00 UTC, a primary node failed and ElastiCache initiated a failover. During the failover, write requests failed roughly uniformly across all 3 shards, although we expected failures only on the shard whose primary failed. In the same period, new connections were also opened to all 3 shards, not just the failed one.
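
One way to attribute each failed write to a shard is to compute the key's hash slot on the client side. The sketch below follows the Redis Cluster key-to-slot rule (CRC16/XMODEM of the key, or of its hash tag, modulo 16384). The `SlotRange` type, the `shardForSlot` helper, and the `exampleRanges` table are hypothetical and only for illustration; the real slot ranges of a cluster should be read from `CLUSTER SHARDS` or `CLUSTER SLOTS`, not assumed.

```typescript
// CRC16/XMODEM: polynomial 0x1021, initial value 0 (the variant Redis Cluster uses).
function crc16(bytes: Uint8Array): number {
  let crc = 0;
  for (let i = 0; i < bytes.length; i++) {
    crc ^= bytes[i] << 8;
    for (let bit = 0; bit < 8; bit++) {
      crc = (crc & 0x8000) !== 0 ? ((crc << 1) ^ 0x1021) & 0xffff : (crc << 1) & 0xffff;
    }
  }
  return crc;
}

// Hash slot of a key. If the key contains a non-empty "{...}" hash tag,
// only the text between the first "{" and the next "}" is hashed.
function keySlot(key: string): number {
  const open = key.indexOf("{");
  if (open !== -1) {
    const close = key.indexOf("}", open + 1);
    if (close !== -1 && close !== open + 1) {
      key = key.slice(open + 1, close);
    }
  }
  return crc16(new TextEncoder().encode(key)) % 16384;
}

// Hypothetical slot layout: an even 3-way split. A real cluster may differ.
interface SlotRange { shard: string; from: number; to: number }
const exampleRanges: SlotRange[] = [
  { shard: "shard-1", from: 0, to: 5460 },
  { shard: "shard-2", from: 5461, to: 10922 },
  { shard: "shard-3", from: 10923, to: 16383 },
];

// Returns the shard name owning the slot, or undefined if no range matches.
function shardForSlot(slot: number, ranges: SlotRange[]): string | undefined {
  const match = ranges.find((r) => slot >= r.from && slot <= r.to);
  return match ? match.shard : undefined;
}
```

Logging `shardForSlot(keySlot(key), ranges)` next to each failed command would show whether the failures really were spread across all shards or only looked that way in aggregate.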

This is our service's ioredis cluster configuration:

{
  scaleReads: "slave",
  dnsLookup: (address, callback) => callback(null, address),
  slotsRefreshTimeout: 2000,
  clusterRetryStrategy(retries: number): number {
    return Math.min(retries * 100, 3000);
  },
  redisOptions: {
    username: "default",
    password,
    enableReadyCheck: true,
    tls: {},
    enableAutoPipelining: true,
    commandTimeout: 100,
    connectTimeout: 10000,
  },
  lazyConnect: true,
}
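
For reference, the `clusterRetryStrategy` in the configuration above is a linear backoff capped at 3 seconds. A small standalone sketch of the delays it produces, assuming the function is called with an attempt count that starts at 1 (the sample attempt numbers are arbitrary):

```typescript
// Same formula as in the configuration: 100 ms per attempt, capped at 3000 ms.
function clusterRetryStrategy(retries: number): number {
  return Math.min(retries * 100, 3000);
}

// Delay in milliseconds for a few sample attempt counts.
const sampleAttempts = [1, 2, 10, 30, 31];
const delays = sampleAttempts.map(clusterRetryStrategy);
console.log(delays); // [ 100, 200, 1000, 3000, 3000 ]
```

The cap is reached at the 30th attempt, so every later retry waits 3 seconds.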

Is there a problem with our configuration, or is this expected ioredis behavior?

khangly, Aug 23 '23 08:08