ioredis
Uncaught exception in TLS when using AWS Lambda with ElastiCache Redis
We are using ioredis with AWS ElastiCache for Redis. Most of the time it works fine; however, from time to time Lambda functions fail with the following error:
2020-08-18T13:35:07.727Z 06faa96f-4fd3-43ce-8b8c-e369ec7b7aae ERROR Uncaught Exception {"errorType":"Error","errorMessage":"connect ETIMEDOUT","code":"ETIMEDOUT","errorno":"ETIMEDOUT","syscall":"connect","stack":["Error: connect ETIMEDOUT"," at TLSSocket.<anonymous> (/opt/nodejs/node_modules/ioredis/built/redis/index.js:285:37)"," at Object.onceWrapper (events.js:421:28)"," at TLSSocket.emit (events.js:315:20)"," at TLSSocket.EventEmitter.emit (domain.js:482:12)"," at TLSSocket.Socket._onTimeout (net.js:481:8)"," at listOnTimeout (internal/timers.js:549:17)"," at processTimers (internal/timers.js:492:7)"]}
The main problem is that, since this is an uncaught exception, the Lambda is terminated without the retry strategy or reconnectOnError ever being called. Here are our settings:
const IORedis = require('ioredis');

const redisParams = {
  host: redisHost,
  port: redisPort,
  maxRetriesPerRequest: 4,
  connectTimeout: 60000,
  showFriendlyErrorStack: true,
  retryStrategy(times) {
    console.error('IORedis retry error', { action: 'reconnecting' });
    return Math.min(times * 30, 1000);
  },
  reconnectOnError(error) {
    console.error('IORedis connection error', { error });
    const targetErrors = ['READONLY', 'ETIMEDOUT'];
    for (let i = 0; i < targetErrors.length; i += 1) {
      const targetError = targetErrors[i];
      if (error.message.includes(targetError)) {
        return true;
      }
    }
    console.error('IORedis connection error', { action: 'terminating' });
    return false;
  },
};

const redisClient = new IORedis(redisParams);
redisClient.on('error', (error) => {
  console.error('IORedis error', { error });
});
Notice that the 'error' event handler is called without problems; however, neither reconnectOnError nor retryStrategy is called. It looks like the error occurs asynchronously in the TLS module.
Expected behavior: the exception should be caught and reconnectOnError should be called.
Moreover, we enabled debugging; here is an example of the output when the error happens (originally split across comments due to the comment size limit):
| @timestamp | @message |
|---|---|
| 2020-08-17 14:43:10.383 | 2020-08-17T14:43:10.382Z ioredis:redis status[master.fa-staging-redis.mak8qi.usw2.cache.amazonaws.com:6379]: [empty] -> connecting |
| 2020-08-17 14:43:10.384 | 2020-08-17T14:43:10.383Z ioredis:redis status[master.fa-staging-redis.mak8qi.usw2.cache.amazonaws.com:6379]: [empty] -> connecting |
| 2020-08-17 14:43:10.394 | 2020-08-17T14:43:10.394Z ioredis:redis queue command[master.fa-staging-redis.mak8qi.usw2.cache.amazonaws.com:6379]: 0 -> subscribe([ '{q}:events' ]) |
| 2020-08-17 14:44:23.059 | 2020-08-17T14:44:23.059Z ioredis:redis status[master.fa-staging-redis.mak8qi.usw2.cache.amazonaws.com:6379]: [empty] -> connecting |
| 2020-08-17 14:44:26.863 | at TLSSocket. |
| 2020-08-17 14:44:26.863 | 2020-08-17T14:44:26.860Z ioredis:connection error: Error: connect ETIMEDOUT |
| 2020-08-17 14:44:26.865 | 2020-08-17T14:44:26.865Z c93a3655-6db4-4d44-affd-bba7f0f1c4f8 ERROR Uncaught Exception {"errorType":"Error","errorMessage":"connect ETIMEDOUT","code":"ETIMEDOUT","errorno":"ETIMEDOUT","syscall":"connect","stack":["Error: connect ETIMEDOUT"," at TLSSocket. |
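One possible mitigation (a sketch only, not a confirmed fix; `some-key` and the handler shape are placeholders) is to create the client with `lazyConnect: true` and connect explicitly inside the handler, so a failed connect rejects a promise that the handler can catch instead of surfacing outside of it:

```js
// Sketch only: hoist the client to module scope, but defer the actual
// connection until the handler runs, where failures can be caught.
const IORedis = require('ioredis');

const redisClient = new IORedis({
  host: redisHost,          // same values as in the settings above
  port: redisPort,
  lazyConnect: true,        // do not connect at construction time
  connectTimeout: 10000,
  maxRetriesPerRequest: 4,
  // Give up after a few attempts so a persistent failure is reported
  // instead of leaving the connect() promise pending forever.
  retryStrategy: (times) => (times > 3 ? null : Math.min(times * 200, 1000)),
});

// Keep an 'error' listener attached so emitted errors are never unhandled.
redisClient.on('error', (error) => console.error('IORedis error', { error }));

exports.handler = async (event) => {
  try {
    if (redisClient.status === 'wait' || redisClient.status === 'end') {
      await redisClient.connect(); // a connect ETIMEDOUT should reject here
    }
    return await redisClient.get('some-key'); // placeholder command
  } catch (error) {
    console.error('Redis unavailable, degrading gracefully', { error });
    return null;
  }
};
```

With a retryStrategy that gives up after a few attempts, a persistent timeout should eventually surface as a rejection inside the handler rather than keeping the promise pending; adjust the numbers to your own latency budget.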
Same "ETIMEDOUT" happening for us on a few thousand provisioned lambdas, it seems to disconnect after about 10 minutes after being provisioned, reconnection slows down the lambda execution of about 5 seconds, so our lambdas are all stuck...
We saw this error happen randomly in Lambdas as well, on "ioredis": "^4.16.2", connecting to an ElastiCache Redis cluster's primary node endpoint.
Same issue during a Heroku Redis add-on failover. It happens randomly.
The same happens in our Lambda. Most of the time it works fine, but from time to time a connection timeout happens randomly.
We're seeing this in our Lambda, which is configured in a VPC. The failure rate seems correlated with concurrent executions: high concurrency = more timeouts. I switched to the standard node-redis client and saw the same random timeouts, so I'm not sure this has anything to do with ioredis, but I would appreciate reports back from anyone who figures it out.
I'm also seeing this same issue.
Same issue
Same issue here.
- Is the service client connecting to the right Redis host?
- Is the Redis instance reachable from the network you are trying to access it from?
- Is the Redis connection closed properly? I have observed timeout and connection issues when connections are not closed cleanly from containers, Lambdas, etc.
- If it is in the cloud, does the server firewall/security group allow connections from the network you are trying to access it from?

A quick standalone connection test for the first two points is sketched below.
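For example, a minimal reachability check along those lines (the host, port, and the `tls: {}` option are assumptions to adjust for your setup):

```js
// Small standalone script to verify that the Redis endpoint is reachable.
const IORedis = require('ioredis');

async function checkRedis(host, port) {
  const client = new IORedis({
    host,
    port,
    tls: {},                   // drop this if in-transit encryption is off
    connectTimeout: 5000,
    lazyConnect: true,
    retryStrategy: () => null, // fail fast instead of retrying
  });
  try {
    await client.connect();               // should reject quickly on failure
    const pong = await client.ping();     // expect "PONG"
    console.log('Redis reachable:', pong);
  } catch (error) {
    console.error('Redis NOT reachable:', error.message);
  } finally {
    client.disconnect();
  }
}

checkRedis(process.env.REDIS_HOST, Number(process.env.REDIS_PORT));
```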
Hey guys, I am facing the same issue in v5.2.3. Is there any workaround, or a suggestion to switch to another Redis client? Thanks.
I'm seeing this one too occasionally; commenting here to get notified if someone happens to find out why.
Did anyone ever get a fix for this?