ioredis icon indicating copy to clipboard operation
ioredis copied to clipboard

Uncaught Exception in tls when using aws lambda with elastic redis cache

Open mpalomera opened this issue 5 years ago • 15 comments

We are using ioredis with elastic redis cache, most of the time it works fine, however from time to time lambda functions fail with the following error:

@message 2020-08-18T13:35:07.727Z 06faa96f-4fd3-43ce-8b8c-e369ec7b7aae ERROR Uncaught Exception {"errorType":"Error","errorMessage":"connect ETIMEDOUT","code":"ETIMEDOUT","errorno":"ETIMEDOUT","syscall":"connect","stack":["Error: connect ETIMEDOUT"," at TLSSocket.<anonymous> (/opt/nodejs/node_modules/ioredis/built/redis/index.js:285:37)"," at Object.onceWrapper (events.js:421:28)"," at TLSSocket.emit (events.js:315:20)"," at TLSSocket.EventEmitter.emit (domain.js:482:12)"," at TLSSocket.Socket._onTimeout (net.js:481:8)"," at listOnTimeout (internal/timers.js:549:17)"," at processTimers (internal/timers.js:492:7)"]}
@requestId 06faa96f-4fd3-43ce-8b8c-e369ec7b7aae
@timestamp 1597757707732
code ETIMEDOUT
errorMessage connect ETIMEDOUT
errorno ETIMEDOUT
errorType Error
stack.0 Error: connect ETIMEDOUT
stack.1 at TLSSocket.<anonymous> (/opt/nodejs/node_modules/ioredis/built/redis/index.js:285:37)
stack.2 at Object.onceWrapper (events.js:421:28)
stack.3 at TLSSocket.emit (events.js:315:20)
stack.4 at TLSSocket.EventEmitter.emit (domain.js:482:12)
stack.5 at TLSSocket.Socket._onTimeout (net.js:481:8)
stack.6 at listOnTimeout (internal/timers.js:549:17)
stack.7 at processTimers (internal/timers.js:492:7)
syscall connect

main problem is that since this is an Uncaught Exception the lambda is terminated without calling to the retry strategy, neither to the reconnect in error. Here is our setting:

const redisParams = {
  host: redisHost,
  port: redisPort,
  maxRetriesPerRequest: 4,
  connectTimeout: 60000,
  showFriendlyErrorStack: true,
  retryStrategy(times) {
    console.error('IORedis retry error', { action: 'reconneting' });
    return Math.min(times * 30, 1000);
  },
  reconnectOnError(error) {
    console.error('IORedis connection error', { error });
    const targetErrors = ['READONLY', 'ETIMEDOUT'];
    for (let i = 0; i < targetErrors.length; i += 1) {
      const targetError = targetErrors[i];
      if (error.message.includes(targetError)) {
        return true;
      }
    }
    console.error('IORedis connection error', { action: 'terminating' });
    return false;
  },
}

const redisClient = new IORedis(rredisParams);
  redisClient.on('error', (error) => {
    console.error('IORedis error', { error });
  });


Notice that the 'error' event is called without problem however neither the reconnectOnError or the retryStrategy are called. Looks like the error occurs asynchronously in the TLS package.

Expected behavior: The exception must be catch and the reconnectOnError be called.

mpalomera avatar Aug 18 '20 14:08 mpalomera

More over we enable debugging, here is an example when the error happens (Split due to to comment size limit)

@timestamp @message
2020-08-17 14:43:10.383 2020-08-17T14:43:10.382Z ioredis:redis status[master.fa-staging-redis.mak8qi.usw2.cache.amazonaws.com:6379]: [empty] -> connecting
2020-08-17 14:43:10.384 2020-08-17T14:43:10.383Z ioredis:redis status[master.fa-staging-redis.mak8qi.usw2.cache.amazonaws.com:6379]: [empty] -> connecting

mpalomera avatar Aug 18 '20 14:08 mpalomera

@timestamp @message
2020-08-17 14:43:10.394 2020-08-17T14:43:10.394Z ioredis:redis queue command[master.fa-staging-redis.mak8qi.usw2.cache.amazonaws.com:6379]: 0 -> subscribe([ '{q}:events' ])
2020-08-17 14:44:23.059 2020-08-17T14:44:23.059Z ioredis:redis status[master.fa-staging-redis.mak8qi.usw2.cache.amazonaws.com:6379]: [empty] -> connecting

mpalomera avatar Aug 18 '20 15:08 mpalomera

@timestamp @message
2020-08-17 14:44:26.863 at TLSSocket. (/opt/nodejs/node_modules/ioredis/built/redis/index.js:285:37)
2020-08-17 14:44:26.863 2020-08-17T14:44:26.860Z ioredis:connection error: Error: connect ETIMEDOUT
2020-08-17 14:44:26.865 2020-08-17T14:44:26.865Z c93a3655-6db4-4d44-affd-bba7f0f1c4f8 ERROR Uncaught Exception {"errorType":"Error","errorMessage":"connect ETIMEDOUT","code":"ETIMEDOUT","errorno":"ETIMEDOUT","syscall":"connect","stack":["Error: connect ETIMEDOUT"," at TLSSocket. (/opt/nodejs/node_modules/ioredis/built/redis/index.js:285:37)"," at Object.onceWrapper (events.js:421:28)"," at TLSSocket.emit (events.js:315:20)"," at TLSSocket.EventEmitter.emit (domain.js:482:12)"," at TLSSocket.Socket._onTimeout (net.js:481:8)"," at listOnTimeout (internal/timers.js:549:17)"," at processTimers (internal/timers.js:492:7)"]}

mpalomera avatar Aug 18 '20 15:08 mpalomera

Same "ETIMEDOUT" happening for us on a few thousand provisioned lambdas, it seems to disconnect after about 10 minutes after being provisioned, reconnection slows down the lambda execution of about 5 seconds, so our lambdas are all stuck...

yurik94 avatar Aug 19 '20 13:08 yurik94

Saw this error happens randomly in lambdas also version, "ioredis": "^4.16.2" connecting to an Elasicache Redis cluster's primary node endpoint

xuliangs avatar Aug 28 '20 18:08 xuliangs

Same issue during heroku redis addon failover. The issue happens randomly.

sandipmohod avatar Sep 02 '20 18:09 sandipmohod

Same happens in our Lambda. Most of the time it works fine but time to time randomly happens connection timeout

d3s4x avatar Sep 11 '20 06:09 d3s4x

Seeing this in our lambda that's configured in a vpc. The failure rate seems correlated to the concurrent executions...concurrency high = timeouts high. I did switch to standard node-redis and saw the same random timeouts so I'm not sure it has anything to do with ioredis, but would appreciate any report backs of anyone figuring this out.

darrinholst avatar Oct 13 '20 18:10 darrinholst

I'm also seeing this same issue.

selected-pixel-jameson avatar Aug 12 '21 11:08 selected-pixel-jameson

Same issue

dsandi avatar Nov 08 '21 00:11 dsandi

Same issue here.

hilkeheremans avatar Jan 29 '22 13:01 hilkeheremans

  1. Is the service client connecting to the right redis host?
  2. Is the redis instance reachable from the network that you are trying to access
  3. Is the redis connection closed properly? I have observed timeout and connection issues without proper closure from containers, Lambdas e
  4. If it is on Cloud, is the server firewall/SG allowing connections from the network u r trying to access.

mukundrv avatar Mar 17 '22 14:03 mukundrv

hey guys, I am facing the same issue in v5.2.3. Is there any workaround or suggestion to switch to an other Redis client? Thanks

quyetinforma avatar Jan 13 '23 12:01 quyetinforma

I'm seeing this one too occasionally, commenting here to get notified if someone happens to find out why

ojousima avatar Jun 19 '23 14:06 ojousima

Does anyone ever got a fix for this?

evandroduarte avatar Nov 07 '23 12:11 evandroduarte