ioredis
Uncaught exception in TLS when using AWS Lambda with ElastiCache Redis
We are using ioredis with AWS ElastiCache for Redis. Most of the time it works fine; however, from time to time Lambda functions fail with the following error:
2020-08-18T13:35:07.727Z 06faa96f-4fd3-43ce-8b8c-e369ec7b7aae ERROR Uncaught Exception {"errorType":"Error","errorMessage":"connect ETIMEDOUT","code":"ETIMEDOUT","errorno":"ETIMEDOUT","syscall":"connect","stack":["Error: connect ETIMEDOUT"," at TLSSocket.<anonymous> (/opt/nodejs/node_modules/ioredis/built/redis/index.js:285:37)"," at Object.onceWrapper (events.js:421:28)"," at TLSSocket.emit (events.js:315:20)"," at TLSSocket.EventEmitter.emit (domain.js:482:12)"," at TLSSocket.Socket._onTimeout (net.js:481:8)"," at listOnTimeout (internal/timers.js:549:17)"," at processTimers (internal/timers.js:492:7)"]}
The main problem is that, since this is an uncaught exception, the Lambda is terminated without the retry strategy or reconnectOnError ever being called. Here are our settings:
const IORedis = require('ioredis');

const redisParams = {
  host: redisHost,
  port: redisPort,
  maxRetriesPerRequest: 4,
  connectTimeout: 60000,
  showFriendlyErrorStack: true,
  retryStrategy(times) {
    console.error('IORedis retry error', { action: 'reconnecting' });
    return Math.min(times * 30, 1000);
  },
  reconnectOnError(error) {
    console.error('IORedis connection error', { error });
    const targetErrors = ['READONLY', 'ETIMEDOUT'];
    for (let i = 0; i < targetErrors.length; i += 1) {
      const targetError = targetErrors[i];
      if (error.message.includes(targetError)) {
        return true;
      }
    }
    console.error('IORedis connection error', { action: 'terminating' });
    return false;
  },
};

const redisClient = new IORedis(redisParams);
redisClient.on('error', (error) => {
  console.error('IORedis error', { error });
});
Notice that the 'error' event handler is called without problems; however, neither reconnectOnError nor retryStrategy is called. It looks like the error occurs asynchronously in the TLS module.
Expected behavior: the exception should be caught and reconnectOnError should be called.
Moreover, we enabled debugging; here is an example of the output when the error happens (originally split across comments due to the comment size limit):
| @timestamp | @message |
|---|---|
| 2020-08-17 14:43:10.383 | 2020-08-17T14:43:10.382Z ioredis:redis status[master.fa-staging-redis.mak8qi.usw2.cache.amazonaws.com:6379]: [empty] -> connecting |
| 2020-08-17 14:43:10.384 | 2020-08-17T14:43:10.383Z ioredis:redis status[master.fa-staging-redis.mak8qi.usw2.cache.amazonaws.com:6379]: [empty] -> connecting |
| 2020-08-17 14:43:10.394 | 2020-08-17T14:43:10.394Z ioredis:redis queue command[master.fa-staging-redis.mak8qi.usw2.cache.amazonaws.com:6379]: 0 -> subscribe([ '{q}:events' ]) |
| 2020-08-17 14:44:23.059 | 2020-08-17T14:44:23.059Z ioredis:redis status[master.fa-staging-redis.mak8qi.usw2.cache.amazonaws.com:6379]: [empty] -> connecting |
| 2020-08-17 14:44:26.863 | at TLSSocket. |
| 2020-08-17 14:44:26.863 | 2020-08-17T14:44:26.860Z ioredis:connection error: Error: connect ETIMEDOUT |
| 2020-08-17 14:44:26.865 | 2020-08-17T14:44:26.865Z c93a3655-6db4-4d44-affd-bba7f0f1c4f8 ERROR Uncaught Exception {"errorType":"Error","errorMessage":"connect ETIMEDOUT","code":"ETIMEDOUT","errorno":"ETIMEDOUT","syscall":"connect","stack":["Error: connect ETIMEDOUT"," at TLSSocket. |
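One possible mitigation (a sketch only, not a confirmed fix; `some-key` and the handler shape are placeholders) is to create the client with `lazyConnect: true` and connect explicitly inside the handler, so a failed connect rejects a promise that the handler can catch instead of surfacing outside of it:

```js
// Sketch only: hoist the client to module scope, but defer the actual
// connection until the handler runs, where failures can be caught.
const IORedis = require('ioredis');

const redisClient = new IORedis({
  host: redisHost,          // same values as in the settings above
  port: redisPort,
  lazyConnect: true,        // do not connect at construction time
  connectTimeout: 10000,
  maxRetriesPerRequest: 4,
  // Give up after a few attempts so a persistent failure is reported
  // instead of leaving the connect() promise pending forever.
  retryStrategy: (times) => (times > 3 ? null : Math.min(times * 200, 1000)),
});

// Keep an 'error' listener attached so emitted errors are never unhandled.
redisClient.on('error', (error) => console.error('IORedis error', { error }));

exports.handler = async (event) => {
  try {
    if (redisClient.status === 'wait' || redisClient.status === 'end') {
      await redisClient.connect(); // a connect ETIMEDOUT should reject here
    }
    return await redisClient.get('some-key'); // placeholder command
  } catch (error) {
    console.error('Redis unavailable, degrading gracefully', { error });
    return null;
  }
};
```

With a retryStrategy that gives up after a few attempts, a persistent timeout should eventually surface as a rejection inside the handler rather than keeping the promise pending; adjust the numbers to your own latency budget.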
Same "ETIMEDOUT" happening for us on a few thousand provisioned lambdas, it seems to disconnect after about 10 minutes after being provisioned, reconnection slows down the lambda execution of about 5 seconds, so our lambdas are all stuck...
We saw this error happen randomly in Lambdas as well, on "ioredis": "^4.16.2", connecting to an ElastiCache Redis cluster's primary node endpoint.
Same issue during a Heroku Redis add-on failover. It happens randomly.
The same happens in our Lambda. Most of the time it works fine, but from time to time a connection timeout happens randomly.
We're seeing this in our Lambda, which is configured in a VPC. The failure rate seems correlated with concurrent executions: high concurrency = more timeouts. I switched to the standard node-redis client and saw the same random timeouts, so I'm not sure this has anything to do with ioredis, but I would appreciate reports back from anyone who figures it out.
I'm also seeing this same issue.
Same issue
Same issue here.
- Is the service client connecting to the right Redis host?
- Is the Redis instance reachable from the network you are trying to access it from?
- Is the Redis connection closed properly? I have observed timeout and connection issues when connections are not closed cleanly from containers, Lambdas, etc.
- If it is in the cloud, does the server firewall/security group allow connections from the network you are trying to access it from?

A quick standalone connection test for the first two points is sketched below.
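For example, a minimal reachability check along those lines (the host, port, and the `tls: {}` option are assumptions to adjust for your setup):

```js
// Small standalone script to verify that the Redis endpoint is reachable.
const IORedis = require('ioredis');

async function checkRedis(host, port) {
  const client = new IORedis({
    host,
    port,
    tls: {},                   // drop this if in-transit encryption is off
    connectTimeout: 5000,
    lazyConnect: true,
    retryStrategy: () => null, // fail fast instead of retrying
  });
  try {
    await client.connect();               // should reject quickly on failure
    const pong = await client.ping();     // expect "PONG"
    console.log('Redis reachable:', pong);
  } catch (error) {
    console.error('Redis NOT reachable:', error.message);
  } finally {
    client.disconnect();
  }
}

checkRedis(process.env.REDIS_HOST, Number(process.env.REDIS_PORT));
```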
Hey guys, I am facing the same issue in v5.2.3. Is there any workaround, or a suggestion to switch to another Redis client? Thanks.
I'm seeing this one too occasionally; commenting here to get notified if someone happens to find out why.
Did anyone ever get a fix for this?