dd-trace-js icon indicating copy to clipboard operation
dd-trace-js copied to clipboard

ioredis `ETIMEDOUT` during NestJS startup with dd-trace-js >= 2.x

Open jamesmeneghello opened this issue 3 years ago • 7 comments

Describe the bug

When dd-trace-js is any version from 2.0 - 2.3.1, initial redis connections will time out - delaying startup of our NestJS server by some 30 seconds. 1.7.0 works perfectly fine.

Sorry, I don't have a lot more context than that - it's a very standard nestjs / bull / ioredis setup.

NestJS startup log:

$ yarn start:dev
[8:30:14 PM] Starting compilation in watch mode...

[8:30:23 PM] Found 0 errors. Watching for file changes.

Debugger listening on ws://127.0.0.1:9229/df077827-d9e9-4153-a74e-09dcdf96e951
For help, see: https://nodejs.org/en/docs/inspector
[Nest] 24417  - 03/21/2022, 8:30:38 PM     LOG [AuthSubscriber] Registered: AuthSubscriber
[Nest] 24417  - 03/21/2022, 8:31:09 PM     LOG [DocumentSubscriber] Registered: DocumentSubscriber
[Nest] 24417  - 03/21/2022, 8:31:11 PM     LOG [xx] Registered: xx
[Nest] 24417  - 03/21/2022, 8:31:15 PM     LOG [UtilSubscriber] Registered: UtilSubscriber
[ioredis] Unhandled error event: Error: connect ETIMEDOUT
    at Socket.<anonymous> (/xx/.yarn/cache/ioredis-npm-4.28.5-74671f2fa3-a8793c3324.zip/node_modules/ioredis/built/redis/index.js:327:37)
    at /xx/.yarn/unplugged/dd-trace-npm-2.3.1-526f131f56/node_modules/dd-trace/packages/dd-trace/src/scope.js:65:19
    at Scope.activate (/xx/.yarn/unplugged/dd-trace-npm-2.3.1-526f131f56/node_modules/dd-trace/packages/dd-trace/src/scope.js:29:14)
    at Socket.bound (/xx/.yarn/unplugged/dd-trace-npm-2.3.1-526f131f56/node_modules/dd-trace/packages/dd-trace/src/scope.js:64:20)
    at Object.onceWrapper (node:events:513:28)
    at Socket.emit (node:events:406:35)
    at Socket.emit (node:domain:470:12)
    at Socket._onTimeout (node:net:486:8)
    at listOnTimeout (node:internal/timers:557:17)
    at processTimers (node:internal/timers:500:7)
[ioredis] Unhandled error event: Error: connect ETIMEDOUT
    at Socket.<anonymous> (/xx/.yarn/cache/ioredis-npm-4.28.5-74671f2fa3-a8793c3324.zip/node_modules/ioredis/built/redis/index.js:327:37)
    at /xx/.yarn/unplugged/dd-trace-npm-2.3.1-526f131f56/node_modules/dd-trace/packages/dd-trace/src/scope.js:65:19
    at Scope.activate (/xx/.yarn/unplugged/dd-trace-npm-2.3.1-526f131f56/node_modules/dd-trace/packages/dd-trace/src/scope.js:29:14)
    at Socket.bound (/xx/.yarn/unplugged/dd-trace-npm-2.3.1-526f131f56/node_modules/dd-trace/packages/dd-trace/src/scope.js:64:20)
    at Object.onceWrapper (node:events:513:28)
    at Socket.emit (node:events:406:35)
    at Socket.emit (node:domain:470:12)
    at Socket._onTimeout (node:net:486:8)
    at listOnTimeout (node:internal/timers:557:17)
    at processTimers (node:internal/timers:500:7)
(node:24417) [PINODEP008] PinoWarning: prettyPrint is deprecated, look at https://github.com/pinojs/pino-pretty for alternatives.
(Use `node --trace-warnings ...` to show where the warning was created)
[1647865890715] INFO (24417 on xx): Starting Nest application...
    context: "NestFactory"
[1647865890716] INFO (24417 on xx): Registered: UtilSubscriber
    context: "UtilSubscriber"
[1647865890716] INFO (24417 on xx): Registered: AuthSubscriber
    context: "AuthSubscriber"
[1647865890717] INFO (24417 on xx): Registered: xx
    context: "xx"
[1647865890717] INFO (24417 on xx): Registered: DocumentSubscriber
    context: "DocumentSubscriber"
[1647865890717] INFO (24417 on xx): MikroOrmModule dependencies initialized
    context: "InstanceLoader"
[1647865890717] INFO (24417 on xx): BullModule dependencies initialized
    context: "InstanceLoader"
[1647865890717] INFO (24417 on xx): PassportModule dependencies initialized
    context: "InstanceLoader"
[1647865890717] INFO (24417 on xx): HttpModule dependencies initialized
    context: "InstanceLoader"
[1647865890717] INFO (24417 on xx): DiscoveryModule dependencies initialized
    context: "InstanceLoader"
[1647865890717] INFO (24417 on xx): UtilModule dependencies initialized
    context: "InstanceLoader"
[1647865890718] INFO (24417 on xx): LoggerModule dependencies initialized
    context: "InstanceLoader"
[1647865890718] INFO (24417 on xx): TranslatorModule dependencies initialized
    context: "InstanceLoader"
[1647865890718] INFO (24417 on xx): BullModule dependencies initialized
    context: "InstanceLoader"
[1647865890718] INFO (24417 on xx): EventEmitterModule dependencies initialized
    context: "InstanceLoader"
[1647865890718] INFO (24417 on xx): BullModule dependencies initialized
    context: "InstanceLoader"
[1647865890718] INFO (24417 on xx): BullModule dependencies initialized
    context: "InstanceLoader"
[1647865890718] INFO (24417 on xx): BullModule dependencies initialized
    context: "InstanceLoader"
[1647865890718] INFO (24417 on xx): BullModule dependencies initialized
    context: "InstanceLoader"
[1647865890718] INFO (24417 on xx): BullModule dependencies initialized
    context: "InstanceLoader"
[1647865890719] INFO (24417 on xx): BullModule dependencies initialized
    context: "InstanceLoader"
[1647865890719] INFO (24417 on xx): GraphQLSchemaBuilderModule dependencies initialized
    context: "InstanceLoader"
[1647865890719] INFO (24417 on xx): GraphQLModule dependencies initialized
    context: "InstanceLoader"
[1647865890719] INFO (24417 on xx): MikroOrmCoreModule dependencies initialized
    context: "InstanceLoader"
[1647865890719] INFO (24417 on xx): MikroOrmModule dependencies initialized
    context: "InstanceLoader"
[1647865890719] INFO (24417 on xx): MikroOrmModule dependencies initialized
    context: "InstanceLoader"
[1647865890719] INFO (24417 on xx): MikroOrmModule dependencies initialized
    context: "InstanceLoader"
[1647865890719] INFO (24417 on xx): MikroOrmModule dependencies initialized
    context: "InstanceLoader"
[1647865890719] INFO (24417 on xx): MikroOrmModule dependencies initialized
    context: "InstanceLoader"
[1647865890719] INFO (24417 on xx): MikroOrmModule dependencies initialized
    context: "InstanceLoader"
[1647865890719] INFO (24417 on xx): MikroOrmModule dependencies initialized
    context: "InstanceLoader"
[1647865890719] INFO (24417 on xx): MikroOrmModule dependencies initialized
    context: "InstanceLoader"
[1647865890720] INFO (24417 on xx): MikroOrmModule dependencies initialized
    context: "InstanceLoader"
[1647865890720] INFO (24417 on xx): MikroOrmModule dependencies initialized
    context: "InstanceLoader"
[1647865890720] INFO (24417 on xx): MikroOrmModule dependencies initialized
    context: "InstanceLoader"
[1647865890720] INFO (24417 on xx): MikroOrmModule dependencies initialized
    context: "InstanceLoader"
[1647865890720] INFO (24417 on xx): MikroOrmModule dependencies initialized
    context: "InstanceLoader"
[1647865890720] INFO (24417 on xx): MikroOrmModule dependencies initialized
    context: "InstanceLoader"
[1647865890720] INFO (24417 on xx): MikroOrmModule dependencies initialized
    context: "InstanceLoader"
[1647865890720] INFO (24417 on xx): MikroOrmModule dependencies initialized
    context: "InstanceLoader"
[1647865890720] INFO (24417 on xx): AppModule dependencies initialized
    context: "InstanceLoader"
[1647865890720] INFO (24417 on xx): AuthModule dependencies initialized
    context: "InstanceLoader"
[1647865890720] INFO (24417 on xx): xx dependencies initialized
    context: "InstanceLoader"
[1647865890721] INFO (24417 on xx): xx dependencies initialized
    context: "InstanceLoader"
[1647865890721] INFO (24417 on xx): xx dependencies initialized
    context: "InstanceLoader"
[1647865890721] INFO (24417 on xx): xx dependencies initialized
    context: "InstanceLoader"
[1647865890721] INFO (24417 on xx): xx dependencies initialized
    context: "InstanceLoader"
[1647865890721] INFO (24417 on xx): xx dependencies initialized
    context: "InstanceLoader"
[1647865890721] INFO (24417 on xx): xx dependencies initialized
    context: "InstanceLoader"
[1647865890721] INFO (24417 on xx): xx dependencies initialized
    context: "InstanceLoader"
[1647865890721] INFO (24417 on xx): xx dependencies initialized
    context: "InstanceLoader"
[1647865890721] INFO (24417 on xx): xx dependencies initialized
    context: "InstanceLoader"
[1647865890721] INFO (24417 on xx): xx dependencies initialized
    context: "InstanceLoader"
[1647865890721] INFO (24417 on xx): xx dependencies initialized
    context: "InstanceLoader"
[1647865890721] INFO (24417 on xx): xx dependencies initialized
    context: "InstanceLoader"
[1647865890722] INFO (24417 on xx): xx dependencies initialized
    context: "InstanceLoader"
[1647865890722] INFO (24417 on xx): xx dependencies initialized
    context: "InstanceLoader"
[1647865890722] INFO (24417 on xx): xx dependencies initialized
    context: "InstanceLoader"
[1647865890722] INFO (24417 on xx): xx dependencies initialized
    context: "InstanceLoader"
[1647865890722] INFO (24417 on xx): xx initialised: checking for trades without a close job
    context: "xx"
[1647865890722] INFO (24417 on xx): xx initialised: checking for bids without a close job
    context: "xx"
[1647865890724] INFO (24417 on xx): Nest application successfully started
    context: "NestApplication"

Redis debug log:

redis_1                | 1:M 21 Mar 2022 12:31:05.488 - DB 0: 22 keys (0 volatile) in 32 slots HT.
redis_1                | 1:M 21 Mar 2022 12:31:05.488 . 0 clients connected (0 replicas), 833352 bytes in use
redis_1                | 1:M 21 Mar 2022 12:31:10.498 - DB 0: 22 keys (0 volatile) in 32 slots HT.
redis_1                | 1:M 21 Mar 2022 12:31:10.498 . 0 clients connected (0 replicas), 833352 bytes in use
redis_1                | 1:M 21 Mar 2022 12:31:15.509 - DB 0: 22 keys (0 volatile) in 32 slots HT.
redis_1                | 1:M 21 Mar 2022 12:31:15.509 . 0 clients connected (0 replicas), 833352 bytes in use
redis_1                | 1:M 21 Mar 2022 12:31:16.252 - Accepted 172.31.0.1:44384
redis_1                | 1:M 21 Mar 2022 12:31:16.252 - Accepted 172.31.0.1:44386
redis_1                | 1:M 21 Mar 2022 12:31:20.520 - DB 0: 22 keys (0 volatile) in 32 slots HT.
redis_1                | 1:M 21 Mar 2022 12:31:20.520 . 2 clients connected (0 replicas), 875248 bytes in use
redis_1                | 1:M 21 Mar 2022 12:31:25.531 - DB 0: 22 keys (0 volatile) in 32 slots HT.
redis_1                | 1:M 21 Mar 2022 12:31:25.531 . 2 clients connected (0 replicas), 875248 bytes in use
redis_1                | 1:M 21 Mar 2022 12:31:29.002 - Accepted 172.31.0.1:44400
redis_1                | 1:M 21 Mar 2022 12:31:29.002 - Accepted 172.31.0.1:44402
redis_1                | 1:M 21 Mar 2022 12:31:29.002 - Accepted 172.31.0.1:44404
redis_1                | 1:M 21 Mar 2022 12:31:29.002 - Accepted 172.31.0.1:44406
redis_1                | 1:M 21 Mar 2022 12:31:29.002 - Accepted 172.31.0.1:44408
redis_1                | 1:M 21 Mar 2022 12:31:29.002 - Accepted 172.31.0.1:44410
**redis_1                | 1:M 21 Mar 2022 12:31:29.004 - Client closed connection**
**redis_1                | 1:M 21 Mar 2022 12:31:29.006 - Client closed connection**
redis_1                | 1:M 21 Mar 2022 12:31:29.591 - Accepted 172.31.0.1:44422
redis_1                | 1:M 21 Mar 2022 12:31:29.591 - Accepted 172.31.0.1:44426
redis_1                | 1:M 21 Mar 2022 12:31:29.591 - Accepted 172.31.0.1:44428
redis_1                | 1:M 21 Mar 2022 12:31:29.592 - Accepted 172.31.0.1:44432
redis_1                | 1:M 21 Mar 2022 12:31:29.592 - Accepted 172.31.0.1:44434
redis_1                | 1:M 21 Mar 2022 12:31:30.371 - Accepted 172.31.0.1:44442
redis_1                | 1:M 21 Mar 2022 12:31:30.371 - Accepted 172.31.0.1:44444
redis_1                | 1:M 21 Mar 2022 12:31:30.371 - Accepted 172.31.0.1:44446
redis_1                | 1:M 21 Mar 2022 12:31:30.443 * 1 changes in 3600 seconds. Saving...
redis_1                | 1:M 21 Mar 2022 12:31:30.444 * Background saving started by pid 19
redis_1                | 19:C 21 Mar 2022 12:31:30.444 - RDB: 0 MB of memory used by copy-on-write
redis_1                | 19:C 21 Mar 2022 12:31:30.449 * DB saved on disk
redis_1                | 19:C 21 Mar 2022 12:31:30.449 * RDB: 0 MB of memory used by copy-on-write
redis_1                | 1:M 21 Mar 2022 12:31:30.544 - DB 0: 25 keys (3 volatile) in 32 slots HT.
redis_1                | 1:M 21 Mar 2022 12:31:30.544 . 14 clients connected (0 replicas), 1252016 bytes in use
redis_1                | 1:M 21 Mar 2022 12:31:30.544 * Background saving terminated with success
redis_1                | 1:M 21 Mar 2022 12:31:35.557 - DB 0: 25 keys (3 volatile) in 32 slots HT.
redis_1                | 1:M 21 Mar 2022 12:31:35.557 . 14 clients connected (0 replicas), 1333840 bytes in use
redis_1                | 1:M 21 Mar 2022 12:31:40.569 - DB 0: 25 keys (3 volatile) in 32 slots HT.
redis_1                | 1:M 21 Mar 2022 12:31:40.569 . 14 clients connected (0 replicas), 1292888 bytes in use
redis_1                | 1:M 21 Mar 2022 12:31:45.580 - DB 0: 25 keys (3 volatile) in 32 slots HT.
redis_1                | 1:M 21 Mar 2022 12:31:45.581 . 14 clients connected (0 replicas), 1292888 bytes in use
redis_1                | 1:M 21 Mar 2022 12:31:50.592 - DB 0: 25 keys (3 volatile) in 32 slots HT.
redis_1                | 1:M 21 Mar 2022 12:31:50.592 . 14 clients connected (0 replicas), 1292888 bytes in use
redis_1                | 1:M 21 Mar 2022 12:31:55.604 - DB 0: 25 keys (3 volatile) in 32 slots HT.
redis_1                | 1:M 21 Mar 2022 12:31:55.604 . 14 clients connected (0 replicas), 1251936 bytes in use

Environment

  • Operation system: Ubuntu 20.04.3
  • Node version: 16.5.0
  • Tracer version: 2.0 - 2.3.1
  • Agent version: n/a

jamesmeneghello avatar Mar 21 '22 12:03 jamesmeneghello

The error seems to be coming from a bound event emitter from the net plugin. We've just merged a new version of that plugin in #1913 that no longer binds the emitter, which should hopefully fix the issue. If it doesn't, at the very least it should clean up the stack trace which will help isolating the issue.

If you don't want to wait for the next release it should be enough to use DD_TRACE_NET_ENABLED=false to test that assumption.

rochdev avatar Mar 26 '22 16:03 rochdev

We've now rewritten the net plugin which was involved in the above stack trace. Can you try with 2.4.1?

rochdev avatar Mar 30 '22 21:03 rochdev

@jamesmeneghello Does the latest version of dd-trace resolve this for you?

bengl avatar Apr 11 '22 20:04 bengl

I am not sure this is related, but I see timeouts and connection resets in all network communications. 45 MINUTES after deploy they usually settle. Does not happen with 1.7.1

fruwe avatar Apr 13 '22 10:04 fruwe

@fruwe Does this happen with the latest tracer version? Also, can you open an issue with support for this?

bengl avatar Apr 14 '22 19:04 bengl

Closing this issue as the problematic plugins have been rewritten and there were no confirmations that the issue still occurs afterwards.

rochdev avatar Jul 19 '22 20:07 rochdev

@bengl

I upgraded to v3.2.0 and tried to deploy the code as 1) an API service receiving traffic using express and 2) a job processing service using bull.

The API service deployed seemingly without any issues (that is much better than before). But metrics indicated that the error ratio was 20 times higher, p50 latency increased from 75ms to 1460ms, p90~p99 increased 10x, max latency did not change.

The job service failed to deploy without an apparent reason.

fruwe avatar Sep 07 '22 06:09 fruwe

I see that @rochdev reopened this issue, seemingly related to @fruwe's comment. @fruwe can you confirm if this issue is still affecting you with a more recent version of the tracer? If not I'll close the issue out again.

tlhunter avatar Dec 18 '23 22:12 tlhunter

Thank you, @tlhunter.

I appreciate the follow-up on this issue. However, I'm no longer involved in the project and, consequently, do not have access to the environment needed to verify the current status of this issue with the newer version of the tracer.

fruwe avatar Dec 19 '23 00:12 fruwe

Alright I'll re-close this for now.

tlhunter avatar Dec 20 '23 21:12 tlhunter