[BUG]: memory usage on dd-trace-js

Open diecgia opened this issue 6 months ago • 5 comments

Tracer Version(s)

5.53.0

Node.js Version(s)

22

Bug Report

Hi, we are experiencing high memory usage with recent dd-trace versions. We've tested versions 5.40 through 5.53.0, and all of them end in an OOM. With dd-trace 5.22.0 we don't get OOM errors. The following graph shows memory usage when updating our tracer:

Image

After downgrading to 5.22.0, memory usage stabilizes again.

Related to https://github.com/DataDog/dd-trace-js/issues/5554

Reproduction Code

No response

Error Logs

No response

Tracer Config

No response

Operating System

No response

Bundling

Unsure

diecgia avatar May 28 '25 11:05 diecgia

@diecgia did you try 5.28?

This other ticket mentioned it as also being a good version: https://github.com/DataDog/dd-trace-js/issues/5690

I am facing a similar issue, I will try downgrading to 5.28 to test.

JCMais avatar May 29 '25 18:05 JCMais

Hi @JCMais, I've tried 5.28 and it seems to work: I don't get OOM errors with this version.

diecgia avatar Jun 02 '25 09:06 diecgia

We are currently trying to gather more information about each individual case of this memory leak issue. Some of what we need can be shared publicly on GitHub, but some would require a private channel, so ideally I would recommend opening a support ticket. Please feel free to share the ticket number in this issue or send it directly to me on our public Slack so that I can expedite the escalation process.

In the support ticket, please provide the following information:

  • If the issue appeared after an upgrade, what version did the issue appear in?
    • Please be as precise as possible in the exact version when the issue first appeared. This will allow us to isolate the code change that is responsible. For example, reporting that 5.0.0 works but 5.50.0 doesn't is not as helpful as knowing that 5.1.2 works but 5.1.3 doesn't.
    • Since we had a different issue with runtime metrics in 5.41.1, please make sure to disable them with DD_RUNTIME_METRICS_ENABLED=false before any bisecting to avoid false positives.
      • If disabling runtime metrics resolves the issue, let us know as well, since that would mean the leak is in runtime metrics.
  • If the issue happens with all other products disabled except tracing, the issue is likely in one of our integrations. I would recommend trying to disable individual integrations to isolate the issue to one of them. Integrations can be fully disabled with, for example, DD_TRACE_INSTRUMENTATIONS_DISABLED=express,mysql and DD_TRACE_PLUGINS_DISABLED=express,mysql. You can find the full list of integrations enabled for the service in the startup logs (which can be enabled as described below).
  • Do you have any other services that have or don't have the issue?
    • If yes, are there any obvious differences between the ones that do and the ones that don't?
  • Please provide the following if possible:
    • Your package.json
    • Startup logs, which can be outputted by starting the service with DD_TRACE_STARTUP_LOGS=true
    • [optional] Debug logs, which can be outputted by starting the service with DD_TRACE_DEBUG=true.
      • Note: this is extremely verbose, so enable this with caution, ideally in a dev or staging environment.
    • [optional] Two heap dumps, one after 1h of starting the service and another one 2h after.
      • If you can provide even more heap dumps, for example after waiting another hour and calling gc() a few times, that's even better. The gc function can be exposed by starting the service with NODE_OPTIONS='--expose-gc', and it needs to be called more than once for a full GC to happen.
    • Any other information you deem relevant about your environment or the application itself.
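
For the heap dumps requested above, a minimal sketch using only Node.js built-ins might look like the following. It is not part of dd-trace: the helper names (`forceGc`, `snapshotName`, `dumpHeap`, `scheduleHeapDumps`) are hypothetical, and the default delays assume "1h and 2h after startup". It relies on `v8.writeHeapSnapshot()` and on `gc()` being exposed via NODE_OPTIONS='--expose-gc'.

```javascript
'use strict'

const v8 = require('node:v8')
const path = require('node:path')

// Call gc() several times, since a single call may not perform a full GC.
// Returns the number of calls made (0 when gc is not exposed).
function forceGc (times = 3, gc = globalThis.gc) {
  if (typeof gc !== 'function') return 0
  for (let i = 0; i < times; i++) gc()
  return times
}

// Build a snapshot file name that includes the process id.
function snapshotName (label, dir = process.cwd()) {
  return path.join(dir, `${label}-${process.pid}.heapsnapshot`)
}

// Force GC, then write a heap snapshot. Returns the written file name.
// Note: writeHeapSnapshot is synchronous and blocks the event loop.
function dumpHeap (label) {
  forceGc()
  return v8.writeHeapSnapshot(snapshotName(label))
}

// Schedule one dump per delay (assumed defaults: 1h and 2h after startup).
// Timers are unref'd so they never keep the process alive on their own.
function scheduleHeapDumps (delaysMs = [60 * 60 * 1000, 2 * 60 * 60 * 1000]) {
  return delaysMs.map((ms, i) =>
    setTimeout(() => dumpHeap(`heap-${i + 1}`), ms).unref()
  )
}
```

Calling `scheduleHeapDumps()` once at startup would produce `.heapsnapshot` files in the working directory. Because writing a snapshot pauses the process and needs extra memory, this is probably best tried in a dev or staging environment first.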

If you know of a version that works for you and doesn't have the memory leak, please keep using it for now until we update this issue with a resolution.
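
To narrow down the exact version where the issue first appeared, the bisection described above can be sketched as a binary search over released versions. This is only an illustration under stated assumptions: `findFirstBadVersion` and the `isBad` callback are hypothetical names, and `isBad` stands for whatever manual or automated check you use to decide that a given version leaks (for example, deploying it with DD_RUNTIME_METRICS_ENABLED=false and watching memory).

```javascript
'use strict'

// versions: ascending list of dd-trace versions, where the first entry is
// known to be good and the last entry is known to be bad. The list could be
// taken from `npm view dd-trace versions --json`.
// isBad: hypothetical async callback that resolves to true when the given
// version shows the memory leak.
async function findFirstBadVersion (versions, isBad) {
  let good = 0
  let bad = versions.length - 1

  // Invariant: versions[good] is good, versions[bad] is bad.
  while (bad - good > 1) {
    const mid = Math.floor((good + bad) / 2)
    if (await isBad(versions[mid])) {
      bad = mid
    } else {
      good = mid
    }
  }

  return { lastGood: versions[good], firstBad: versions[bad] }
}
```

Each step halves the range, so even a long list of releases between a known-good and a known-bad version needs only a handful of test deployments. The search assumes the leak, once introduced, is present in every later version.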

Thank you for your patience and understanding as we investigate this issue.

rochdev avatar Jun 13 '25 15:06 rochdev

Hello @rochdev, Thanks for your response, I've created the following support ticket: https://help.datadoghq.com/hc/en-us/requests/2164066

diecgia avatar Jun 17 '25 08:06 diecgia

@diecgia Thank you for the support ticket. I took a look and while I'm not sure yet exactly what the problem is, I think the information you provided helped me narrow it down to a specific code change. I'll try to add some additional config options to the library to further narrow down the issue.

rochdev avatar Jun 18 '25 22:06 rochdev