dd-trace-js
high memory usage starting with dd-trace 4.3.0
We recently updated the dd-trace library from 4.2.0 to 4.3.0. I also tried the latest version, 4.6.0, and the issue still exists there. Our memory consumption is usually pretty stable between 220 and 300 MB. Once updated to version 4.3.0, it starts to spike heavily.
Actual behaviour
Steps to reproduce
Environment
- Operating system: we use the lts-alpine docker image
- Node.js version: 18.16.1 (lts-alpine)
- Tracer version: 4.3.0-4.6.0
- Agent version: 7.43.1
- Relevant library versions:
"dd-trace": "4.6.0",
"next": "13.4.9",
"react": "18.2.0",
"react-dom": "18.2.0",
"react-hook-form": "7.45.1",
"swr": "2.2.0",
"typescript": "5.1.6",
Also encountered this. Something appears to be leaking memory. Our overnight heap use is typically quite stable, as there isn't much going on, but with 4.3.0 it just kept increasing linearly. Downgrading to 4.2.0 has fixed things for the moment.
Just lost my Saturday to this... :/ I was on Node 18, using a lot of native fetch, and on 4.6.0. I downgraded to 3.18.
The issue does not seem to be fixed. Upgrading from 4.2 to 4.8.1 still adds over 100 MiB per pod, which then steadily increases.
@Averethel Is the additional memory usage resolved if you disable the fetch plugin with tracer.use('fetch', false)?
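For reference, a minimal sketch of what that would look like, assuming the tracer is initialized in a dedicated file that is required before anything else (file name hypothetical):

```js
// tracer.js – hypothetical bootstrap file, loaded before any other module
const tracer = require('dd-trace').init()

// Disable only the fetch plugin; the lower-level fetch instrumentation
// still hooks global fetch, it just no longer produces spans.
tracer.use('fetch', false)

module.exports = tracer
```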
@rochdev yes, the memory usage still goes up with the fetch plugin disabled. It's a slower but still noticeable increase in memory usage.
@Averethel It's possible since 4.9.0 to disable the fetch instrumentation (so not just the plugin). Can you try that so we can at least isolate the issue to the integration? This can be done with DD_TRACE_DISABLED_INSTRUMENTATIONS=fetch.
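If it's easier to test from code than from the deployment environment, setting the variable before the tracer is loaded should be equivalent (a minimal sketch, assuming you control the entrypoint):

```js
// Hypothetical entrypoint: the variable must be set before dd-trace is
// required, because instrumentations are registered at load/init time.
process.env.DD_TRACE_DISABLED_INSTRUMENTATIONS = 'fetch'

require('dd-trace').init()
```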
@rochdev I upgraded to 4.12 from 3.18 and started getting the leak. Then I set the DD_TRACE_DISABLED_INSTRUMENTATIONS=fetch flag, and it seems to have removed it.
We are on Node 18 and use native fetch in all the code we own (some SDKs and other things have node-fetch as a dependency, but we don't enable it for any of our code).
@rochdev any idea for a path forward here? Should I open up a new issue?
@webark Is this something you can reproduce with a snippet you could share? I can't reproduce on my end even with the integration enabled, and the instrumentation also doesn't do much, so I can't really see what could be causing the issue.
@rochdev What's with this part?
https://github.com/DataDog/dd-trace-js/blob/820101709abe2b153f165a723daed30389aeb56b/packages/datadog-instrumentations/src/fetch.js#L21-L25
It seems like it's rewriting the arguments? I wonder if maybe this is causing some confusion in the garbage collector? Is there a reason it has to be done this way? Just a thought about what stood out after a first read-through.
Also, does fetch need to keep its this scope there? Could you just run this using null instead?
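To spell out what I mean, the pattern those lines appear to implement is roughly the following (a simplified sketch based on my reading of the linked code, not the actual dd-trace-js source):

```js
// Simplified illustration only, not the real instrumentation. The wrapper
// folds the caller's arguments into a new Request (the "rewriting" part) so
// trace headers can be attached to a mutable copy, then forwards the call
// while preserving whatever `this` the caller used.
function wrapFetch (originalFetch) {
  return function fetch (input, init) {
    const req = new Request(input, init)
    // ...the instrumentation would publish { req, headers } on its channel
    // here so the tracer plugin can inject distributed-tracing headers...
    return originalFetch.call(this, req)
  }
}
```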
I wonder if maybe this is causing some confusion in the garbage collector?
Possible, although I doubt it would be to the extent of causing a memory leak since there are no additional references to long-lived objects.
Also, does fetch need to keep its this scope there? Could you just run this using null instead?
If it's already null then it will still be null, so it's effectively the same already. And if there is a value for this, then it should be kept.
To add to this, we upgraded to the latest version of dd-trace (4.15.0) and immediately saw a massive increase in memory usage. Downgrading to v4.2.0 seems to have fixed things.
@rochdev we attempted to upgrade to 4.18.0 from 4.2.0 again to see if this issue persists and although our non-Next.js services are fine, our one Next.js service very quickly started leaking memory, especially in the ReadableStream method as shown in the profiler screenshot below. Not sure if I should create a new issue for this?
Running Next.js version ^12.3.4 in this service.
Same here, profiling points to the leaked context (I guess headers) taking a lot of heap memory in https://github.com/DataDog/dd-trace-js/blob/c35772587a1f5e0177dd694f1e90d1c5828700a2/packages/datadog-instrumentations/src/http/client.js#L48
Node 20.9.0, dd-trace 4.20.0
@svrakitin Is this leak coming from a service that's also using Next.js? And if so, what version?
@sabrenner Not using Next.js
We are not using nextjs.
I recently upgraded to Node 20 and version 4.20, and I think it fixed it? I'm going to be doing some more tests to make sure.
👋 we encountered this same issue on Next 13, dd-trace 5 and Node 20. This is a pretty serious blocker for us to continue using dd-trace.
I have found that disabling the http integration (using DD_TRACE_DISABLED_INSTRUMENTATIONS=http) solved our issues. However, we of course cannot do that, as we depend on a lot of the functionality provided by this instrumentation.
I noticed that recent Datadog Lambda layers have upgraded to dd-trace 4.x and got worried that this bug might impact our Lambdas.
Is this issue fixed by dd-trace v5.x?
Is there any update on this? We're having the same issue with the latest version, 5.15.0, using Apollo Server.
We upgraded to Node 20 and it started working. If you are using a fetch shim (cross-fetch, node-fetch, etc.), you might want to see if you can push those versions all the way to the latest, or, if they are direct dependencies, just pull them out and use native fetch everywhere.
The issue still persists using Node v20 and Apollo Server.
Seems to be specific to the http plugin. Disabling this sucks as it effectively knee-caps distributed tracing.
Green marks when we updated Next.js and dd-trace to 4.27.
You can see a clear change where, after the update, memory usage slowly increases over time. We're still not sure what changed recently that made it go wild.
Zooming in: between the green markers here, I tested disabling Datadog tracing altogether.
Memory usage stabilized around where I would expect it to be. Then, over this last weekend, I ran two simultaneous tests based on the comments above: 1. re-enabled tracing and disabled the next plugin, and 2. downgraded to v3.18. The behaviour returned.
I am currently testing disabling http as well, and it seems like it's helping, though it's too soon to tell.
I took some heap snapshots and noticed a lot of persistent ServerResponses and IncomingMessages held onto by dd-trace resources:
No obvious reason so far.
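In case anyone wants to dig in the same way, one way to capture snapshots from a running process is the built-in v8 module (a minimal sketch; starting Node with --heapsnapshot-signal=SIGUSR2 and sending that signal to the process also works):

```js
// Writes a .heapsnapshot file into the process working directory
// (Node >= 11.13); load it into the Memory tab of Chrome DevTools
// to inspect what is retaining the ServerResponse/IncomingMessage objects.
const v8 = require('v8')

const file = v8.writeHeapSnapshot()
console.log(`heap snapshot written to ${file}`)
```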
Update: disabling only the http trace didn't help, though it seems to have slowed down the leak.