
[node][7.102.0] Suspicion of Memory Leak

Open · xr0master opened this issue · 30 comments

Is there an existing issue for this?

  • [X] I have checked for existing issues https://github.com/getsentry/sentry-javascript/issues
  • [X] I have reviewed the documentation https://docs.sentry.io/
  • [X] I am using the latest SDK release https://github.com/getsentry/sentry-javascript/releases

How do you use Sentry?

Sentry Saas (sentry.io)

Which SDK are you using?

@sentry/node

SDK Version

7.102.0

Framework Version

express.js

Link to Sentry event

No response

SDK Setup

Sentry.init({
  dsn: process.env.SENTRY_DNS,
  release: 'n/a',
  environment: process.env.NODE_ENV,
  integrations: [
    new Sentry.Integrations.Http({ tracing: true }),
    new Sentry.Integrations.Express({
      app,
      methods: ['post', 'get', 'delete'],
    }),
  ],

  tracesSampleRate: 0,
});
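
For context, here is a minimal sketch of the full Express wiring this init implies, following the v7 Express guide (the route, port, and env var name are assumptions, not taken from the report):

const express = require('express');
const Sentry = require('@sentry/node');

const app = express();

Sentry.init({
  dsn: process.env.SENTRY_DSN,
  integrations: [
    new Sentry.Integrations.Http({ tracing: true }),
    new Sentry.Integrations.Express({ app }),
  ],
  tracesSampleRate: 0,
});

// Per the v7 docs: request handler first, tracing handler second,
// routes after that, error handler last.
app.use(Sentry.Handlers.requestHandler());
app.use(Sentry.Handlers.tracingHandler());

app.get('/logs', (req, res) => res.sendStatus(200)); // placeholder route

app.use(Sentry.Handlers.errorHandler());

app.listen(5001);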

Steps to Reproduce

  1. add app.use(Sentry.Handlers.requestHandler());
  2. run request
  3. check profile

Expected Result

n/a

Actual Result

I'd like to point out that the problem is not one-to-one; not every request gets stuck in memory. After dozens of requests, sometimes only one object is retained, sometimes more. I didn't find any particular pattern.

[Screenshots from 2024-02-22: two heap snapshots showing the retained objects]

You may notice that multiple hubs are created and that the data (in locals) is not removed from the response object (nor is the response object itself released).

If app.use(Sentry.Handlers.requestHandler()); is removed, there are no memory issues.

xr0master avatar Feb 22 '24 14:02 xr0master

Hey @xr0master thanks for writing in!

Unfortunately this is going to be quite hard to figure out without a reproducible example. Any chance you could provide one?

Another question:

You may notice that multiple hubs are created

Are you creating multiple hubs on purpose? Or is this just something you observed with the single Sentry.init call you mentioned in the issue?

Lms24 avatar Feb 23 '24 08:02 Lms24

Hey @Lms24

Unfortunately this is going to be quite hard to figure out without a reproducible example. Any chance you could provide one?

It's definitely not easy... I'll try to do it.

Are you creating multiple hubs on purpose? Or is this just something you observed with the single Sentry.init call you mentioned in the issue?

The second option. Just the regular integration with express.js. As far as I remember, only one hub should be created for the entire application, is that correct?

xr0master avatar Feb 23 '24 08:02 xr0master

@Lms24 https://github.com/xr0master/sentry-memo-leak

npm run build
npm run dev

Send a GET request to http://localhost:5001/logs

Since the application is small, catching the issue is even harder. It took dozens of requests at varying intensity, plus several heap snapshots, before I managed to reproduce it. Good luck with this.
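
For reference, one way to generate that kind of load (the port and route come from the instructions above; the burst sizes are arbitrary):

// load.js - fire bursts of requests at the repro endpoint (Node 18+, global fetch)
const TARGET = 'http://localhost:5001/logs';

async function burst(count) {
  await Promise.all(
    Array.from({ length: count }, () =>
      fetch(TARGET).then((r) => r.text()).catch(() => {})
    )
  );
}

(async () => {
  for (let i = 0; i < 50; i++) {
    await burst(10 + Math.floor(Math.random() * 40)); // vary the intensity per burst
    await new Promise((resolve) => setTimeout(resolve, 200));
  }
})();

Take a heap snapshot before the run and another afterwards, then compare the retained objects between the two.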


xr0master avatar Feb 23 '24 09:02 xr0master

@xr0master thanks for the repro and for all the details! Debugging such leaks is very hard but we'll try to get to the bottom of this. One question though: Did this only happen in the most recent version? Have you by any chance tried an older SDK version where this didn't happen before?

Lms24 avatar Feb 26 '24 09:02 Lms24

We noticed a memory leak in our production environment and I'm pretty sure it's the same issue. Just dumping some info in case any of it helps.

We were on @sentry/node 7.91.0 and upgraded to 7.104.0 during our routine package maintenance, after which we noticed the memory leak in production. I took 7 or so memory snapshots that I still have. Comparing them, I noticed a constant increase in the delta size, especially in strings and objects. Looking more closely at the string entries, they all referred to hub.js -> transaction.js, just like in @xr0master's screenshot.

Like xr0master, we are also running Express with the same setup as in the config above: Node 18 on Elastic Beanstalk. We did have the node tracing package installed, which I believe is deprecated. We did remove it, but I assume that wasn't what fixed it.

We downgraded about 2 hours ago to 7.91.0 and memory has been normalized since.

@Lms24 let me know if there is anything we can provide to assist.

Patrick-Ullrich avatar Mar 03 '24 20:03 Patrick-Ullrich

Same here: memory leak after upgrading Sentry from 7.84.0 to 7.102.0. Reverted back to 7.84 and we're OK again. Node v18.17.0, also noticed this in Node v20.11.1

daan-nu avatar Mar 04 '24 14:03 daan-nu

Echoing this, we have upgraded from 6.2.2 to 7.101.1 and are encountering severe memory leaks with similar heap snapshots as those posted above!

heipei avatar Mar 16 '24 07:03 heipei

Hey, thanks a lot for everyone's input so far! Based on your replies I think we can narrow things down a little, and we'll try to get this prioritized since several people have echoed the issue.

What we know for now:

  • Affected SDK @sentry/node
  • Seems to be connected to using the SDK in an Express app
  • Suspected version range: >@sentry/[email protected]
    • If anyone has used a more recent SDK version where the leak does not yet occur, please let us know!

Lms24 avatar Mar 18 '24 10:03 Lms24

As far as I can tell, the reproduction repository supplied by @xr0master doesn't reproduce a memory leak. I've load tested the endpoint and regardless of whether I use v7.84.0, v7.91.0 or v7.107.0, those ~6 Hub instances in the screenshots are the only instances.

If anyone can send me some heap snapshots I'd be happy to take a look at those!

timfish avatar Mar 19 '24 13:03 timfish

FWIW, we tried to downgrade to 7.91.0 and also encountered memory leaks, so my personal suspicion is that the issue was introduced before that. Again, we upgraded from 6.2.2 to 7.101.1 which would probably need bisecting to figure out where the leak was introduced.

heipei avatar Mar 19 '24 13:03 heipei

FWIW, we tried to downgrade to 7.91.0 and also encountered memory leaks, so my personal suspicion is that the issue was introduced before that. Again, we upgraded from 6.2.2 to 7.101.1 which would probably need bisecting to figure out where the leak was introduced.

That's interesting, because we downgraded to 7.91.0 and it resolved our issues. Only for our Node project, though; Next.js / SvelteKit are both running on 7.104.0 (no memory leak for those).

Patrick-Ullrich avatar Mar 19 '24 14:03 Patrick-Ullrich

@Patrick-Ullrich would you be able to provide a reproduction example (or maybe even give us access to your project to take a look)? We were unsuccessful in reproducing with @xr0master's repo. We'd like to take this seriously because a memory leak would obviously be really bad.

lforst avatar Mar 20 '24 10:03 lforst

@lforst I'm sure any senior engineer with experience can easily reproduce the issue.

xr0master avatar Mar 20 '24 13:03 xr0master

@xr0master I guess we're not senior enough then 😄

lforst avatar Mar 20 '24 13:03 lforst

@lforst It's okay. Just work a little bit harder, and one day, you will become one of them.

In any case, I doubt that anyone will give you access to production code or a snapshot from production since this would be a big blow to the project's security.

xr0master avatar Mar 20 '24 13:03 xr0master

@xr0master

work a little bit harder, and one day, you will become one of them

doing my best!

In any case, I doubt that anyone will give you access to production code or a snapshot from production since this would be a big blow to the project's security.

It has happened before and helped tremendously. Also lets us rule out configuration issues. Having a working minimal reproduction and fixing this is in everybody's interest.

lforst avatar Mar 20 '24 14:03 lforst

@lforst servus

Sorry for the late reply; I finally had some time to look into this more. Like you, I was unable to reproduce the memory leak using @xr0master's repo, or at least memory never spiked as much as I anticipated, no matter how much load I put on the server. I ended up taking our app and stripping things down until I finally found the culprit.

I dropped a reproduction here (thanks @xr0master, I just repurposed yours) with some instructions on how to monitor / reproduce it.

https://github.com/Patrick-Ullrich/sentry-mem-leak

I believe the culprit is registering tracingHandler before requestHandler, which the docs do point out not to do.

https://docs.sentry.io/platforms/node/guides/express/:

// The request handler must be the first middleware on the app
app.use(Sentry.Handlers.requestHandler());

// Creates a mem leak
app.use(Sentry.Handlers.tracingHandler());
app.use(Sentry.Handlers.requestHandler());

// Everything is fine
app.use(Sentry.Handlers.requestHandler());
app.use(Sentry.Handlers.tracingHandler());

Since we have tons of different middleware, it took a while to notice that they were reversed in our main app; it never seemed to cause an issue before.

@lforst let me know if this helps and/or if I can help in any other way.

I also went through some versions to see when it was introduced. It seems to start in 7.100.0. 7.99 and before works just fine.

Patrick-Ullrich avatar Mar 22 '24 04:03 Patrick-Ullrich

Take what I write here with a grain of salt, but to me it kind of makes sense that the incorrect order causes issues.

The requestHandler creates a new async context/execution context to avoid leaking data between requests. The tracingHandler, on the other hand, creates a root span/transaction and sets it on the current scope. With the incorrect order, the unfortunate interaction is:

  1. Request comes in
  2. tracingHandler starts a span/transaction and puts it onto the current hub's scope
  3. requestHandler creates a new async context which in turn creates a new hub with a new scope
  4. The request happens within the async context and finishes
  5. requestHandler async context is exited
  6. tracingHandler ends the span/transaction.

My guess is that this works fine as long as the server load is low, but when multiple simultaneous requests are in flight, the scope that's active while the tracingHandler is doing its thing might lose transactions that should actually be isolated via the async context. For some reason (probably because we're keeping references to them somewhere) the "lost" transactions aren't garbage collected.
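
To make the ordering problem concrete, here is a rough sketch (not the SDK's actual implementation) of how an AsyncLocalStorage-based isolation layer behaves when something is written to the outer scope before the isolation boundary is entered:

const { AsyncLocalStorage } = require('node:async_hooks');

const storage = new AsyncLocalStorage();
const globalScope = { transactions: [] }; // stand-in for the outer hub's scope

function getCurrentScope() {
  // Fall back to the shared global scope when no async context is active
  return storage.getStore() ?? globalScope;
}

// tracingHandler-like middleware: records a "transaction" on the current scope
function tracing(req, res, next) {
  getCurrentScope().transactions.push({ url: req.url, start: Date.now() });
  next();
}

// requestHandler-like middleware: forks a fresh, per-request scope
function isolate(req, res, next) {
  storage.run({ transactions: [] }, next);
}

// Wrong order: tracing runs before isolate, so it writes to the shared
// globalScope; every request appends there and nothing is ever cleaned up.
//   app.use(tracing); app.use(isolate);
//
// Right order: isolate runs first, so each request writes to its own scope,
// which becomes unreachable (and collectable) once the request finishes.
//   app.use(isolate); app.use(tracing);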

To other folks affected by the leak in this thread: does switching the order (i.e. requestHandler before tracingHandler) fix the issue for you?

Also worth noting that with v8 of the SDK, these handlers won't be required anymore as performance instrumentation is handled by OpenTelemetry under the hood.
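
For reference, a rough sketch of what the v8-style setup looks like (based on the v8 Express docs at the time of writing; details may differ):

// instrument.js - must run before express is required
const Sentry = require('@sentry/node');

Sentry.init({
  dsn: process.env.SENTRY_DSN,
  tracesSampleRate: 0,
});

// app.js
require('./instrument');
const express = require('express');
const Sentry = require('@sentry/node');

const app = express();

// Routes are instrumented automatically via OpenTelemetry; no
// requestHandler/tracingHandler middleware is needed.
app.get('/logs', (req, res) => res.sendStatus(200));

// Only the error handler still needs explicit wiring.
Sentry.setupExpressErrorHandler(app);

app.listen(5001);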

Lms24 avatar Mar 22 '24 08:03 Lms24

We don't use tracingHandler. I've tried to distill our use of Sentry in our main app file (CoffeeScript); see below:

beforeSend = (event, hint) ->
  if _.get(hint, "originalException.skipSentry")
    logger.debug "Not sending error to sentry: #{event.exception.values[0].value}"
    return null
  isNonErrorException = _.get(event, "exception.values[0].value", "").startsWith('Non-Error exception captured') or _.get(hint.originalException, "message", "").startsWith('Non-Error exception captured')
  if isNonErrorException
    return null

  return event

# See here: https://github.com/getsentry/sentry-javascript/issues/1661
filterIntegrations = (defaultIntegrations) ->
  return defaultIntegrations.filter((x) -> not x.name.match(/onuncaught|onunhandled/i))

Sentry = require "@sentry/node"
Sentry.init({sendTimeout: 5, beforeSend: beforeSend, integrations: filterIntegrations})

process.on("unhandledRejection", (reason, p) ->
  logger.error("Unhandled Rejection at: Promise #{p}, reason: #{reason}")
  if reason.stack? and (config.nodeEnv is "development")
    logger.error reason.stack
  Sentry.captureException(reason)
)

process.on("uncaughtException", (err, origin) ->
  logger.error "Uncaught Exception, origin: #{origin}, reason: #{err.message}"
  #if reason.stack?
  #  logger.error reason.stack
  Sentry.captureException(err)
)

# Setting up the Express app
app = express()

app.use(Sentry.Handlers.requestHandler())

app.use(express.json({limit: "25mb", type: ["application/json", "application/vnd.api+json"]}))

require("./routes")(app)

app.use Sentry.Handlers.errorHandler({shouldHandleError: (err) ->
  if err.isJoi
    return false
  if err.status and err.status > 499
    return true
  return false
})

heipei avatar Mar 22 '24 09:03 heipei

I have a feeling that it's this PR, because from that one onwards we started storing scopes on spans. I haven't investigated yet, but we may have a bunch of hard references keeping a tree of stuff around. We should fix this somehow.
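
Schematically (an illustration only, not SDK code), the concern is a chain of strong references: if a span holds the scope that was active when it started, and that scope holds the hub and the transaction with all its child spans, then retaining any single span keeps the whole tree alive:

// Illustration of the suspected retention chain, not actual SDK internals.
class Scope {
  constructor(hub) {
    this.hub = hub;          // scope -> hub
    this.transaction = null; // scope -> transaction (root span and its children)
  }
}

class Span {
  constructor(scope) {
    this.scope = scope;      // span -> scope: the strong reference in question
  }
}

// If anything long-lived (a cache, a pending timer, a breadcrumb buffer) keeps
// a reference to a single Span, then span -> scope -> transaction -> child
// spans all stay reachable and none of it can be garbage collected. Clearing
// span.scope when the span ends, or holding the scope via a WeakRef, would
// break the chain and let the tree be collected once the request finishes.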

lforst avatar Mar 22 '24 09:03 lforst

FWIW, we don't use Sentry in our Express config at all, only in our Vue application, and we also noticed this memory leak (after upgrading 7.84.0 to 7.102.0).

We simply call Sentry.init with some config options: ignoreErrors, environment, attachStacktrace (true), release, debug (false unless on dev/test), tracesSampleRate set to 0, sampleRate set to a very low figure (as we have lots of traffic), allowUrls (a regex matching only our domain), and the Vue app itself (for the Sentry.Vue; for server side rendering / Sentry.Node we can't supply the app).
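
Roughly, in code form (this is just the description above translated into an init call; all values are placeholders):

import { createApp } from 'vue';
import * as Sentry from '@sentry/vue';
import App from './App.vue';

const app = createApp(App);

Sentry.init({
  app,
  dsn: process.env.SENTRY_DSN,
  ignoreErrors: [/* project-specific patterns */],
  environment: process.env.NODE_ENV,
  attachStacktrace: true,
  release: process.env.RELEASE,
  debug: false,                                  // true only on dev/test
  tracesSampleRate: 0,
  sampleRate: 0.01,                              // very low due to traffic volume
  allowUrls: [/https:\/\/(www\.)?example\.com/], // only our own domain
});

app.mount('#app');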

Testing is quite time-consuming, but if you have a suspect version I'm happy to help with a loadtest on our application with a before/after.

daan-nu avatar Mar 25 '24 11:03 daan-nu

@daan-nu if you could provide heap snapshots or something similar, that would be a huge help for us to debug further.
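
In case it helps anyone here collect one: on Node >= 12 a snapshot can be written from inside the running process without attaching DevTools (app is the Express app; the route name is just an example and should not be publicly reachable):

const v8 = require('node:v8');

// Writes a .heapsnapshot file into the process's working directory.
// Open it in the Chrome DevTools Memory tab to inspect retained objects.
app.get('/debug/heap-snapshot', (req, res) => {
  const file = v8.writeHeapSnapshot();
  res.json({ file });
});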

for the Sentry.Vue, for server side rendering

FWIW, I recommend not using @sentry/vue for SSR at all. It relies on browser APIs and even parses stack traces based on the formats expected from a browser, so running it in a server environment is not recommended.

AbhiPrasad avatar Mar 26 '24 14:03 AbhiPrasad

We are experiencing this issue as well on production with next.js.

Adding a screenshot that shows the memory increase from version 7.95.0 -> 7.110.1. I can't provide a reproduction, sorry.

Snapshot from the Grafana memory tracker: [screenshot]

sarunast avatar Apr 17 '24 07:04 sarunast

@sarunast did you check the order of our handlers as described in this comment above?

Lms24 avatar Apr 18 '24 10:04 Lms24

@Lms24 since he mentioned that this is a Next.js app any express handler shenanigans are unlikely (though not impossible).

@sarunast I'd be interested in whether you have any more involved Sentry code somewhere in your app or whether you just use what we recommend as a default. 🤔

lforst avatar Apr 18 '24 12:04 lforst

@Lms24 since he mentioned that this is a Next.js app any express handler shenanigans are unlikely (though not impossible).

@sarunast I'd be interested in whether you have any more involved Sentry code somewhere in your app or whether you just use what we recommend as a default. 🤔

As you said, it's Next.js, so we don't configure it for Express. We use the very basic recommended Next.js Sentry setup, so there isn't much to add. There is one odd thing about the setup: we don't deploy via Vercel but rather use our own custom Dockerized deployments to AWS. Because of this custom setup, Sentry only works for the client and backend reporting doesn't work at all. Not sure if that helps; it would be great to fix the backend reporting, but I think it's not related to this issue?
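
For reference, the basic recommended v7 @sentry/nextjs setup looks roughly like this (file names follow the docs; all values are placeholders):

// sentry.client.config.js (and similarly sentry.server.config.js / sentry.edge.config.js)
import * as Sentry from '@sentry/nextjs';

Sentry.init({
  dsn: process.env.NEXT_PUBLIC_SENTRY_DSN,
  tracesSampleRate: 0.1,
});

// next.config.js
const { withSentryConfig } = require('@sentry/nextjs');

const nextConfig = {
  // regular Next.js options
};

module.exports = withSentryConfig(nextConfig, { silent: true });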

sarunast avatar Apr 18 '24 12:04 sarunast

As you said it's next.js so we don't configure it for express

Apologies, I completely missed this in your first comment.

Unfortunately, I think we're talking about multiple different memory leaks and scenarios by now. The only reproduction we received so far was for the Node SDK, which no longer leaks memory once the order of registering the Sentry Node/Express handlers is corrected. Furthermore, users are mentioning different version ranges for when the leak started happening.

If you're able to provide a minimal reproduction for the leak you're experiencing in your Next.js app, we'd greatly appreciate it!

Lms24 avatar Apr 22 '24 11:04 Lms24

Perhaps it is, @Lms24, but what caught my attention was this comment:

I also went through some versions to see when it was introduced. It seems to start in 7.100.0. 7.99 and before works just fine.

Which is true for us.

sarunast avatar Apr 24 '24 15:04 sarunast

Looking at https://github.com/getsentry/sentry-javascript/blob/develop/CHANGELOG.md#71000

The related changes are:

  • fix(nextjs): Do not report redirects and notFound calls as errors in server actions (#10474)
  • fix(nextjs): Fix navigation tracing on app router (#10502)
  • fix(nextjs): Apply server action data to correct isolation scope (#10514)
  • fix(node): Use normal require call to import Undici (#10388)

What Node version is everyone using? Which SDK features are you using (errors, performance, profiling, etc.)? And for Next.js in particular, are you using the app router or the pages router?

Because of the many possible permutations of setups, sharing your package.json, next.config.js, etc. would be extremely helpful and would let us narrow this down!

AbhiPrasad avatar Apr 24 '24 16:04 AbhiPrasad

@AbhiPrasad I think it is this change, where we introduced strong references to scopes on spans: https://github.com/getsentry/sentry-javascript/pull/10492

lforst avatar Apr 25 '24 08:04 lforst