
Memory leak with high `tracesSampleRate`

Open · raymondhechen opened this issue 4 weeks ago · 8 comments

Is there an existing issue for this?

  • [x] I have checked for existing issues (https://github.com/getsentry/sentry-javascript/issues)
  • [x] I have reviewed the documentation (https://docs.sentry.io/)
  • [x] I am using the latest SDK release (https://github.com/getsentry/sentry-javascript/releases)

How do you use Sentry?

Sentry Saas (sentry.io)

Which SDK are you using?

@sentry/node - express

SDK Version

10.27.0

Framework Version

Express 5.1.0

Link to Sentry event

No response

Reproduction Example/SDK Setup

No response

Steps to Reproduce

Set tracesSampleRate to 1 and observe memory usage, likely with a lot of Sentry.startSpan() usage.
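A minimal sketch of such a setup (the DSN env var name, span names, timing, and heap logging below are illustrative assumptions, not taken from this report):

import * as Sentry from '@sentry/node'

Sentry.init({
  dsn: process.env.SENTRY_DSN,
  tracesSampleRate: 1, // sample every trace
})

// Hypothetical load: start many short spans and watch heap usage over time.
setInterval(() => {
  void Sentry.startSpan({ name: 'repro-span', op: 'test' }, async () => {
    await new Promise((resolve) => setTimeout(resolve, 10))
  })
  const heapMb = Math.round(process.memoryUsage().heapUsed / 1024 / 1024)
  console.log(`heapUsed: ${heapMb} MB`)
}, 100)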

Expected Result

Memory usage should not keep growing.

Actual Result

[Screenshot: memory usage over time; described below]

Up until 10pm in the screenshot, we were using a tracesSampleRate of 1; then we switched it to 0.2. The dotted lines indicate new deployments (which reset memory). We've been having memory leak issues for several versions of @sentry/node now, but we only realized yesterday that the biggest lever was tracesSampleRate. The chart unfortunately doesn't show the earlier leaks, but growth has consistently been there at high rates until now. Any ideas on why this is happening and how Sentry affects it would be greatly appreciated, as memory usage is still growing, just far more slowly now.

Additional Context

We wrap Sentry methods in our helper functions in case this affects anything:

import * as Sentry from '@sentry/node'

interface ServerSpanArgs {
  name: string // human-readable name
  op?: string // semantic operation
  attributes?: Record<string, string | number | boolean | undefined>
}

export const withServerTrace = async <T>(
  { name, op = 'service', attributes }: ServerSpanArgs,
  fn: (span: Sentry.Span) => Promise<T> | T
): Promise<T> => {
  return Sentry.startSpan({ name, op, attributes }, async (span) => {
    try {
      return await fn(span)
    } catch (err) {
      span.setStatus({ code: 2, message: 'internal_error' })
      Sentry.captureException(err)
      throw err
    }
  })
}

export const captureServerException = (
  error: unknown,
  extra?: Record<string, unknown>
) => {
  Sentry.captureException(error, { extra })
}

Priority

React with 👍 to help prioritize this issue. Please use comments to provide useful context, avoiding +1 or me too, to help us triage it.

raymondhechen · Nov 26 '25 20:11

JS-1224

linear[bot] · Nov 26 '25 20:11

Hey @raymondhechen, thanks for writing in! Can you share a bit more about your usage of Sentry?

  • How do you init the SDK? Can you show us some code and how/where you call it?
  • How are you using withServerTrace?
  • To me, withServerTrace looks like you want the span inside it to be the local root span of the trace (not sure though, of course). Are you disabling auto instrumentation?

What you've shown so far already gives us some hints. An increase in tracesSampleRate could explain a memory increase very well, given that the SDK might record a lot more spans. But it's too early to say whether it's actually about spans or just about data getting applied more often.

Lms24 · Nov 27 '25 08:11

Thanks for the response @Lms24! Here's how we init the SDK:

import * as Sentry from '@sentry/node'
import 'dotenv/config'

const DISABLED_DEFAULT_INTEGRATIONS = [
  'Amqplib',
  'Kafka',
  'OpenAI',
  'Tedious',
  'Prisma',
  'Mysql2',
  'Mysql',
  'Mongoose',
  'Mongo',
  'Fastify',
  'Hapi',
  'Koa',
  'Firebase',
  'Anthropic_AI',
]

Sentry.init({
  // debug: true,
  environment: process.env.NODE_ENV,
  dsn: process.env.SENTRY_SERVER_DSN,
  // filter out default integrations that are not needed
  integrations: (integrations) => {
    return integrations.filter(function (integration) {
      return !DISABLED_DEFAULT_INTEGRATIONS.includes(integration.name)
    })
  },
  ignoreErrors: ['AbortError'],
  tracesSampleRate: 0.2,
  sendDefaultPii: true,
})

We use withServerTrace() by just wrapping any function we want to measure the performance of. For example:

const result = await withServerTrace(
  { name: 'foo', op: 'bar' },
  async () => {
    return Promise.resolve({})
  }
)

We really just use withServerTrace() as a helper around Sentry.startSpan() to start a span within a parent span or an Express route/handler trace. I assumed it was a memory leak because memory usage just keeps growing, and reducing the trace sample rate, without changing anything else, resulted in at least slower growth. Let me know if specific examples of our withServerTrace() usage would be helpful, although we really just wrap all kinds of async functions with it.
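For illustration, a hypothetical Express route using the helper; the route path, import path, and span name below are made up:

import express from 'express'
import { withServerTrace } from './sentry-helpers' // assumed path to the helper above

const app = express()

app.get('/example', async (_req, res) => {
  // The SDK's Express/Http instrumentation creates the request trace; this span nests under it.
  const result = await withServerTrace({ name: 'load-example', op: 'service' }, async () => {
    return { ok: true }
  })
  res.json(result)
})

app.listen(3000)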

raymondhechen · Nov 28 '25 06:11

I'm chiming in here real quick. @raymondhechen, do you have a chance to test whether you still see these leaks when you also disable these two integrations: ContextLines and Context?

It could be unrelated, but I'm trying to narrow down the possible sources of the leak.
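For reference, a sketch of what that could look like on top of the existing filter, assuming the default integration names are 'ContextLines' and 'Context':

import * as Sentry from '@sentry/node'

// Same list as in the init snippet above, with the two extra integrations added.
const DISABLED_DEFAULT_INTEGRATIONS = [
  'Amqplib', 'Kafka', 'OpenAI', 'Tedious', 'Prisma', 'Mysql2', 'Mysql',
  'Mongoose', 'Mongo', 'Fastify', 'Hapi', 'Koa', 'Firebase', 'Anthropic_AI',
  'ContextLines', // assumed name of the ContextLines integration
  'Context', // assumed name of the Context integration
]

Sentry.init({
  dsn: process.env.SENTRY_SERVER_DSN,
  tracesSampleRate: 0.2,
  integrations: (integrations) =>
    integrations.filter((integration) => !DISABLED_DEFAULT_INTEGRATIONS.includes(integration.name)),
})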

JPeer264 · Dec 01 '25 09:12

@JPeer264 just tried disabling ContextLines and Context, and it didn't seem to stop the leak, unfortunately.

raymondhechen · Dec 01 '25 23:12

Ok, thanks for the update; that gives me some direction on where this could be going. I'll try to find some time to check where it could be leaking.

Also, to confirm: all spans that are created are sent to Sentry, right?

JPeer264 · Dec 02 '25 09:12

@JPeer264 The only setting we have that may affect whether all spans are sent to Sentry (I assume) is tracesSampleRate. I'm not aware of any other flag.

However, we have noticed that in the Sentry dashboard, many of our Express handler traces don't actually show all the spans that could be attributed to the trace for an endpoint. Basically, we have one webhook endpoint whose requests are automatically timed out after 15s by the client that sends them. If the execution duration for that endpoint exceeds 15s, we see spans omitted from the full trace. Unsure if this is related; it's also not a behavior we expected, because more async execution still happens after the requesting client prematurely closes the request at 15s.

raymondhechen · Dec 03 '25 02:12

Ok, if you proactively cancel the request after 15s, then I think it's alright that some events are not being sent (though a graceful cancel with a flush at the end wouldn't hurt). Thanks for the hints, I'll try to investigate this a little.
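For illustration, a rough sketch of such a graceful finish with a flush; the handler shape, status code, and 2s timeout below are assumptions, not taken from your setup:

import * as Sentry from '@sentry/node'
import type { Request, Response } from 'express'

// Hypothetical placeholder for the async work that continues past the client timeout.
const doRemainingAsyncWork = async (_payload: unknown): Promise<void> => {
  // ... actual processing ...
}

// Acknowledge the webhook before the client's 15s timeout, finish the work,
// then flush so buffered spans/events still get sent.
export const webhookHandler = async (req: Request, res: Response) => {
  res.status(202).json({ received: true })
  try {
    await doRemainingAsyncWork(req.body)
  } finally {
    await Sentry.flush(2000) // give the SDK up to 2s to send buffered data
  }
}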

JPeer264 · Dec 03 '25 08:12