
tracer: too much memory consumption

Open prithvihv opened this issue 3 years ago • 6 comments

(pprof) top
Showing nodes accounting for 106.21MB, 88.51% of 120MB total
Dropped 40 nodes (cum <= 0.60MB)
Showing top 10 nodes out of 66
      flat  flat%   sum%        cum   cum%
   21.50MB 17.92% 17.92%    21.50MB 17.92%  gopkg.in/DataDog/dd-trace-go.v1/ddtrace/tracer.(*span).setMeta (inline)
   19.50MB 16.25% 34.18%    21.50MB 17.92%  runtime/pprof.WithLabels
      15MB 12.50% 46.68%       15MB 12.50%  gopkg.in/DataDog/dd-trace-go.v1/ddtrace/tracer.(*span).setMetric
      12MB 10.00% 56.68%    77.79MB 64.82%  gopkg.in/DataDog/dd-trace-go.v1/ddtrace/tracer.(*tracer).StartSpan
   10.36MB  8.64% 65.32%    10.86MB  9.05%  github.com/gamezop/***/pkg/***
       8MB  6.67% 71.98%    18.86MB 15.72%  github.com/gamezop/***/pkg/***
    6.01MB  5.01% 76.99%     6.01MB  5.01%  github.com/DataDog/datadog-go/v5/statsd.newStatsdBuffer
    5.82MB  4.85% 81.84%     5.82MB  4.85%  github.com/gamezop/***/pkg/***
    4.50MB  3.75% 85.59%     5.27MB  4.39%  gopkg.in/DataDog/dd-trace-go.v1/ddtrace/tracer.newSpanContext
    3.50MB  2.92% 88.51%     3.50MB  2.92%  runtime.allocm

Datadog tracing is consuming more memory than the application itself.

Relevant code in question:

	txn, err := db.Begin()
	if errr.IfErrAndLog(err, l, "failed to create transaction") {
		return err
	}

	stmt, err := txn.Prepare(pq.CopyIn("bid", bidFields()...))
	if errr.IfErrAndLog(err, l, "failed to prepare stmt") {
		return err
	}

	for _, bid := range bids {
		_, err = stmt.ExecContext(ctx,
			bid.BidOrderId,
			bid.IsBuy,
			IntToDecimal(bid.Price),
			bid.Qty,
			bid.QuestionId,
			"unmatched",
		)
		if errr.IfErrAndLog(err, l, "failed to exec context") {
			return err
		}
	}
	_, err = stmt.Exec()
	if errr.IfErrAndLog(err, l, "failed to exec to push all data") {
		return err
	}

	err = stmt.Close()
	if errr.IfErrAndLog(err, l, "failed to close stmt") {
		return err
	}

	err = txn.Commit()
	if errr.IfErrAndLog(err, l, "failed to commit") {
		return err
	}

	return err

prithvihv avatar Jun 27 '22 15:06 prithvihv

Hi, I also noticed the same thing with a SQL transaction of ~9M rows: the traces were using 600MB+ of RAM (and still climbing until my server ran out of memory). The only way to get the transaction to work was to remove sqltrace. Is there any way to send partial spans to help with this problem, or some other workaround?

eleboucher avatar Jul 08 '22 07:07 eleboucher

@prithvihv Sorry for the late reply, and sorry to hear about the issue you're experiencing.

Would it be possible for you (or @eleboucher) to share the pprof file for this? You can send it via https://help.datadoghq.com/hc/en-us/requests/new (mention my name and this issue), and then reply here once you've done so.

My first guess is that this could be a goroutine leak. So a goroutine profile would also be useful.

19.50MB 16.25% 34.18% 21.50MB 17.92% runtime/pprof.WithLabels

A potential workaround you could try to get rid of this one is:

tracer.Start(
  // your options ...,
  tracer.WithProfilerCodeHotspots(false),
  tracer.WithProfilerEndpoints(false),
)
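
For context, a fuller sketch of that workaround (the service and environment values are placeholders, not from this thread). Those two options disable the Code Hotspots and endpoint profiling integrations, which are what attach pprof labels to goroutines and account for the runtime/pprof.WithLabels allocations in the profile above:

package main

import (
	"gopkg.in/DataDog/dd-trace-go.v1/ddtrace/tracer"
)

func main() {
	tracer.Start(
		tracer.WithService("my-service"), // placeholder
		tracer.WithEnv("staging"),        // placeholder
		// Disable the profiler integrations that label goroutines with
		// span/endpoint information via runtime/pprof.
		tracer.WithProfilerCodeHotspots(false),
		tracer.WithProfilerEndpoints(false),
	)
	defer tracer.Stop()

	// ... run the application ...
}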

felixge avatar Jul 08 '22 17:07 felixge

Hey @felixge,

I sent you the requested file. You can see in my traces that sqlx.Rebind takes most of the memory; however, when I disable dd-trace, memory usage is fine. I hope that helps.

Edit: your workaround doesn't affect the memory usage.

eleboucher avatar Jul 11 '22 13:07 eleboucher

@eleboucher thanks, I received the data! All profiles were generated while applying the same workload?

Edit: your workaround doesn't affect the memory usage.

Looking at your data, I'm not surprised by this. Your issue might be different from @prithvihv's; there is no runtime/pprof.WithLabels in your profiles.

Anyway, I'll ping you again once I know more.

felixge avatar Jul 11 '22 14:07 felixge

@eleboucher thanks, I received the data! All profiles were generated while applying the same workload?

Yes, everything was generated using the same workload.

eleboucher avatar Jul 11 '22 14:07 eleboucher

@eleboucher I followed up with our support team and they should reach out to you shortly. But the gist is that tracing long-running transactions with many queries creates a large number of spans that can't be flushed to the agent until the transaction has completed, which drives up memory usage.

I've given some specific advice for your situation in the ticket (avoid sqlx.Rebind()), but the general problem is a limitation of any tracing implementation that attempts to send complete traces.
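
For illustration, a minimal sketch of that kind of mitigation, assuming lib/pq and the contrib/database/sql integration (the dsn and handle names are placeholders): keep COPY-style bulk loads on a plain, untraced handle so they don't generate one span per statement, and reserve the traced handle for regular request-scoped queries.

package main

import (
	"database/sql"

	"github.com/lib/pq"
	sqltrace "gopkg.in/DataDog/dd-trace-go.v1/contrib/database/sql"
)

// openHandles returns a traced handle for regular queries and a plain,
// untraced handle for bulk loads. The untraced handle creates no spans, so a
// multi-million-row transaction does not buffer an equally large trace.
func openHandles(dsn string) (traced, bulk *sql.DB, err error) {
	// Register the postgres driver with the tracing integration (once per process).
	sqltrace.Register("postgres", &pq.Driver{})

	// Traced handle: each Exec/Query gets its own span.
	if traced, err = sqltrace.Open("postgres", dsn); err != nil {
		return nil, nil, err
	}

	// Untraced handle: lib/pq's init registers "postgres" with database/sql,
	// so this handle bypasses the tracer entirely.
	if bulk, err = sql.Open("postgres", dsn); err != nil {
		return nil, nil, err
	}
	return traced, bulk, nil
}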

Anyway, I work on profiling; I only got involved here because the runtime/pprof.WithLabels call in the original report is part of the profiler.

I'll leave it to the @DataDog/tracing-go team to comment/confirm.

felixge avatar Jul 12 '22 08:07 felixge

Hello! 👋 I think @felixge has given some great pointers here. I'll just add that one area that may help is "Partial Flushing" #1428 (TL;DR: send completed spans before the entire trace is finished). This can relieve the memory pressure caused by large, long-running traces with many shorter spans that currently hang around in memory. This feature is on our rough roadmap, but we don't have a timeline for it just yet. We've added this issue to our internal feature request tracking, which we'll use to prioritize the work.
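
Until partial flushing is available, one manual mitigation (a sketch only; the Bid type, insertBatch helper, batch size, and operation name are hypothetical) is to give each batch of work its own short root span, so each batch's trace completes and can be flushed while the job is still running:

package main

import (
	"context"

	"gopkg.in/DataDog/dd-trace-go.v1/ddtrace/tracer"
)

// Bid and insertBatch are hypothetical stand-ins for application code.
type Bid struct{ ID string }

func insertBatch(ctx context.Context, batch []Bid) error { return nil }

// processBids traces each batch as its own root span instead of parenting
// every statement under one giant trace, so completed traces are flushed to
// the agent during the job rather than piling up in memory until the end.
func processBids(bids []Bid, batchSize int) error {
	for start := 0; start < len(bids); start += batchSize {
		end := start + batchSize
		if end > len(bids) {
			end = len(bids)
		}
		// Start from a fresh context so the batch becomes a new trace.
		span, ctx := tracer.StartSpanFromContext(context.Background(), "bids.batch")
		err := insertBatch(ctx, bids[start:end])
		span.Finish(tracer.WithError(err))
		if err != nil {
			return err
		}
	}
	return nil
}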

If you are still experiencing very high memory usage and would like other possible workarounds or more direct eyes on the problem, feel free to open a ticket with Datadog Support. Thanks!

ajgajg1134 avatar Oct 04 '22 18:10 ajgajg1134

Related to this, does dd-trace-go support a TraceInterceptor, as dd-trace-java does?

We want to be able to filter/drop spans in long-lived queue jobs.

NicklasWallgren avatar Oct 20 '22 10:10 NicklasWallgren

I'm facing a similar issue while tracing bidirectional gRPC streams. Details here: https://github.com/DataDog/dd-trace-go/issues/1428#issuecomment-1321157069

Abhishekvrshny avatar Nov 20 '22 15:11 Abhishekvrshny

Related to this, does dd-trace-go support a TraceInterceptor, as dd-trace-java does? We want to be able to filter/drop spans in long-lived queue jobs.

Can you open a separate GitHub issue or reach out to Datadog Support for this? I think we'll need some more details about your setup and use case to better understand the ask.

Support for "Partial Flushing" #1428 will be started soon. Once that's released, let's check whether it improves your memory usage here.

ajgajg1134 avatar Mar 21 '23 14:03 ajgajg1134