tracer: too much memory consumption
(pprof) top
Showing nodes accounting for 106.21MB, 88.51% of 120MB total
Dropped 40 nodes (cum <= 0.60MB)
Showing top 10 nodes out of 66
      flat  flat%   sum%        cum   cum%
   21.50MB 17.92% 17.92%    21.50MB 17.92%  gopkg.in/DataDog/dd-trace-go.v1/ddtrace/tracer.(*span).setMeta (inline)
   19.50MB 16.25% 34.18%    21.50MB 17.92%  runtime/pprof.WithLabels
      15MB 12.50% 46.68%       15MB 12.50%  gopkg.in/DataDog/dd-trace-go.v1/ddtrace/tracer.(*span).setMetric
      12MB 10.00% 56.68%    77.79MB 64.82%  gopkg.in/DataDog/dd-trace-go.v1/ddtrace/tracer.(*tracer).StartSpan
   10.36MB  8.64% 65.32%    10.86MB  9.05%  github.com/gamezop/***/pkg/***
       8MB  6.67% 71.98%    18.86MB 15.72%  github.com/gamezop/***/pkg/***
    6.01MB  5.01% 76.99%     6.01MB  5.01%  github.com/DataDog/datadog-go/v5/statsd.newStatsdBuffer
    5.82MB  4.85% 81.84%     5.82MB  4.85%  github.com/gamezop/***/pkg/***
    4.50MB  3.75% 85.59%     5.27MB  4.39%  gopkg.in/DataDog/dd-trace-go.v1/ddtrace/tracer.newSpanContext
    3.50MB  2.92% 88.51%     3.50MB  2.92%  runtime.allocm
Datadog traces are consuming more memory than the application itself.
Relevant code in question:
txn, err := db.Begin()
if errr.IfErrAndLog(err, l, "failed to create transaction") {
	return err
}
// pq.CopyIn uses the Postgres COPY protocol: one prepared statement,
// one ExecContext per row, then a final Exec() to flush the copied data.
stmt, err := txn.Prepare(pq.CopyIn("bid", bidFields()...))
if errr.IfErrAndLog(err, l, "failed to prepare stmt") {
	return err
}
for _, bid := range bids {
	_, err = stmt.ExecContext(ctx,
		bid.BidOrderId,
		bid.IsBuy,
		IntToDecimal(bid.Price),
		bid.Qty,
		bid.QuestionId,
		"unmatched",
	)
	if errr.IfErrAndLog(err, l, "failed to exec context") {
		return err
	}
}
// The final Exec with no arguments flushes all buffered rows to the server.
_, err = stmt.Exec()
if errr.IfErrAndLog(err, l, "failed to exec to push all data") {
	return err
}
err = stmt.Close()
if errr.IfErrAndLog(err, l, "failed to close stmt") {
	return err
}
err = txn.Commit()
if errr.IfErrAndLog(err, l, "failed to commit") {
	return err
}
return err
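For context, a sketch of the setup that is not shown in the snippet above: if the db handle is opened through dd-trace-go's database/sql integration, every Prepare/Exec inside the transaction starts its own span, which is what the setMeta/setMetric/StartSpan entries at the top of the heap profile point to. The helper name below is hypothetical and the lib/pq driver is an assumption.

package main

import (
	"database/sql"

	"github.com/lib/pq"
	sqltrace "gopkg.in/DataDog/dd-trace-go.v1/contrib/database/sql"
)

// openTracedDB registers the Postgres driver with dd-trace-go so that every
// query executed on the returned handle is wrapped in a span.
func openTracedDB(dsn string) (*sql.DB, error) {
	sqltrace.Register("postgres", &pq.Driver{})
	return sqltrace.Open("postgres", dsn)
}

With that wiring, each per-row stmt.ExecContext call in the loop above produces a span of its own, so the number of in-memory spans grows with the number of rows copied.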
Hi, I also noticed the same thing with a SQL transaction of 9M rows: the traces were using 600MB+ of RAM (and still increasing until my server ran out of memory). The only way the transaction worked was to remove the sqltrace. Is there any way to send partial spans to help with that problem, or something else?
@prithvihv Sorry for the late reply and the issue you're experiencing.
Would it be possible for you (or @eleboucher) to share the pprof file for this? You can send it here https://help.datadoghq.com/hc/en-us/requests/new - mention my name and this issue, and then reply here?
My first guess is that this could be a goroutine leak. So a goroutine profile would also be useful.
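In case it helps with capturing those, here is a minimal sketch for exposing heap and goroutine profiles with the standard library (the side port and the way this is wired into your service are assumptions):

package main

import (
	"log"
	"net/http"
	_ "net/http/pprof" // registers the /debug/pprof/* handlers on the default mux
)

func main() {
	// Serve the profiling endpoints on a side port; any free port works.
	go func() {
		log.Println(http.ListenAndServe("localhost:6060", nil))
	}()

	select {} // stand-in for the real application's work
}

The profiles can then be pulled with "go tool pprof http://localhost:6060/debug/pprof/heap" and "go tool pprof http://localhost:6060/debug/pprof/goroutine".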
19.50MB 16.25% 34.18% 21.50MB 17.92% runtime/pprof.WithLabels
A potential workaround you could try to get rid of this one is:
tracer.Start(
	// your options ...,
	tracer.WithProfilerCodeHotspots(false),
	tracer.WithProfilerEndpoints(false),
)
Hey @felixge,
I sent you the requested file. You can see in my traces that sqlx.Rebind takes most of the memory; however, when I disable dd-trace the memory is fine. I hope that can help you.
Edit: your workaround doesn't impact the memory
@eleboucher thanks, I received the data! All profiles were generated while applying the same workload?
Edit: your workaround doesn't impact the memory
Looking at your data, I'm not surprised by this. Your issue might be different from @prithvihv's: there is no runtime/pprof.WithLabels in your profiles.
Anyway, I'll ping you again once I know more.
@eleboucher thanks, I received the data! All profiles were generated while applying the same workload?
Yes, everything was done using the same workload.
@eleboucher I followed up with our support team and they should reach out to you shortly. But the gist is that tracing long-running transactions with many queries creates a large number of spans that can't be flushed to the agent until the transaction has completed. This causes increased memory usage.
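To make that concrete, here is a rough sketch of the shape of the problem (this is not dd-trace-go internals; the operation names and row count are made up from the examples above):

package example

import (
	"context"

	"gopkg.in/DataDog/dd-trace-go.v1/ddtrace/tracer"
)

func copyBidsTraced() {
	// Every traced query inside the transaction becomes a child span. The
	// children finish quickly, but the whole trace is held in memory until
	// the root span finishes, so nothing is flushed before the COPY ends.
	root, ctx := tracer.StartSpanFromContext(context.Background(), "pg.copy_bids")
	for i := 0; i < 9_000_000; i++ { // e.g. the 9M-row transaction mentioned above
		child, _ := tracer.StartSpanFromContext(ctx, "pq.exec_context")
		// ... stmt.ExecContext(...) would run here ...
		child.Finish() // finished, but still referenced by the in-flight trace
	}
	root.Finish() // only now can the buffered spans be flushed to the agent
}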
I've given some specific advice for your situation in the ticket (avoid sqlx.Rebind()), but the general problem is a limitation of any tracing implementation that attempts to send complete traces.
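One way to read the sqlx.Rebind() advice, as a sketch only (the query and column names below are hypothetical, not the actual fix from the ticket): Rebind rewrites '?' placeholders into the Postgres $N form and allocates a new query string on every call, so a query that runs in a hot path can be written in the native form up front instead.

package example

import "github.com/jmoiron/sqlx"

// rebindEveryCall rewrites the placeholder style on each invocation,
// allocating a fresh query string every time it runs.
func rebindEveryCall(db *sqlx.DB, questionID string) (*sqlx.Rows, error) {
	q := db.Rebind("SELECT bid_order_id, price, qty FROM bid WHERE question_id = ?")
	return db.Queryx(q, questionID)
}

// nativePlaceholders skips Rebind entirely by using $1 directly.
const bidsByQuestion = "SELECT bid_order_id, price, qty FROM bid WHERE question_id = $1"

func nativePlaceholders(db *sqlx.DB, questionID string) (*sqlx.Rows, error) {
	return db.Queryx(bidsByQuestion, questionID)
}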
Anyway, I work on profiling. I just got interested in this because the runtime/pprof.WithLabels call in the original report is part of the profiler.
I'll leave it to the @DataDog/tracing-go team to comment/confirm.
Hello! 👋 I think @felixge has given some great pointers here. I'll just add that one area that may help here is "Partial Flushing" #1428 (TL;DR: send completed spans before the entire trace is complete). This can relieve memory pressure caused by large, long-running traces with many shorter spans that currently hang around in memory. This feature is on our rough roadmap, but we don't have a timeline for it just yet. We've added this issue to the internal feature request tracking it, which we'll use to prioritize the work.
If you are still experiencing very high memory usage and would like other possible workarounds or more direct eyes on it, feel free to open a ticket with Datadog Support. Thanks!
Related to this, does dd-trace-go support a TraceInterceptor, as dd-trace-java does?
We want to be able to filter/drop spans in long-lived queue jobs.
Facing a similar issue while tracing bidirectional gRPC streams. Details here: https://github.com/DataDog/dd-trace-go/issues/1428#issuecomment-1321157069
Related to this, does dd-trace-go support a TraceInterceptor, as dd-trace-java does? We want to be able to filter/drop spans in long-lived queue jobs.
Can you open a separate GitHub issue or reach out to Datadog Support for this? I think we might need some more details about your setup / use case to better understand this ask.
Support for Partial Flushing (#1428) will be started soon. Once that's released, let's check to see whether it improves your memory usage here.
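For anyone reading this after partial flushing shipped, here is a hedged sketch of enabling it, assuming the option and environment variables kept the shape discussed around #1428; the exact names are an assumption, so check the release notes of your dd-trace-go version before relying on them.

package main

import "gopkg.in/DataDog/dd-trace-go.v1/ddtrace/tracer"

func main() {
	// Assumed API: flush the finished spans of a trace once this many of them
	// have completed, instead of waiting for the whole trace to finish.
	// Equivalent environment variables (also an assumption):
	//   DD_TRACE_PARTIAL_FLUSH_ENABLED=true
	//   DD_TRACE_PARTIAL_FLUSH_MIN_SPANS=1000
	tracer.Start(
		tracer.WithPartialFlushing(1000),
	)
	defer tracer.Stop()
}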