aws-xray-sdk-go
aws-xray-sdk-go copied to clipboard
Memory leak with BeginSubsegment
I'm getting memory leak issue when executing BeginSubSegment. Here is the profiling graph we got when the issue was happening.
AWS X-Ray SDK for Go: v1.7.0 Go version: go1.16.2 darwin/amd64
For sampled segments Close() function doesn't cancel context (!)
// check whether segment is dummy or not based on sampling decision
if !seg.ParentSegment.Sampled {
seg.Dummy = true
}
// create a new context to close it on segment close;
// this makes sure the closing of the segment won't affect the client's code, that uses the returned context
ctx1, cancelCtx := context.WithCancel(ctx)
seg.cancelCtx = cancelCtx
go func() {
<-ctx1.Done()
seg.handleContextDone()
}()
Fragment from close function:
if seg.Dummy {
seg.Unlock()
return
}
cancelSegCtx := seg.cancelCtx
seg.Unlock()
seg.send()
// makes sure the goroutine, that waits for possible top-level context cancellation gets closed on segment close
if cancelSegCtx != nil {
cancelSegCtx()
}
So if segment "Dummy" we have leak routine
We are seeing memory leaks in production for one of our applications. Could these dummy segments be the culprit? I see most memory usage is in Xray for the moment, still need to investigate it further. If so could we perhaps have a release with the PR with the fix?
Hi @stijndehaes, thanks for following up. We will be cutting a release as soon as able to get this fix out.
Hi @willarmiros, thank you very much! I yesterday just upgraded to the commit with the fix on main and saw immediate results, memory pressure was way down, so I can confirm it helps!
I'm seeing this too in production in v1.7.0. I create a cancellable context in the top level of my main and pass it into a message handler callback for messages received from a queue.
func main() {
ctx, cncl := context.WithCancel(context.Background())
defer cncl()
....
func createMessageCallback(ctx context.Context) func(msg *stan.Msg) {
// called for every received message
return func(msg *stan.Msg) {
var seg *xray.Segment
ctx, seg = xray.BeginSegment(ctx, "foo")
defer seg.Close(nil)
...
Looking at pprof, it's leaking a goroutine for every message I handle:
goroutine profile: total 1088
10968 @ 0x43a456 0x406d1b 0x406818 0xb01f12 0x46b621
# 0xb01f11 github.com/aws/aws-xray-sdk-go/xray.BeginSegmentWithSampling.func1+0x31 /server/vendor/github.com/aws/aws-xray-sdk-go/xray/segment.go:144
@byrnedo
As a temporary workaround in your go.mod file you can point to this commit by replacing:
require (
....
github.com/aws/aws-xray-sdk-go v1.7.0
...
)
with:
require (
....
github.com/aws/aws-xray-sdk-go v1.7.1-0.20220520211606-a735b941af42
...
)
That way you can already run the fix without having to wait for a new release
Thanks, I'll do that!
Release 1.7.1 includes the fix for the memory leak. I believe this issue can be closed