aws-xray-sdk-go icon indicating copy to clipboard operation
aws-xray-sdk-go copied to clipboard

Memory leak with BeginSubsegment

Open thinhlvv opened this issue 2 years ago • 4 comments

I'm getting memory leak issue when executing BeginSubSegment. Here is the profiling graph we got when the issue was happening.

AWS X-Ray SDK for Go: v1.7.0 Go version: go1.16.2 darwin/amd64

Screenshot 2022-05-18 at 3 44 31 PM Screenshot 2022-05-18 at 3 47 08 PM

thinhlvv avatar May 18 '22 07:05 thinhlvv

For sampled segments Close() function doesn't cancel context (!)

// check whether segment is dummy or not based on sampling decision
	if !seg.ParentSegment.Sampled {
		seg.Dummy = true
	}

	// create a new context to close it on segment close;
	// this makes sure the closing of the segment won't affect the client's code, that uses the returned context
	ctx1, cancelCtx := context.WithCancel(ctx)
	seg.cancelCtx = cancelCtx
	go func() {
		<-ctx1.Done()
		seg.handleContextDone()
	}()

Fragment from close function:

	if seg.Dummy {
		seg.Unlock()
		return
	}

	cancelSegCtx := seg.cancelCtx

	seg.Unlock()
	seg.send()

	// makes sure the goroutine, that waits for possible top-level context cancellation gets closed on segment close
	if cancelSegCtx != nil {
		cancelSegCtx()
	}

So if segment "Dummy" we have leak routine

nikolaianohyn avatar May 18 '22 11:05 nikolaianohyn

We are seeing memory leaks in production for one of our applications. Could these dummy segments be the culprit? I see most memory usage is in Xray for the moment, still need to investigate it further. If so could we perhaps have a release with the PR with the fix?

stijndehaes avatar Jul 25 '22 10:07 stijndehaes

Hi @stijndehaes, thanks for following up. We will be cutting a release as soon as able to get this fix out.

willarmiros avatar Jul 26 '22 06:07 willarmiros

Hi @willarmiros, thank you very much! I yesterday just upgraded to the commit with the fix on main and saw immediate results, memory pressure was way down, so I can confirm it helps!

stijndehaes avatar Jul 26 '22 06:07 stijndehaes

I'm seeing this too in production in v1.7.0. I create a cancellable context in the top level of my main and pass it into a message handler callback for messages received from a queue.

func main() {
	ctx, cncl := context.WithCancel(context.Background())
	defer cncl()
	....

func createMessageCallback(ctx context.Context) func(msg *stan.Msg) {
        // called for every received message
	return func(msg *stan.Msg) {

		var seg *xray.Segment
		ctx, seg = xray.BeginSegment(ctx, "foo")
		defer seg.Close(nil)
		...

Looking at pprof, it's leaking a goroutine for every message I handle:

goroutine profile: total 1088
10968 @ 0x43a456 0x406d1b 0x406818 0xb01f12 0x46b621
#	0xb01f11	github.com/aws/aws-xray-sdk-go/xray.BeginSegmentWithSampling.func1+0x31	/server/vendor/github.com/aws/aws-xray-sdk-go/xray/segment.go:144

byrnedo avatar Aug 25 '22 13:08 byrnedo

@byrnedo

As a temporary workaround in your go.mod file you can point to this commit by replacing:

require (
        ....
	github.com/aws/aws-xray-sdk-go v1.7.0
        ...
)

with:

require (
        ....
	github.com/aws/aws-xray-sdk-go v1.7.1-0.20220520211606-a735b941af42
        ...
)

That way you can already run the fix without having to wait for a new release

stijndehaes avatar Aug 25 '22 14:08 stijndehaes

Thanks, I'll do that!

byrnedo avatar Aug 25 '22 14:08 byrnedo

Release 1.7.1 includes the fix for the memory leak. I believe this issue can be closed

stijndehaes avatar Oct 05 '22 12:10 stijndehaes