
rawkv BatchGet memory leak?

Open linzai99 opened this issue 10 months ago • 2 comments

  1. client-go version: commit_id is 384feb1
  2. pprof heap

heap_pprof.zip

linzai99 · Feb 17 '25 02:02

I suspect the memory leak is on the recursive stack.

https://github.com/tikv/client-go/blob/279dcd5b29a2df8797237bb09e41924d52a73fe0/rawkv/rawkv.go#L804-L814

You can add some debug logging here to see whether the same region error keeps happening (the log may also reveal the root cause of the error). Or just tell us how to reproduce this leak.
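A minimal sketch of what such debug logging could look like, assuming a small counter shared by the retrying goroutines. `regionErrCounter` and `record` are hypothetical names for illustration, not part of client-go; in the real code the call would sit inside the `regionErr != nil` branch, using `regionErr.String()` as the key.

```go
package main

import (
	"log"
	"sync"
)

// regionErrCounter is a hypothetical helper (NOT part of client-go).
// It counts how often each region error message has been seen, so a
// log line can show whether the same error keeps repeating.
type regionErrCounter struct {
	mu     sync.Mutex
	counts map[string]int
}

// record increments the counter for msg and returns the new total.
// It is safe to call from multiple goroutines.
func (c *regionErrCounter) record(msg string) int {
	c.mu.Lock()
	defer c.mu.Unlock()
	if c.counts == nil {
		c.counts = make(map[string]int)
	}
	c.counts[msg]++
	return c.counts[msg]
}

func main() {
	counter := &regionErrCounter{}
	// Simulated region errors; a real caller would pass regionErr.String().
	for _, msg := range []string{"epoch_not_match", "epoch_not_match", "not_leader"} {
		n := counter.record(msg)
		log.Printf("region error %q seen %d time(s)", msg, n)
	}
}
```

If one message's count climbs steadily while requests never complete, that would suggest the retry path is being hit over and over for the same error.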

you06 · Feb 18 '25 05:02

I suspect the memory leak is on the recursive stack.

client-go/rawkv/rawkv.go

Lines 804 to 814 in 279dcd5

    if regionErr != nil {
        err := bo.Backoff(retry.BoRegionMiss, errors.New(regionErr.String()))
        if err != nil {
            batchResp.Error = err
            return batchResp
        }
        resp, err = c.sendBatchReq(bo, batch.Keys, options, cmdType)
        batchResp.Response = resp
        batchResp.Error = err
        return batchResp
    }

You can add some debug logging here to see whether the same region error keeps happening (the log may also reveal the root cause of the error). Or just tell us how to reproduce this leak.

It's an online query service handling 50,000+ QPS. Occasionally, the memory usage of one instance suddenly starts growing and does not stop. The only error received is err=epoch_not_match, and the goroutine count shows 2,000,000+.

github.com/tikv/client-go/v2/rawkv.(*Client).sendBatchReq /data/golang/pkg/mod/github.com/tikv/client-go/[email protected]/rawkv/rawkv.go

Total: 0 97403 (flat, cum) 4.97%
722 . 97403 ???

github.com/tikv/client-go/v2/rawkv.(*Client).sendBatchReq.func1 /data/golang/pkg/mod/github.com/tikv/client-go/[email protected]/rawkv/rawkv.go

Total: 1 1877425 (flat, cum) 95.80%
735 1 1 ???
738 . 1877424 ???

github.com/tikv/client-go/v2/rawkv.(*Client).sendBatchReq /data/golang/pkg/mod/github.com/tikv/client-go/[email protected]/rawkv/rawkv.go

Total: 0 1822308 (flat, cum) 92.99%
751 . 1822308 ???

runtime.chanrecv /usr/local/go/src/runtime/chan.go

Total: 0 1822437 (flat, cum) 92.99%
634 . .       // copy data from queue to receiver
635 . .       if ep != nil {
636 . .       typedmemmove(c.elemtype, ep, qp)
637 . .       }
638 . .       // copy data from sender to queue
639 . 1822437 typedmemmove(c.elemtype, qp, sg.elem)
640 . .       c.recvx++
641 . .       if c.recvx == c.dataqsiz {
642 . .       c.recvx = 0
643 . .       }
644 . .       c.sendx = c.recvx // c.sendx = (c.sendx+1) % c.dataqsiz

linzai99 · Jun 04 '25 07:06