syncProgress hangs when grpc context closes

Open NickYadance opened this issue 6 months ago • 0 comments

Bug report:

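The download request occasionally hangs with full CPU usage because syncProgress spins in a busy for-loop once the context closes: the `case <-f.ctx.Done():` branch in the listing below has no body, so the select fires immediately on every iteration and the loop never returns.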

pprof:

File: agent
Type: cpu
Time: Aug 1, 2024 at 11:53pm (CST)
Duration: 30.19s, Total samples = 59.73s (197.82%)
Entering interactive mode (type "help" for commands, "o" for options)
(pprof) top
Showing nodes accounting for 57010ms, 95.45% of 59730ms total
Dropped 122 nodes (cum <= 298.65ms)
Showing top 10 nodes out of 21
      flat  flat%   sum%        cum   cum%
   20670ms 34.61% 34.61%    20670ms 34.61%  runtime.procyield
   12200ms 20.43% 55.03%    38720ms 64.83%  runtime.lock2
    6130ms 10.26% 65.29%    56600ms 94.76%  runtime.selectgo
    4170ms  6.98% 72.28%     6550ms 10.97%  runtime.unlock2
    3580ms  5.99% 78.27%     3580ms  5.99%  runtime.osyield
    3460ms  5.79% 84.06%     3460ms  5.79%  runtime.futex
    2590ms  4.34% 88.40%    41330ms 69.19%  runtime.sellock
    2460ms  4.12% 92.52%    59590ms 99.77%  d7y.io/dragonfly/v2/client/daemon/peer.(*fileTask).syncProgress
    1100ms  1.84% 94.36%     7670ms 12.84%  runtime.selunlock
     650ms  1.09% 95.45%      650ms  1.09%  runtime.cheaprand (inline)
(pprof) list syncProgress
Total: 59.73s
ROUTINE ======================== d7y.io/dragonfly/v2/client/daemon/peer.(*fileTask).syncProgress in /Users/root/go/pkg/mod/git.garena.com/shopee/search_recommend/engine/data-deliver/third-party/dragonfly2/[email protected]/client/daemon/peer/peertask_file.go
     2.46s     59.59s (flat, cum) 99.77% of Total
         .          .    123:func (f *fileTask) syncProgress() {
         .          .    124:   defer f.span.End()
         .          .    125:   for {
     170ms     56.78s    126:           select {
     120ms      120ms    127:           case <-f.peerTaskConductor.successCh:
         .          .    128:                   f.storeToOutput()
         .          .    129:                   return
      40ms       40ms    130:           case <-f.peerTaskConductor.failCh:
         .          .    131:                   f.span.RecordError(fmt.Errorf(f.peerTaskConductor.failedReason))
         .          .    132:                   f.sendFailProgress(f.peerTaskConductor.failedCode, f.peerTaskConductor.failedReason)
         .          .    133:                   return
     1.98s      2.48s    134:           case <-f.ctx.Done():
     150ms      170ms    135:           case piece := <-f.pieceCh:
         .          .    136:                   if piece.Finished {
         .          .    137:                           continue
         .          .    138:                   }
         .          .    139:                   pg := &FileTaskProgress{
         .          .    140:                           State: &ProgressState{

https://github.com/dragonflyoss/Dragonfly2/blob/97f21cfbf5f37f131c4f34d6f3efb0410d1447f5/client/daemon/peer/peertask_file.go#L123-L162

Expected behavior:

syncProgress should return when the gRPC context closes.

How to reproduce it:

Environment:

  • Dragonfly version: v2.1.0-4349e27
  • OS: ubuntu
  • Kernel (e.g. uname -a):
  • Others:

NickYadance, Aug 02 '24 04:08