fix: query node may get stuck in the stopping process

Open weiliu1031 opened this issue 9 months ago • 2 comments

issue: #33103

When performing a stopping balance for a stopping query node, the balancer gets the node list from replica.GetNodes and checks whether any of those nodes is stopping. If so, a stopping balance is triggered for that replica.

After the replica refactor, replica.GetNodes returns only rwNodes, while stopping nodes are kept in roNodes. As a result, the balancer cannot find the replica that contains a stopping node, so the stopping balance for that replica is never triggered, and the query node stays stuck forever because its segments and channels are never moved out.
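The failure mode can be reproduced with a small, self-contained sketch. Everything here is a simplified stand-in rather than the actual Milvus code: the `Replica` struct, the `GetRONodes` accessor name, and the `replicasNeedingStoppingBalance` helper are assumptions made for illustration, and checking the read-only set is only one plausible direction for a fix, not necessarily what this PR does.

```go
package main

import "fmt"

// Replica is a hypothetical, simplified stand-in for the query coordinator's
// replica type; only the two node sets discussed above are modeled.
type Replica struct {
	ID      int64
	rwNodes map[int64]struct{} // nodes that still accept segments/channels
	roNodes map[int64]struct{} // nodes being drained, e.g. stopping nodes
}

// GetNodes mirrors the post-refactor behavior described above: rwNodes only.
func (r *Replica) GetNodes() []int64 { return keys(r.rwNodes) }

// GetRONodes is an assumed accessor name for the read-only node set.
func (r *Replica) GetRONodes() []int64 { return keys(r.roNodes) }

func keys(m map[int64]struct{}) []int64 {
	out := make([]int64, 0, len(m))
	for id := range m {
		out = append(out, id)
	}
	return out
}

// replicasNeedingStoppingBalance returns the IDs of replicas that contain at
// least one stopping node, as seen through the supplied node lister.
func replicasNeedingStoppingBalance(
	replicas []*Replica,
	stopping map[int64]bool,
	nodesOf func(*Replica) []int64,
) []int64 {
	var ids []int64
	for _, r := range replicas {
		for _, n := range nodesOf(r) {
			if stopping[n] {
				ids = append(ids, r.ID)
				break
			}
		}
	}
	return ids
}

func main() {
	// Node 2 is stopping, so it lives in roNodes rather than rwNodes.
	r := &Replica{
		ID:      1,
		rwNodes: map[int64]struct{}{1: {}},
		roNodes: map[int64]struct{}{2: {}},
	}
	stopping := map[int64]bool{2: true}

	viaGetNodes := replicasNeedingStoppingBalance([]*Replica{r}, stopping, (*Replica).GetNodes)
	viaRONodes := replicasNeedingStoppingBalance([]*Replica{r}, stopping, (*Replica).GetRONodes)

	// Prints "0 1": the rwNodes-only lookup misses the replica entirely.
	fmt.Println(len(viaGetNodes), len(viaRONodes))
}
```

Passing the node lister as a parameter keeps the two lookups side by side, which makes it easy to see that the only difference between the stuck and the working case is which node set the balancer consults.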

weiliu1031 avatar May 16 '24 11:05 weiliu1031

Codecov Report

Attention: Patch coverage is 85.21739%, with 17 lines in your changes missing coverage. Please review.

Project coverage is 82.15%. Comparing base (1671c78) to head (52bb75f). Report is 16 commits behind head on master.

Additional details and impacted files

@@            Coverage Diff             @@
##           master   #33104      +/-   ##
==========================================
- Coverage   82.18%   82.15%   -0.04%     
==========================================
  Files        1006     1006              
  Lines      128428   128336      -92     
==========================================
- Hits       105554   105429     -125     
- Misses      18902    18924      +22     
- Partials     3972     3983      +11     
| Files | Coverage Δ |
|---|---|
| internal/querycoordv2/checkers/balance_checker.go | 95.12% <100.00%> (-0.18%) :arrow_down: |
| internal/querycoordv2/checkers/channel_checker.go | 85.40% <100.00%> (+0.19%) :arrow_up: |
| internal/querycoordv2/checkers/index_checker.go | 78.49% <100.00%> (-0.89%) :arrow_down: |
| internal/querycoordv2/checkers/leader_checker.go | 95.23% <100.00%> (-0.09%) :arrow_down: |
| internal/querycoordv2/checkers/segment_checker.go | 88.30% <100.00%> (-0.54%) :arrow_down: |
| internal/querycoordv2/job/job_load.go | 90.67% <100.00%> (-0.24%) :arrow_down: |
| internal/querycoordv2/meta/replica.go | 100.00% <100.00%> (ø) |
| internal/querycoordv2/meta/replica_manager.go | 82.09% <100.00%> (ø) |
| internal/querycoordv2/meta/resource_manager.go | 80.85% <100.00%> (-0.08%) :arrow_down: |
| ...nternal/querycoordv2/observers/replica_observer.go | 95.31% <100.00%> (+0.07%) :arrow_up: |

... and 10 more

... and 38 files with indirect coverage changes

codecov[bot] avatar May 16 '24 13:05 codecov[bot]

/lgtm

chyezh avatar May 17 '24 06:05 chyezh

/approve

congqixia avatar May 20 '24 02:05 congqixia

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: congqixia, weiliu1031

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment.
Approvers can cancel approval by writing /approve cancel in a comment.

sre-ci-robot avatar May 20 '24 02:05 sre-ci-robot