[Bug]: Should ban auto balance channel

Open weiliu1031 opened this issue 2 years ago • 8 comments

Is there an existing issue for this?

  • [X] I have searched the existing issues

Environment

- Milvus version:
- Deployment mode(standalone or cluster):
- MQ type(rocksmq, pulsar or kafka):    
- SDK version(e.g. pymilvus v2.0.0rc2):
- OS(Ubuntu or CentOS): 
- CPU/Memory: 
- GPU: 
- Others:

Current Behavior

Currently, while a channel is being balanced, two shards exist on the same channel. We can't prevent a release from happening on one shard while a search runs on the other, so the search exits with an error.

We should ban automatic channel balancing until we can handle events that may occur on two shards.
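To make the failure mode concrete, here is a minimal toy model of it. It is only a sketch under assumptions: `Worker`, `ShardLeader`, `SegmentNotLoaded` and `search_outcome` are hypothetical names, not Milvus types, and the per-leader segment view is an assumed mechanism for why the search fails.

```python
class SegmentNotLoaded(Exception):
    """Raised when a search touches a segment that has already been released."""


class Worker:
    """Hypothetical stand-in for a query node that holds loaded segments."""

    def __init__(self, segments):
        self.segments = set(segments)

    def release(self, segment_id):
        self.segments.discard(segment_id)

    def search(self, segment_id):
        if segment_id not in self.segments:
            raise SegmentNotLoaded(f"segment {segment_id} is not loaded")
        return f"hits from segment {segment_id}"


class ShardLeader:
    """Hypothetical shard leader with its own view of searchable segments."""

    def __init__(self, name, worker, view):
        self.name = name
        self.worker = worker
        self.view = set(view)

    def release(self, segment_id):
        # Assumption: only this leader's view is updated; the other leader
        # on the same channel is never told about the release.
        self.view.discard(segment_id)
        self.worker.release(segment_id)

    def search(self):
        # Fan out to every segment this leader still believes is loaded.
        return [self.worker.search(s) for s in sorted(self.view)]


def search_outcome(leader):
    """Return the search result, or the error message if the search fails."""
    try:
        return leader.search()
    except SegmentNotLoaded as exc:
        return f"error: {exc}"
```

If two `ShardLeader` objects share one `Worker` and a release goes through only one of them, a later search through the other leader still fans out to the released segment and fails.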

Expected Behavior

No response

Steps To Reproduce

No response

Milvus Log

No response

Anything else?

No response

weiliu1031 avatar Apr 26 '23 04:04 weiliu1031

Should we just release the old channel before we have the new one? That should solve the problem, right?

xiaofan-luan avatar Apr 26 '23 05:04 xiaofan-luan

> Should we just release the old channel before we have the new one? That should solve the problem, right?

If we release before subscribing to the channel, there will be a period with no available shard.

weiliu1031 avatar Apr 26 '23 06:04 weiliu1031

>> Should we just release the old channel before we have the new one? That should solve the problem, right?
>
> If we release before subscribing to the channel, there will be a period with no available shard.

Yep, but there has to be only one leader.

xiaofan-luan avatar Apr 26 '23 06:04 xiaofan-luan

We could let the system have two delegators at the same time:

  • Make the old delegator work as a sub node during the balance
  • All load/release operations dispatched by the new delegator shall be forwarded by the older one
  • After the new delegator becomes workable, de-register the old delegator
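The three steps above can be sketched roughly as follows. This is an illustration under assumptions, not the Milvus implementation: `Delegator`, `ChannelRegistry`, `start_balance` and `finish_balance` are hypothetical names, and "forwarded by the older one" is read here as "applied to the old delegator's view before the new one's".

```python
class Delegator:
    """Hypothetical delegator: tracks which segments it considers loaded."""

    def __init__(self, name, segments=()):
        self.name = name
        self.segments = set(segments)
        self.relay = None  # older delegator that load/release must pass through

    def apply(self, op, segment_id):
        if op == "load":
            self.segments.add(segment_id)
        elif op == "release":
            self.segments.discard(segment_id)
        else:
            raise ValueError(f"unknown operation: {op}")

    def dispatch(self, op, segment_id):
        # During the balance, route the operation through the old delegator
        # first so that its view never lags behind the new one.
        if self.relay is not None:
            self.relay.apply(op, segment_id)
        self.apply(op, segment_id)


class ChannelRegistry:
    """Hypothetical registry of the delegators currently serving each channel."""

    def __init__(self):
        self._delegators = {}

    def register(self, channel, delegator):
        self._delegators.setdefault(channel, []).append(delegator)

    def deregister(self, channel, delegator):
        self._delegators[channel].remove(delegator)

    def serving(self, channel):
        return [d.name for d in self._delegators.get(channel, [])]


def start_balance(registry, channel, old, new):
    # Step 1: the new delegator starts from the old view and relays through it.
    new.segments = set(old.segments)
    new.relay = old
    registry.register(channel, new)


def finish_balance(registry, channel, old, new):
    # Step 3: once the new delegator is workable, drop the old one.
    new.relay = None
    registry.deregister(channel, old)
```

Between `start_balance` and `finish_balance` both delegators are registered for the channel, and every `dispatch` on the new one keeps the old view in sync, so a search served by either side sees the same set of segments.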

congqixia avatar Apr 26 '23 08:04 congqixia

/assign @weiliu1031
/unassign

yanliang567 avatar Apr 26 '23 09:04 yanliang567

> We could let the system have two delegators at the same time:
>
>   • Make the old delegator work as a sub node during the balance
>   • All load/release operations dispatched by the new delegator shall be forwarded by the older one
>   • After the new delegator becomes workable, de-register the old delegator

It might be doable if the querynode accepts duplicated inserts. And the msgstream would have to change to shared mode, I guess.
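As a rough illustration of what "accepting duplicated inserts" could mean, the sketch below makes an insert idempotent by keying rows on primary key plus timestamp. `GrowingSegment` here is a hypothetical toy class and the `(pk, ts)` key is an assumption, not how the querynode actually deduplicates.

```python
class GrowingSegment:
    """Hypothetical in-memory segment whose inserts are idempotent."""

    def __init__(self):
        self._rows = {}  # (primary key, timestamp) -> row payload

    def insert(self, pk, ts, row):
        """Store the row; return False if the same (pk, ts) was already seen."""
        key = (pk, ts)
        if key in self._rows:
            # Same message delivered again, e.g. via a second consumer
            # of the same channel: ignore it.
            return False
        self._rows[key] = row
        return True

    def row_count(self):
        return len(self._rows)
```

With this property, two delegators consuming the same channel could both deliver the same insert message without the row being counted twice.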

xiaofan-luan avatar Apr 26 '23 17:04 xiaofan-luan

We will ban channel balancing in the short term, until we can deal with two shards being online for the same channel in a replica.

weiliu1031 avatar Apr 28 '23 06:04 weiliu1031

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions. Rotten issues close after 30d of inactivity. Reopen the issue with /reopen.

stale[bot] avatar May 31 '23 20:05 stale[bot]

@weiliu1031 shall we keep this open?

yanliang567 avatar Jun 09 '23 01:06 yanliang567

Balance channel should be the priority for 2.3.

xiaofan-luan avatar Jun 09 '23 07:06 xiaofan-luan

/reopen

weiliu1031 avatar Jun 09 '23 09:06 weiliu1031

@weiliu1031: Reopened this issue.

In response to this:

/reopen

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

sre-ci-robot avatar Jun 09 '23 09:06 sre-ci-robot

> Balance channel should be the priority for 2.3.

@yah01 is working on supporting two shards on the same channel existing at the same time. After that, we will re-enable channel balancing.

weiliu1031 avatar Jun 09 '23 09:06 weiliu1031

/assign
working on it

yah01 avatar Jun 12 '23 08:06 yah01

Fixed on master with #24849.

yah01 avatar Jun 19 '23 03:06 yah01

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions. Rotten issues close after 30d of inactivity. Reopen the issue with /reopen.

stale[bot] avatar Jul 19 '23 06:07 stale[bot]

/reopen

jiaoew1991 avatar Sep 05 '23 01:09 jiaoew1991

@jiaoew1991: Reopened this issue.

In response to this:

/reopen

sre-ci-robot avatar Sep 05 '23 01:09 sre-ci-robot

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions. Rotten issues close after 30d of inactivity. Reopen the issue with /reopen.

stale[bot] avatar Oct 05 '23 01:10 stale[bot]

/reopen

weiliu1031 avatar Nov 15 '23 11:11 weiliu1031

@weiliu1031: Reopened this issue.

In response to this:

/reopen

sre-ci-robot avatar Nov 15 '23 11:11 sre-ci-robot

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions. Rotten issues close after 30d of inactivity. Reopen the issue with /reopen.

stale[bot] avatar Dec 15 '23 20:12 stale[bot]

/unassign

yah01 avatar Dec 20 '23 06:12 yah01