[Bug]: After upgrading from 2.5 to master, the old queryNode cannot exit
Is there an existing issue for this?
- [x] I have searched the existing issues
Environment
- Milvus version: 2.5-20250613-5110130b-amd64
- Deployment mode(standalone or cluster): cluster
- MQ type(rocksmq, pulsar or kafka):
- SDK version(e.g. pymilvus v2.0.0rc2):
- OS(Ubuntu or CentOS):
- CPU/Memory:
- GPU:
- Others:
Current Behavior
server config:

```yaml
common:
  enabledGrowingSegmentJSONKeyStats: true
  enabledJsonKeyStats: true
  enabledOptimizeExpr: false
dataCoord:
  enableActiveStandby: true
  enabledJSONKeyStatsInSort: false
  jsonStatsTriggerCount: 10
  jsonStatsTriggerInterval: 10
indexCoord:
  enableActiveStandby: true
log:
  level: debug
queryCoord:
  enableActiveStandby: true
rootCoord:
  enableActiveStandby: true
```
Upgrade the image from 2.5 to master (2.5-20250613-5110130b-amd64 -> master-20250613-1bf960b1-amd64); the old queryNode cannot exit:
```
NAME READY STATUS RESTARTS AGE IP NODE NOMINATED NODE READINESS GATES
zong-roll-ddl-3-milvus-datacoord-7579d84648-xj9zz 1/1 Running 0 100m 10.104.32.122 4am-node39 <none> <none>
zong-roll-ddl-3-milvus-datanode-8d5566789-hj5x2 1/1 Running 0 100m 10.104.13.132 4am-node16 <none> <none>
zong-roll-ddl-3-milvus-datanode-8d5566789-r5c24 1/1 Running 0 100m 10.104.32.123 4am-node39 <none> <none>
zong-roll-ddl-3-milvus-indexcoord-944d87cf4-2gdb7 1/1 Running 0 100m 10.104.13.131 4am-node16 <none> <none>
zong-roll-ddl-3-milvus-indexnode-7876459657-746rd 1/1 Running 0 100m 10.104.33.67 4am-node36 <none> <none>
zong-roll-ddl-3-milvus-indexnode-7876459657-cwljx 1/1 Running 0 100m 10.104.15.20 4am-node20 <none> <none>
zong-roll-ddl-3-milvus-mixcoord-5845dc46bd-vqw4z 1/1 Running 0 37m 10.104.6.126 4am-node13 <none> <none>
zong-roll-ddl-3-milvus-proxy-64d4f6f4c7-xpjjs 1/1 Running 0 100m 10.104.27.203 4am-node31 <none> <none>
zong-roll-ddl-3-milvus-querycoord-6774648bdb-5c4m4 1/1 Running 0 100m 10.104.33.66 4am-node36 <none> <none>
zong-roll-ddl-3-milvus-querynode-0-66c498649f-jrlwb 1/1 Terminating 0 100m 10.104.6.112 4am-node13 <none> <none>
zong-roll-ddl-3-milvus-querynode-1-6c9c986cd5-4jp8w 1/1 Running 0 36m 10.104.15.93 4am-node20 <none> <none>
zong-roll-ddl-3-milvus-querynode-1-6c9c986cd5-v8msc 1/1 Running 0 4m36s 10.104.6.131 4am-node13 <none> <none>
zong-roll-ddl-3-milvus-rootcoord-5665b9858-rprln 1/1 Running 0 100m 10.104.6.111 4am-node13 <none> <none>
zong-roll-ddl-3-milvus-streamingnode-c657f9dc8-xskfr 1/1 Running 0 37m 10.104.6.127 4am-node13 <none> <none>
```
Expected Behavior
No response
Steps To Reproduce
argo workflow: zong-roll-ddl-3
Milvus Log
No response
Anything else?
No response
/unassign
The mixcoord cannot start up while the old distributed coordinators are not down yet. Meanwhile the querynode is rolling, and its segments and channels cannot be moved away by the old querycoord. We need to support rolling the old distributed coordinators into the new mixcoord before the query nodes start rolling.
/assign @AlintaLu
@chyezh: GitHub didn't allow me to assign the following users: AlintaLu.
Note that only milvus-io members, repo collaborators and people who have commented on this issue/PR can be assigned. Additionally, issues/PRs can only have 10 assignees at the same time. For more information please see the contributor guide
As confirmed by @haorenfsa: we only support upgrading a 2.5 cluster that runs mixcoord to 2.6. If a user wants to upgrade a 2.5 cluster with distributed coords to 2.6, they need to convert the distributed coords into a mixcoord on 2.5 first.
/assign @ThreadDao
So we only need to verify the cluster with mixcoord on 2.5.
/unassign
@haorenfsa @LoveEachDay
this might be an issue that needs to be solved by the operator and helm?
Yes, will do later. For now we provide a doc on removing the other coords with the operator: https://milvus.io/docs/upgrade_milvus_cluster-operator.md#Upgrade-Milvus-Cluster-with-Milvus-Operator
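For reference, a minimal sketch of what a merged-coordinator CR could look like; the field names are assumptions based on the milvus-operator CRD, so verify them against the linked doc before applying:

```yaml
# Hypothetical Milvus CR sketch (field names assumed, not verified):
# declare only mixCoord so the operator deploys a single combined
# coordinator instead of four separate ones before the 2.6 upgrade.
apiVersion: milvus.io/v1beta1
kind: Milvus
metadata:
  name: my-release
spec:
  mode: cluster
  components:
    mixCoord:      # combined root/data/query/index coordinator
      replicas: 1
```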
I will test the 2.5 Milvus image where the multiple coords are combined into one mixCoord.
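If that test deployment goes through Helm, a hypothetical values.yaml along these lines should produce it; the coordinator keys below are assumptions based on the milvus-helm chart's component toggles and should be checked against the chart version in use:

```yaml
# Hypothetical milvus-helm values sketch (key names assumed):
# enable the combined coordinator and disable the separate ones
# on 2.5, so the cluster can later roll to 2.6.
mixCoordinator:
  enabled: true
  replicas: 1
rootCoordinator:
  enabled: false
queryCoordinator:
  enabled: false
dataCoordinator:
  enabled: false
indexCoordinator:
  enabled: false
```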