[Bug]: Querycoord panic after restarting docker with error `set empty delta channel info to meta of collection 435211660870549505`
Is there an existing issue for this?
- [X] I have searched the existing issues
Environment
- Milvus version: master-20220811-6c3dbf0
- Deployment mode(standalone or cluster): cluster
- SDK version(e.g. pymilvus v2.0.0rc2): pymilvus==2.2.0.dev6
- OS(Ubuntu or CentOS):
- CPU/Memory:
- GPU:
- Others:
Current Behavior
The search failed after restarting docker
RPC error: [search], <MilvusException: (code=1, message=checkIfLoaded failed when search, collection:sift_128_euclidean, partitions:[], err = GetCollectionInfo failed, collection = sift_128_euclidean, err = err: find no available querycoord, check querycoord state
Search...
, /go/src/github.com/milvus-io/milvus/internal/util/trace/stack_trace.go:51 github.com/milvus-io/milvus/internal/util/trace.StackTrace
/go/src/github.com/milvus-io/milvus/internal/util/grpcclient/client.go:259 github.com/milvus-io/milvus/internal/util/grpcclient.(*ClientBase).ReCall
/go/src/github.com/milvus-io/milvus/internal/distributed/querycoord/client/client.go:160 github.com/milvus-io/milvus/internal/distributed/querycoord/client.(*Client).ShowCollections
/go/src/github.com/milvus-io/milvus/internal/proxy/meta_cache.go:210 github.com/milvus-io/milvus/internal/proxy.(*MetaCache).GetCollectionInfo
/go/src/github.com/milvus-io/milvus/internal/proxy/task_search.go:439 github.com/milvus-io/milvus/internal/proxy.checkIfLoaded
/go/src/github.com/milvus-io/milvus/internal/proxy/task_search.go:201 github.com/milvus-io/milvus/internal/proxy.(*searchTask).PreExecute
/go/src/github.com/milvus-io/milvus/internal/proxy/task_scheduler.go:452 github.com/milvus-io/milvus/internal/proxy.(*taskScheduler).processTask
/usr/local/go/src/runtime/asm_amd64.s:1571 runtime.goexit
)>, <Time:{'RPC start': '2022-08-11 07:04:24.678108', 'RPC error': '2022-08-11 07:04:29.858532'}>
Traceback (most recent call last):
File "scripts/second_recall_test.py", line 64, in <module>
search_test(host)
File "scripts/second_recall_test.py", line 33, in search_test
res = collection.search(
File "/opt/hostedtoolcache/Python/3.8.13/x64/lib/python3.8/site-packages/pymilvus/orm/collection.py", line 717, in search
res = conn.search(self._name, data, anns_field, param, limit, expr,
File "/opt/hostedtoolcache/Python/3.8.13/x64/lib/python3.8/site-packages/pymilvus/decorators.py", line 96, in handler
raise e
File "/opt/hostedtoolcache/Python/3.8.13/x64/lib/python3.8/site-packages/pymilvus/decorators.py", line 92, in handler
return func(*args, **kwargs)
File "/opt/hostedtoolcache/Python/3.8.13/x64/lib/python3.8/site-packages/pymilvus/decorators.py", line 74, in handler
raise e
File "/opt/hostedtoolcache/Python/3.8.13/x64/lib/python3.8/site-packages/pymilvus/decorators.py", line 48, in handler
return func(self, *args, **kwargs)
File "/opt/hostedtoolcache/Python/3.8.13/x64/lib/python3.8/site-packages/pymilvus/client/grpc_handler.py", line 451, in search
return self._execute_search_requests(requests, timeout, **_kwargs)
File "/opt/hostedtoolcache/Python/3.8.13/x64/lib/python3.8/site-packages/pymilvus/client/grpc_handler.py", line 415, in _execute_search_requests
raise pre_err
File "/opt/hostedtoolcache/Python/3.8.13/x64/lib/python3.8/site-packages/pymilvus/client/grpc_handler.py", line 406, in _execute_search_requests
raise MilvusException(response.status.error_code, response.status.reason)
pymilvus.exceptions.MilvusException: <MilvusException: (code=1, message=checkIfLoaded failed when search, collection:sift_128_euclidean, partitions:[], err = GetCollectionInfo failed, collection = sift_128_euclidean, err = err: find no available querycoord, check querycoord state
, /go/src/github.com/milvus-io/milvus/internal/util/trace/stack_trace.go:51 github.com/milvus-io/milvus/internal/util/trace.StackTrace
/go/src/github.com/milvus-io/milvus/internal/util/grpcclient/client.go:259 github.com/milvus-io/milvus/internal/util/grpcclient.(*ClientBase).ReCall
/go/src/github.com/milvus-io/milvus/internal/distributed/querycoord/client/client.go:160 github.com/milvus-io/milvus/internal/distributed/querycoord/client.(*Client).ShowCollections
/go/src/github.com/milvus-io/milvus/internal/proxy/meta_cache.go:210 github.com/milvus-io/milvus/internal/proxy.(*MetaCache).GetCollectionInfo
/go/src/github.com/milvus-io/milvus/internal/proxy/task_search.go:439 github.com/milvus-io/milvus/internal/proxy.checkIfLoaded
/go/src/github.com/milvus-io/milvus/internal/proxy/task_search.go:201 github.com/milvus-io/milvus/internal/proxy.(*searchTask).PreExecute
/go/src/github.com/milvus-io/milvus/internal/proxy/task_scheduler.go:452 github.com/milvus-io/milvus/internal/proxy.(*taskScheduler).processTask
/usr/local/go/src/runtime/asm_amd64.s:1571 runtime.goexit
)>
Expected Behavior
all test cases passed
Steps To Reproduce
see https://github.com/zhuwenxing/milvus/runs/7781667318?check_suite_focus=true
Milvus Log
failed job: https://github.com/zhuwenxing/milvus/runs/7781667318?check_suite_focus=true log: https://github.com/zhuwenxing/milvus/suites/7765344016/artifacts/326509308
Anything else?
No response
The same failure occurs in standalone mode.
failed job: https://github.com/zhuwenxing/milvus/runs/7781667170?check_suite_focus=true
log:https://github.com/zhuwenxing/milvus/suites/7765344016/artifacts/326509309
All the log pages return 404!
@weiliu1031 Since I reran the failed job, the log links have changed. You can check the logs below; the error log is in the `third_deploy` directory.
failed job: https://github.com/zhuwenxing/milvus/actions/runs/2838417006 log: https://github.com/zhuwenxing/milvus/suites/7766597708/artifacts/326598084 https://github.com/zhuwenxing/milvus/suites/7766597708/artifacts/326598083
/assign @weiliu1031
Some information to sync: query coord panics because showPartitions from root coord returns 0 partitions.
Two more issues need to be tracked down:
- Why does root coord return 0 partitions?
- Query coord's behavior when it receives incorrect information.
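To illustrate the second point, here is a minimal sketch of the kind of guard that could turn this panic into a recoverable error. It is written in Python only for brevity and is not the actual query coord code; `MetaInconsistencyError`, `DeltaChannelInfo`, and `build_delta_channel_info` are hypothetical names made up for this example.

```python
from dataclasses import dataclass
from typing import List


class MetaInconsistencyError(Exception):
    """Hypothetical error type: metadata from another component looks invalid."""


@dataclass
class DeltaChannelInfo:
    # Hypothetical stand-in for the per-collection info kept in meta.
    collection_id: int
    partition_ids: List[int]


def build_delta_channel_info(collection_id: int, partition_ids: List[int]) -> DeltaChannelInfo:
    """Validate the partition list before it is written to meta."""
    # Assumption: a loaded collection always has at least one partition, so an
    # empty list signals inconsistent meta rather than a valid state.
    if not partition_ids:
        raise MetaInconsistencyError(
            f"showPartitions returned 0 partitions for collection {collection_id}"
        )
    return DeltaChannelInfo(collection_id, list(partition_ids))
```

With a check like this, the caller can log the error and fail or retry the single load task, so one bad response does not take down the whole coordinator.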
/assign @longjiquan /unassign
Caused by #18546. After discussing with @jaime0815, I removed the deprecated partitions
in the collection info stored in etcd and ignored compatibility, since it has not been released yet.
/assign @zhuwenxing
The upgrade is from v2.0.1 to master-latest, not between daily builds, so it needs more investigation.
/unassign
It reproduces consistently.
version: milvusdb/milvus-dev:longjiquan-debug-meta-partitions-229f0164a-20220815
failed job: https://github.com/zhuwenxing/milvus/runs/7835788538?check_suite_focus=true
log: https://github.com/zhuwenxing/milvus/suites/7814773050/artifacts/330043167
@zhuwenxing Please try to reproduce this on 2.1.4->master.
/assign @zhuwenxing Please help reproduce this again. Thanks, @zhuwenxing
This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.
Rotten issues close after 30d of inactivity. Reopen the issue with `/reopen`.