milvus
milvus copied to clipboard
[Bug]: Proxy pod can not recover with error `Failed to enqueue delete task: syncTimestamp Failed:context canceled` after etcd pod kill chaos test
Is there an existing issue for this?
- [X] I have searched the existing issues
Environment
- Milvus version: master-20220820-9589bd3
- Deployment mode(standalone or cluster): cluster
- SDK version(e.g. pymilvus v2.0.0rc2): pymilvus==2.2.0.dev19
- OS(Ubuntu or CentOS):
- CPU/Memory:
- GPU:
- Others:
Current Behavior
running test_e2e.py failed with error Fail connecting to server on 10.101.48.180:19530
[2022/08/21 13:58:55.668 +00:00] [DEBUG] [proxy/impl.go:2365] ["Failed to enqueue insert task: syncTimestamp Failed:context canceled"]
[2022/08/21 13:58:55.668 +00:00] [INFO] [proxy/impl.go:2368] ["Finish processing insert request in Proxy"] [traceID=5cba86146820479d]
[2022/08/21 13:58:55.668 +00:00] [ERROR] [proxy/impl.go:2467] ["Failed to enqueue delete task: syncTimestamp Failed:context canceled"] [traceID=19bb2e3081bcbf1] [stack="github.com/milvus-io/milvus/internal/proxy.(*Proxy).Delete\n\t/go/src/github.com/milvus-io/milvus/internal/proxy/impl.go:2467\ngithub.com/milvus-io/milvus/internal/distributed/proxy.(*Server).Delete\n\t/go/src/github.com/milvus-io/milvus/internal/distributed/proxy/service.go:666\ngithub.com/milvus-io/milvus/internal/proto/milvuspb._MilvusService_Delete_Handler.func1\n\t/go/src/github.com/milvus-io/milvus/internal/proto/milvuspb/milvus.pb.go:8626\ngithub.com/milvus-io/milvus/internal/proxy.UnaryServerInterceptor.func1\n\t/go/src/github.com/milvus-io/milvus/internal/proxy/privilege_interceptor.go:67\ngithub.com/grpc-ecosystem/go-grpc-middleware.ChainUnaryServer.func1.1.1\n\t/go/pkg/mod/github.com/grpc-ecosystem/[email protected]/chain.go:25\ngithub.com/grpc-ecosystem/go-grpc-middleware/auth.UnaryServerInterceptor.func1\n\t/go/pkg/mod/github.com/grpc-ecosystem/[email protected]/auth/auth.go:47\ngithub.com/grpc-ecosystem/go-grpc-middleware.ChainUnaryServer.func1.1.1\n\t/go/pkg/mod/github.com/grpc-ecosystem/[email protected]/chain.go:25\ngithub.com/grpc-ecosystem/go-grpc-middleware/tracing/opentracing.UnaryServerInterceptor.func1\n\t/go/pkg/mod/github.com/grpc-ecosystem/[email protected]/tracing/opentracing/server_interceptors.go:38\ngithub.com/grpc-ecosystem/go-grpc-middleware.ChainUnaryServer.func1.1.1\n\t/go/pkg/mod/github.com/grpc-ecosystem/[email protected]/chain.go:25\ngithub.com/grpc-ecosystem/go-grpc-middleware.ChainUnaryServer.func1\n\t/go/pkg/mod/github.com/grpc-ecosystem/[email protected]/chain.go:34\ngithub.com/milvus-io/milvus/internal/proto/milvuspb._MilvusService_Delete_Handler\n\t/go/src/github.com/milvus-io/milvus/internal/proto/milvuspb/milvus.pb.go:8628\ngoogle.golang.org/grpc.(*Server).processUnaryRPC\n\t/go/pkg/mod/google.golang.org/[email protected]/server.go:1283\ngoogle.golang.org/grpc.(*Server).handleStream\n\t/go/pkg/mod/google.golang.org/[email protected]/server.go:1620\ngoogle.golang.org/grpc.(*Server).serveStreams.func1.2\n\t/go/pkg/mod/google.golang.org/[email protected]/server.go:922"]
[2022/08/21 13:58:55.668 +00:00] [INFO] [proxy/impl.go:2471] ["Finish processing delete request in Proxy"] [traceID=19bb2e3081bcbf1]
[2022/08/21 13:58:55.668 +00:00] [ERROR] [proxy/impl.go:2467] ["Failed to enqueue delete task: syncTimestamp Failed:context canceled"] [traceID=7b52aa1ad2f02e54] [stack="github.com/milvus-io/milvus/internal/proxy.(*Proxy).Delete\n\t/go/src/github.com/milvus-io/milvus/internal/proxy/impl.go:2467\ngithub.com/milvus-io/milvus/internal/distributed/proxy.(*Server).Delete\n\t/go/src/github.com/milvus-io/milvus/internal/distributed/proxy/service.go:666\ngithub.com/milvus-io/milvus/internal/proto/milvuspb._MilvusService_Delete_Handler.func1\n\t/go/src/github.com/milvus-io/milvus/internal/proto/milvuspb/milvus.pb.go:8626\ngithub.com/milvus-io/milvus/internal/proxy.UnaryServerInterceptor.func1\n\t/go/src/github.com/milvus-io/milvus/internal/proxy/privilege_interceptor.go:67\ngithub.com/grpc-ecosystem/go-grpc-middleware.ChainUnaryServer.func1.1.1\n\t/go/pkg/mod/github.com/grpc-ecosystem/[email protected]/chain.go:25\ngithub.com/grpc-ecosystem/go-grpc-middleware/auth.UnaryServerInterceptor.func1\n\t/go/pkg/mod/github.com/grpc-ecosystem/[email protected]/auth/auth.go:47\ngithub.com/grpc-ecosystem/go-grpc-middleware.ChainUnaryServer.func1.1.1\n\t/go/pkg/mod/github.com/grpc-ecosystem/[email protected]/chain.go:25\ngithub.com/grpc-ecosystem/go-grpc-middleware/tracing/opentracing.UnaryServerInterceptor.func1\n\t/go/pkg/mod/github.com/grpc-ecosystem/[email protected]/tracing/opentracing/server_interceptors.go:38\ngithub.com/grpc-ecosystem/go-grpc-middleware.ChainUnaryServer.func1.1.1\n\t/go/pkg/mod/github.com/grpc-ecosystem/[email protected]/chain.go:25\ngithub.com/grpc-ecosystem/go-grpc-middleware.ChainUnaryServer.func1\n\t/go/pkg/mod/github.com/grpc-ecosystem/[email protected]/chain.go:34\ngithub.com/milvus-io/milvus/internal/proto/milvuspb._MilvusService_Delete_Handler\n\t/go/src/github.com/milvus-io/milvus/internal/proto/milvuspb/milvus.pb.go:8628\ngoogle.golang.org/grpc.(*Server).processUnaryRPC\n\t/go/pkg/mod/google.golang.org/[email protected]/server.go:1283\ngoogle.golang.org/grpc.(*Server).handleStream\n\t/go/pkg/mod/google.golang.org/[email protected]/server.go:1620\ngoogle.golang.org/grpc.(*Server).serveStreams.func1.2\n\t/go/pkg/mod/google.golang.org/[email protected]/server.go:922"]
[2022/08/21 13:58:55.668 +00:00] [INFO] [proxy/impl.go:2471] ["Finish processing delete request in Proxy"] [traceID=7b52aa1ad2f02e54]
[2022/08/21 13:58:55.668 +00:00] [DEBUG] [proxy/task_scheduler.go:257] ["Proxy dmTaskQueue popPChanStats"] [taskID=435444694659157828]
[2022/08/21 13:58:55.668 +00:00] [DEBUG] [proxy/task_scheduler.go:257] ["Proxy dmTaskQueue popPChanStats"] [taskID=435444694659157793]
[2022/08/21 13:58:55.668 +00:00] [INFO] [proxy/proxy.go:354] ["close id allocator"] [role=proxy]
[2022/08/21 13:58:55.668 +00:00] [INFO] [proxy/proxy.go:359] ["close segment id assigner"] [role=proxy]
[2022/08/21 13:58:55.668 +00:00] [DEBUG] [proxy/task_scheduler.go:257] ["Proxy dmTaskQueue popPChanStats"] [taskID=435444694659157880]
[2022/08/21 13:58:55.668 +00:00] [INFO] [proxy/proxy.go:364] ["close scheduler"] [role=proxy]
[2022/08/21 13:58:55.668 +00:00] [INFO] [proxy/proxy.go:372] ["close channels time ticker"] [role=proxy]
[2022/08/21 13:58:55.668 +00:00] [DEBUG] [proxy/task_scheduler.go:257] ["Proxy dmTaskQueue popPChanStats"] [taskID=435444694659157856]
[2022/08/21 13:58:55.668 +00:00] [DEBUG] [proxy/task_scheduler.go:257] ["Proxy dmTaskQueue popPChanStats"] [taskID=435444694659157895]
[2022/08/21 13:58:55.668 +00:00] [DEBUG] [proxy/task_scheduler.go:257] ["Proxy dmTaskQueue popPChanStats"] [taskID=435444694659157912]
[2022/08/21 13:58:55.668 +00:00] [DEBUG] [proxy/task_scheduler.go:257] ["Proxy dmTaskQueue popPChanStats"] [taskID=435444694659157748]
[2022/08/21 13:58:55.668 +00:00] [DEBUG] [proxy/task_scheduler.go:257] ["Proxy dmTaskQueue popPChanStats"] [taskID=435444694659158612]
[2022/08/21 13:58:55.668 +00:00] [DEBUG] [proxy/task_scheduler.go:257] ["Proxy dmTaskQueue popPChanStats"] [taskID=435444694659158005]
[2022/08/21 13:58:55.668 +00:00] [DEBUG] [proxy/task_scheduler.go:257] ["Proxy dmTaskQueue popPChanStats"] [taskID=435444694659158623]
[2022/08/21 13:58:55.668 +00:00] [DEBUG] [proxy/task_scheduler.go:257] ["Proxy dmTaskQueue popPChanStats"] [taskID=435444694659158601]
[2022/08/21 13:58:55.668 +00:00] [DEBUG] [proxy/task_scheduler.go:257] ["Proxy dmTaskQueue popPChanStats"] [taskID=435444694659158206]
[2022/08/21 13:58:55.668 +00:00] [DEBUG] [proxy/task_scheduler.go:257] ["Proxy dmTaskQueue popPChanStats"] [taskID=435444694659158641]
[2022/08/21 13:58:55.668 +00:00] [DEBUG] [proxy/task_scheduler.go:257] ["Proxy dmTaskQueue popPChanStats"] [taskID=435444694659158652]
[2022/08/21 13:58:55.668 +00:00] [DEBUG] [proxy/task_scheduler.go:257] ["Proxy dmTaskQueue popPChanStats"] [taskID=435444694659158663]
{"level":"warn","ts":"2022-08-21T13:58:55.723Z","logger":"etcd-client","caller":"[email protected]/retry_interceptor.go:62","msg":"retrying of unary invoker failed","target":"etcd-endpoints://0xc00079a8c0/#initially=[etcd-pod-kill-600:2379]","attempt":0,"error":"rpc error: code = NotFound desc = etcdserver: requested lease not found"}
Expected Behavior
all test cases passed
Steps To Reproduce
see flow graph in `Anything else` below.
Milvus Log
artifacts-etcd-pod-kill-600-server-logs (1).tar.gz
artifacts-etcd-pod-kill-600-pytest-logs.tar.gz
Anything else?
/assign @soothing-rain /unassign
This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.
Rotten issues close after 30d of inactivity. Reopen the issue with /reopen
.
/reopen
It appears that proxy failed to connect RootCoord and was killed later.
[2022/08/21 13:58:55.630 +00:00] [WARN] [grpcclient/client.go:251] ["rootcoord ClientBase ReCall grpc first call get error "] [error="err: failed to connect 10.102.7.141:53100, reason: context deadline exceeded\n, /go/src/github.com/milvus-io/milvus/internal/util/trace/stack_trace.go:51 github.com/milvus-io/milvus/internal/util/trace.StackTrace\n/go/src/github.com/milvus-io/milvus/internal/util/grpcclient/client.go:250 github.com/milvus-io/milvus/internal/util/grpcclient.(*ClientBase).ReCall\n/go/src/github.com/milvus-io/milvus/internal/distributed/rootcoord/client/client.go:350 github.com/milvus-io/milvus/internal/distributed/rootcoord/client.(*Client).AllocTimestamp\n/go/src/github.com/milvus-io/milvus/internal/proxy/timestamp.go:61 github.com/milvus-io/milvus/internal/proxy.(*timestampAllocator).alloc\n/go/src/github.com/milvus-io/milvus/internal/proxy/timestamp.go:84 github.com/milvus-io/milvus/internal/proxy.(*timestampAllocator).AllocOne\n/go/src/github.com/milvus-io/milvus/internal/proxy/task_scheduler.go:173 github.com/milvus-io/milvus/internal/proxy.(*baseTaskQueue).Enqueue\n/go/src/github.com/milvus-io/milvus/internal/proxy/impl.go:2802 github.com/milvus-io/milvus/internal/proxy.(*Proxy).Query\n/go/src/github.com/milvus-io/milvus/internal/distributed/proxy/service.go:678 github.com/milvus-io/milvus/internal/distributed/proxy.(*Server).Query\n/go/src/github.com/milvus-io/milvus/internal/proto/milvuspb/milvus.pb.go:8680 github.com/milvus-io/milvus/internal/proto/milvuspb._MilvusService_Query_Handler.func1\n/go/src/github.com/milvus-io/milvus/internal/proxy/privilege_interceptor.go:67 github.com/milvus-io/milvus/internal/proxy.UnaryServerInterceptor.func1\n"]
[2022/08/21 13:58:55.630 +00:00] [WARN] [proxy/impl.go:2803] ["Query failed to enqueue"] [error="syncTimestamp Failed:context deadline exceeded"] [traceID=6134af0a7ff8b08b] [role=proxy] [db=] [collection=Checker__eo6hlrN8] [partitions="[]"]
[2022/08/21 13:58:55.663 +00:00] [ERROR] [roles/roles.go:527] ["Get signal to exit\n"] [signal=interrupt] [stack="github.com/milvus-io/milvus/cmd/roles.(*MilvusRoles).Run\n\t/go/src/github.com/milvus-io/milvus/cmd/roles/roles.go:527\ngithub.com/milvus-io/milvus/cmd/milvus.(*run).execute\n\t/go/src/github.com/milvus-io/milvus/cmd/milvus/run.go:111\ngithub.com/milvus-io/milvus/cmd/milvus.RunMilvus\n\t/go/src/github.com/milvus-io/milvus/cmd/milvus/milvus.go:60\nmain.main\n\t/go/src/github.com/milvus-io/milvus/cmd/main.go:26\nruntime.main\n\t/usr/local/go/src/runtime/proc.go:250"]
[2022/08/21 13:58:55.663 +00:00] [DEBUG] [proxy/service.go:503] ["Proxy stop"] ["internal address"=10.102.9.65:19529] ["external address"=10.102.9.65:19529]
I am also confused that the logs above are found in pod_log_previous
directory, but the proxy logs in pod_log
directory seem alright:
[2022/08/21 13:58:57.128 +00:00] [DEBUG] [proxy/service.go:293] ["init Proxy server done"]
[2022/08/21 13:58:57.128 +00:00] [DEBUG] [proxy/service.go:295] ["start Proxy server"]
[2022/08/21 13:58:57.128 +00:00] [DEBUG] [proxy/proxy.go:303] ["start task scheduler"] [role=proxy]
[2022/08/21 13:58:57.128 +00:00] [DEBUG] [proxy/proxy.go:308] ["start task scheduler done"] [role=proxy]
[2022/08/21 13:58:57.128 +00:00] [DEBUG] [proxy/proxy.go:310] ["start id allocator"] [role=proxy]
[2022/08/21 13:58:57.128 +00:00] [DEBUG] [proxy/proxy.go:315] ["start id allocator done"] [role=proxy]
[2022/08/21 13:58:57.128 +00:00] [DEBUG] [proxy/proxy.go:317] ["start segment id assigner"] [role=proxy]
[2022/08/21 13:58:57.128 +00:00] [DEBUG] [proxy/proxy.go:322] ["start segment id assigner done"] [role=proxy]
[2022/08/21 13:58:57.128 +00:00] [DEBUG] [proxy/proxy.go:324] ["start channels time ticker"] [role=proxy]
[2022/08/21 13:58:57.128 +00:00] [DEBUG] [proxy/proxy.go:329] ["start channels time ticker done"] [role=proxy]
[2022/08/21 13:58:57.128 +00:00] [DEBUG] [proxy/proxy.go:342] ["update state code"] [role=proxy] [State=Healthy]
[2022/08/21 13:58:57.128 +00:00] [DEBUG] [sessionutil/session_util.go:200] ["service begin to register to etcd"] [serverName=proxy] [ServerID=22]
[2022/08/21 13:58:57.131 +00:00] [INFO] [sessionutil/session_util.go:241] ["Service registered successfully"] [ServerName=proxy] [serverID=22]
[2022/08/21 13:58:57.131 +00:00] [DEBUG] [proxy/service.go:300] ["start Proxy server done"]
[2022/08/21 13:58:57.131 +00:00] [INFO] [components/proxy.go:54] ["Proxy successfully started"]
/assign @zhuwenxing /unassign
Not reproduced, so close it.