[Bug]: QueryNode fail to start

Open dzqoo opened this issue 1 year ago • 7 comments

Is there an existing issue for this?

  • [X] I have searched the existing issues

Environment

- Milvus version: 2.2.9 
- Deployment mode(standalone or cluster):cluster
- MQ type(rocksmq, pulsar or kafka): pulsar
- SDK version(e.g. pymilvus v2.0.0rc2):
- OS(Ubuntu or CentOS): Centos
- CPU/Memory: 6c32gb
- GPU: 
- Others:

Current Behavior

I compiled Milvus 2.2.9 in my environment successfully. But when I deploy the cluster with my own binary and libraries, the querynode fails to start while the other components start successfully. The querynode log is as follows: `Welcome to use Milvus! Version:
Built: Wed Jun 7 05:57:47 UTC 2023 GitCommit: GoVersion: go version go1.18.8 linux/amd64

open pid file: /run/milvus/querynode.pid lock pid file: /run/milvus/querynode.pid [2023/06/08 01:46:31.748 +00:00] [INFO] [roles/roles.go:226] ["starting running Milvus components"] [2023/06/08 01:46:31.748 +00:00] [INFO] [roles/roles.go:152] ["Enable Jemalloc"] ["Jemalloc Path"=/milvus/lib/libjemalloc.so] [2023/06/08 01:46:31.748 +00:00] [INFO] [management/server.go:68] ["management listen"] [addr=:9091] [2023/06/08 01:46:31.769 +00:00] [INFO] [config/etcd_source.go:145] ["start refreshing configurations"] [2023/06/08 01:46:31.770 +00:00] [INFO] [paramtable/quota_param.go:745] ["init disk quota"] [diskQuota(MB)=+inf] [2023/06/08 01:46:31.770 +00:00] [INFO] [paramtable/quota_param.go:760] ["init disk quota per DB"] [diskQuotaPerCollection(MB)=1.7976931348623157e+308] [2023/06/08 01:46:31.770 +00:00] [INFO] [paramtable/component_param.go:1543] ["init segment max idle time"] [value=10m0s] [2023/06/08 01:46:31.770 +00:00] [INFO] [paramtable/component_param.go:1548] ["init segment min size from idle to sealed"] [value=16] [2023/06/08 01:46:31.770 +00:00] [INFO] [paramtable/component_param.go:1558] ["init segment max binlog file to sealed"] [value=32] [2023/06/08 01:46:31.770 +00:00] [INFO] [paramtable/component_param.go:1553] ["init segment expansion rate"] [value=1.25] [2023/06/08 01:46:31.775 +00:00] [INFO] [paramtable/base_table.go:142] ["cannot find etcd.endpoints"] [2023/06/08 01:46:31.775 +00:00] [INFO] [paramtable/hook_config.go:19] ["hook config"] [hook={}] [2023/06/08 01:46:31.776 +00:00] [ERROR] [querynode/query_node.go:188] ["load queryhook failed"] [error="fail to set the querynode plugin path"] 
[stack="github.com/milvus-io/milvus/internal/querynode.NewQueryNode\n\t/data/jianwang25/roadmap/milvus-2.2.9/milvus-2.2.9/internal/querynode/query_node.go:188\ngithub.com/milvus-io/milvus/internal/distributed/querynode.NewServer\n\t/data/jianwang25/roadmap/milvus-2.2.9/milvus-2.2.9/internal/distributed/querynode/service.go:83\ngithub.com/milvus-io/milvus/cmd/components.NewQueryNode\n\t/data/jianwang25/roadmap/milvus-2.2.9/milvus-2.2.9/cmd/components/query_node.go:40\ngithub.com/milvus-io/milvus/cmd/roles.runComponent[...].func1\n\t/data/jianwang25/roadmap/milvus-2.2.9/milvus-2.2.9/cmd/roles/roles.go:110"] [2023/06/08 01:46:31.797 +00:00] [INFO] [config/etcd_source.go:145] ["start refreshing configurations"] [2023/06/08 01:46:31.797 +00:00] [DEBUG] [paramtable/grpc_param.go:153] [initServerMaxSendSize] [role=querynode] [grpc.serverMaxSendSize=536870912] [2023/06/08 01:46:31.798 +00:00] [DEBUG] [paramtable/grpc_param.go:175] [initServerMaxRecvSize] [role=querynode] [grpc.serverMaxRecvSize=536870912] [2023/06/08 01:46:31.800 +00:00] [INFO] [querynode/service.go:106] [QueryNode] [port=21123] [2023/06/08 01:46:31.801 +00:00] [INFO] [querynode/service.go:122] ["QueryNode connect to etcd successfully"] [2023/06/08 01:46:31.902 +00:00] [INFO] [querynode/service.go:132] [QueryNode] [State=Initializing] [2023/06/08 01:46:31.902 +00:00] [INFO] [querynode/query_node.go:299] ["QueryNode session info"] [metaPath=by-dev/meta] [2023/06/08 01:46:31.902 +00:00] [INFO] [sessionutil/session_util.go:202] ["Session try to connect to etcd"] [2023/06/08 01:46:31.904 +00:00] [INFO] [sessionutil/session_util.go:217] ["Session connect to etcd success"] [2023/06/08 01:46:31.910 +00:00] [INFO] [sessionutil/session_util.go:300] ["Session get serverID success"] [key=id] [ServerId=411] [2023/06/08 01:46:31.929 +00:00] [INFO] [config/etcd_source.go:145] ["start refreshing configurations"] [2023/06/08 01:46:31.930 +00:00] [INFO] [paramtable/quota_param.go:745] ["init disk quota"] 
[diskQuota(MB)=+inf] [2023/06/08 01:46:31.930 +00:00] [INFO] [paramtable/quota_param.go:760] ["init disk quota per DB"] [diskQuotaPerCollection(MB)=1.7976931348623157e+308] [2023/06/08 01:46:31.930 +00:00] [INFO] [paramtable/component_param.go:1543] ["init segment max idle time"] [value=10m0s] [2023/06/08 01:46:31.930 +00:00] [INFO] [paramtable/component_param.go:1548] ["init segment min size from idle to sealed"] [value=16] [2023/06/08 01:46:31.930 +00:00] [INFO] [paramtable/component_param.go:1558] ["init segment max binlog file to sealed"] [value=32] [2023/06/08 01:46:31.930 +00:00] [INFO] [paramtable/component_param.go:1553] ["init segment expansion rate"] [value=1.25] [2023/06/08 01:46:31.934 +00:00] [INFO] [paramtable/base_table.go:142] ["cannot find etcd.endpoints"] [2023/06/08 01:46:31.934 +00:00] [INFO] [paramtable/hook_config.go:19] ["hook config"] [hook={}] {"level":"INFO","time":"2023/06/08 01:46:31.935 +00:00","caller":"logutil/logutil.go:165","message":"Log directory","configDir":""} {"level":"INFO","time":"2023/06/08 01:46:31.935 +00:00","caller":"logutil/logutil.go:166","message":"Set log file to ","path":""} {"level":"INFO","time":"2023/06/08 01:46:31.935 +00:00","caller":"querynode/query_node.go:209","message":"QueryNode init session","nodeID":411,"node address":"10.234.98.131:21123"} {"level":"INFO","time":"2023/06/08 01:46:31.935 +00:00","caller":"querynode/query_node.go:315","message":"QueryNode init rateCollector done","nodeID":411} {"level":"INFO","time":"2023/06/08 01:46:31.944 +00:00","caller":"storage/minio_chunk_manager.go:145","message":"minio chunk manager init success.","bucketname":"milvus-bucket","root":"file"} {"level":"INFO","time":"2023/06/08 01:46:31.944 +00:00","caller":"querynode/query_node.go:325","message":"queryNode try to connect etcd success","MetaRootPath":"by-dev/meta"} {"level":"INFO","time":"2023/06/08 01:46:31.944 +00:00","caller":"querynode/segment_loader.go:945","message":"SegmentLoader 
created","ioPoolSize":48,"cpuPoolSize":6} 2023-06-08 01:46:31,944 INFO [default] [KNOWHERE][SetBlasThreshold][milvus] Set faiss::distance_compute_blas_threshold to 16384 2023-06-08 01:46:31,945 INFO [default] [KNOWHERE][SetEarlyStopThreshold][milvus] Set faiss::early_stop_threshold to 0 2023-06-08 01:46:31,945 INFO [default] [KNOWHERE][SetStatisticsLevel][milvus] Set knowhere::STATISTICS_LEVEL to 0 2023-06-08 01:46:31,945 | DEBUG | default | [SERVER][operator()][milvus] Config easylogging with yaml file: /milvus/configs/easylogging.yaml 2023-06-08 01:46:31,946 | DEBUG | default | [SEGCORE][SegcoreSetSimdType][milvus] set config simd_type: auto 2023-06-08 01:46:31,946 | INFO | default | [KNOWHERE][SetSimdType][milvus] FAISS expect simdType::AUTO 2023-06-08 01:46:31,946 | INFO | default | [KNOWHERE][SetSimdType][milvus] FAISS hook AVX512 2023-06-08 01:46:31,946 | DEBUG | default | [SEGCORE][SetIndexSliceSize][milvus] set config index slice size(byte): 16777216 2023-06-08 01:46:31,946 | DEBUG | default | [SEGCORE][SetThreadCoreCoefficient][milvus] set thread pool core coefficient: 10 fatal error: unexpected signal during runtime execution [signal SIGSEGV: segmentation violation code=0x1 addr=0x10 pc=0x372ed75]

runtime stack: runtime.throw({0x40010ba?, 0x7f85846297d0?}) /opt/go/goroot/src/runtime/panic.go:992 +0x71 runtime.sigpanic() /opt/go/goroot/src/runtime/signal_unix.go:802 +0x389

goroutine 207 [syscall]: runtime.cgocall(0x3144350, 0xc0012072c0) /opt/go/goroot/src/runtime/cgocall.go:157 +0x5c fp=0xc001207258 sp=0xc001207220 pc=0x14788bc github.com/milvus-io/milvus/internal/util/initcore._Cfunc_InitRemoteChunkManagerSingleton({0x7f85770ffa20, 0x7f8577028910, 0x7f8577028940, 0x7f8577028930, 0x7f85770085b0, 0x7f85770085c8, 0x7f85770085d0, 0x0, 0x0, {0x0, ...}}) _cgo_gotypes.go:122 +0x5b fp=0xc0012072c0 sp=0xc001207258 pc=0x2b21b3b github.com/milvus-io/milvus/internal/util/initcore.InitRemoteChunkManager(0x5d39640) /data/jianwang25/roadmap/milvus-2.2.9/milvus-2.2.9/internal/util/initcore/init_storage_config.go:71 +0x2e5 fp=0xc0012073f8 sp=0xc0012072c0 pc=0x2b220e5 github.com/milvus-io/milvus/internal/querynode.(*QueryNode).InitSegcore(0x4470038?) /data/jianwang25/roadmap/milvus-2.2.9/milvus-2.2.9/internal/querynode/query_node.go:291 +0x23e fp=0xc001207478 sp=0xc0012073f8 pc=0x2de447e github.com/milvus-io/milvus/internal/querynode.(*QueryNode).Init.func1() /data/jianwang25/roadmap/milvus-2.2.9/milvus-2.2.9/internal/querynode/query_node.go:346 +0x10e5 fp=0xc001207b50 sp=0xc001207478 pc=0x2de5865 sync.(*Once).doSlow(0x3f91fe6?, 0x14b8051?) /opt/go/goroot/src/sync/once.go:68 +0xc2 fp=0xc001207bb0 sp=0xc001207b50 pc=0x14ef022 sync.(*Once).Do(...) /opt/go/goroot/src/sync/once.go:59 github.com/milvus-io/milvus/internal/querynode.(*QueryNode).Init(0x3f91fe6?) /data/jianwang25/roadmap/milvus-2.2.9/milvus-2.2.9/internal/querynode/query_node.go:297 +0x5b fp=0xc001207bf8 sp=0xc001207bb0 pc=0x2de473b github.com/milvus-io/milvus/internal/distributed/querynode.(*Server).init(0xc000a32420) /data/jianwang25/roadmap/milvus-2.2.9/milvus-2.2.9/internal/distributed/querynode/service.go:133 +0x76e fp=0xc001207ee8 sp=0xc001207bf8 pc=0x2fb25ce github.com/milvus-io/milvus/internal/distributed/querynode.(*Server).Run(0xc000a34301?) 
/data/jianwang25/roadmap/milvus-2.2.9/milvus-2.2.9/internal/distributed/querynode/service.go:213 +0x25 fp=0xc001207f28 sp=0xc001207ee8 pc=0x2fb3925 github.com/milvus-io/milvus/cmd/components.(*QueryNode).Run(0x5d39640?) /data/jianwang25/roadmap/milvus-2.2.9/milvus-2.2.9/cmd/components/query_node.go:54 +0x1d fp=0xc001207f60 sp=0xc001207f28 pc=0x313015d github.com/milvus-io/milvus/cmd/roles.runComponent[...].func1() /data/jianwang25/roadmap/milvus-2.2.9/milvus-2.2.9/cmd/roles/roles.go:120 +0x182 fp=0xc001207fe0 sp=0xc001207f60 pc=0x3132d82 runtime.goexit() /opt/go/goroot/src/runtime/asm_amd64.s:1571 +0x1 fp=0xc001207fe8 sp=0xc001207fe0 pc=0x14e1fc1 created by github.com/milvus-io/milvus/cmd/roles.runComponent[...] /data/jianwang25/roadmap/milvus-2.2.9/milvus-2.2.9/cmd/roles/roles.go:104 +0x18a

goroutine 1 [chan receive]: github.com/milvus-io/milvus/cmd/roles.(*MilvusRoles).Run(0xc0006c7e58, 0x0, {0x0, 0x0}) /data/jianwang25/roadmap/milvus-2.2.9/milvus-2.2.9/cmd/roles/roles.go:351 +0xb0d github.com/milvus-io/milvus/cmd/milvus.(*run).execute(0xc000a2c4b0, {0xc00004e090?, 0x3, 0x3}, 0xc000a32240) /data/jianwang25/roadmap/milvus-2.2.9/milvus-2.2.9/cmd/milvus/run.go:112 +0x66e github.com/milvus-io/milvus/cmd/milvus.RunMilvus({0xc00004e090?, 0x3, 0x3}) /data/jianwang25/roadmap/milvus-2.2.9/milvus-2.2.9/cmd/milvus/milvus.go:60 +0x21e main.main() /data/jianwang25/roadmap/milvus-2.2.9/milvus-2.2.9/cmd/main.go:26 +0x2e

goroutine 220 [chan receive]: github.com/panjf2000/ants/v2.(*Pool).purgePeriodically(0xc0008bd420) /opt/go/gopath/pkg/mod/github.com/panjf2000/ants/[email protected]/pool.go:69 +0x8b created by github.com/panjf2000/ants/v2.NewPool /opt/go/gopath/pkg/mod/github.com/panjf2000/ants/[email protected]/pool.go:137 +0x34a

goroutine 230 [IO wait]: internal/poll.runtime_pollWait(0x7f854e859028, 0x72) /opt/go/goroot/src/runtime/netpoll.go:302 +0x89 internal/poll.(*pollDesc).wait(0xc0010ad000?, 0xc000062500?, 0x0) /opt/go/goroot/src/internal/poll/fd_poll_runtime.go:83 +0x32 internal/poll.(*pollDesc).waitRead(...) /opt/go/goroot/src/internal/poll/fd_poll_runtime.go:88 internal/poll.(*FD).Accept(0xc0010ad000) /opt/go/goroot/src/internal/poll/fd_unix.go:614 +0x22c net.(*netFD).accept(0xc0010ad000) /opt/go/goroot/src/net/fd_unix.go:172 +0x35 net.(*TCPListener).accept(0xc000c02408) /opt/go/goroot/src/net/tcpsock_posix.go:139 +0x28 net.(*TCPListener).Accept(0xc000c02408) /opt/go/goroot/src/net/tcpsock.go:288 +0x3d net/http.(*Server).Serve(0xc0008960e0, {0x4445be0, 0xc000c02408}) /opt/go/goroot/src/net/http/server.go:3039 +0x385 net/http.(*Server).ListenAndServe(0xc0008960e0) /opt/go/goroot/src/net/http/server.go:2968 +0x7d net/http.ListenAndServe(...) /opt/go/goroot/src/net/http/server.go:3222 github.com/milvus-io/milvus/internal/management.ServeHTTP.func1() /data/jianwang25/roadmap/milvus-2.2.9/milvus-2.2.9/internal/management/server.go:69 +0x151 created by github.com/milvus-io/milvus/internal/management.ServeHTTP /data/jianwang25/roadmap/milvus-2.2.9/milvus-2.2.9/internal/management/server.go:66 +0x25

goroutine 231 [select]: google.golang.org/grpc.(*ccBalancerWrapper).watcher(0xc0004f5680) /opt/go/gopath/pkg/mod/google.golang.org/[email protected]/balancer_conn_wrappers.go:112 +0x73 created by google.golang.org/grpc.newCCBalancerWrapper /opt/go/gopath/pkg/mod/google.golang.org/[email protected]/balancer_conn_wrappers.go:73 +0x22a

goroutine 311 [select]: google.golang.org/grpc/internal/transport.(*controlBuffer).get(0xc0009bd9f0, 0x1) /opt/go/gopath/pkg/mod/google.golang.org/[email protected]/internal/transport/controlbuf.go:407 +0x115 google.golang.org/grpc/internal/transport.(*loopyWriter).run(0xc000a33380) /opt/go/gopath/pkg/mod/google.golang.org/[email protected]/internal/transport/controlbuf.go:534 +0x85 google.golang.org/grpc/internal/transport.newHTTP2Client.func3() /opt/go/gopath/pkg/mod/google.golang.org/[email protected]/internal/transport/http2_client.go:415 +0x65 created by google.golang.org/grpc/internal/transport.newHTTP2Client /opt/go/gopath/pkg/mod/google.golang.org/[email protected]/internal/transport/http2_client.go:413 +0x1f91

goroutine 204 [IO wait]: internal/poll.runtime_pollWait(0x7f854e858f38, 0x72) /opt/go/goroot/src/runtime/netpoll.go:302 +0x89 internal/poll.(*pollDesc).wait(0xc0005ec200?, 0xc0010ee000?, 0x0) /opt/go/goroot/src/internal/poll/fd_poll_runtime.go:83 +0x32 internal/poll.(*pollDesc).waitRead(...) /opt/go/goroot/src/internal/poll/fd_poll_runtime.go:88 internal/poll.(*FD).Read(0xc0005ec200, {0xc0010ee000, 0x8000, 0x8000}) /opt/go/goroot/src/internal/poll/fd_unix.go:167 +0x25a net.(*netFD).Read(0xc0005ec200, {0xc0010ee000?, 0x3e081c0?, 0x1?}) /opt/go/goroot/src/net/fd_posix.go:55 +0x29 net.(*conn).Read(0xc000472668, {0xc0010ee000?, 0x14c5000?, 0x801010601?}) /opt/go/goroot/src/net/net.go:183 +0x45 bufio.(*Reader).Read(0xc000a32060, {0xc000896200, 0x9, 0x18?}) /opt/go/goroot/src/bufio/bufio.go:236 +0x1b4 io.ReadAtLeast({0x442a600, 0xc000a32060}, {0xc000896200, 0x9, 0x9}, 0x9) /opt/go/goroot/src/io/io.go:331 +0x9a io.ReadFull(...) /opt/go/goroot/src/io/io.go:350 golang.org/x/net/http2.readFrameHeader({0xc000896200?, 0x9?, 0x54b09a2?}, {0x442a600?, 0xc000a32060?}) /opt/go/gopath/pkg/mod/golang.org/x/[email protected]/http2/frame.go:237 +0x6e golang.org/x/net/http2.(*Framer).ReadFrame(0xc0008961c0) /opt/go/gopath/pkg/mod/golang.org/x/[email protected]/http2/frame.go:498 +0x95 google.golang.org/grpc/internal/transport.(*http2Client).reader(0xc0000001e0) /opt/go/gopath/pkg/mod/google.golang.org/[email protected]/internal/transport/http2_client.go:1498 +0x414 created by google.golang.org/grpc/internal/transport.newHTTP2Client /opt/go/gopath/pkg/mod/google.golang.org/[email protected]/internal/transport/http2_client.go:365 +0x193f

goroutine 234 [syscall]: os/signal.signal_recv() /opt/go/goroot/src/runtime/sigqueue.go:151 +0x2f os/signal.loop() /opt/go/goroot/src/os/signal/signal_unix.go:23 +0x19 created by os/signal.Notify.func1.1 /opt/go/goroot/src/os/signal/signal.go:151 +0x2a

goroutine 275 [select]: github.com/milvus-io/milvus/internal/config.(*EtcdSource).refreshConfigurationsPeriodically(0xc00131ee80) /data/jianwang25/roadmap/milvus-2.2.9/milvus-2.2.9/internal/config/etcd_source.go:147 +0x9f created by github.com/milvus-io/milvus/internal/config.(*EtcdSource).GetConfigurations.func1 /data/jianwang25/roadmap/milvus-2.2.9/milvus-2.2.9/internal/config/etcd_source.go:98 +0x5a

goroutine 205 [select]: google.golang.org/grpc/internal/transport.(*controlBuffer).get(0xc0009bc230, 0x1) /opt/go/gopath/pkg/mod/google.golang.org/[email protected]/internal/transport/controlbuf.go:407 +0x115 google.golang.org/grpc/internal/transport.(*loopyWriter).run(0xc0005f2780) /opt/go/gopath/pkg/mod/google.golang.org/[email protected]/internal/transport/controlbuf.go:534 +0x85 google.golang.org/grpc/internal/transport.newHTTP2Client.func3() /opt/go/gopath/pkg/mod/google.golang.org/[email protected]/internal/transport/http2_client.go:415 +0x65 created by google.golang.org/grpc/internal/transport.newHTTP2Client /opt/go/gopath/pkg/mod/google.golang.org/[email protected]/internal/transport/http2_client.go:413 +0x1f91

goroutine 206 [select]: github.com/milvus-io/milvus/internal/config.(*EtcdSource).refreshConfigurationsPeriodically(0xc00091e180) /data/jianwang25/roadmap/milvus-2.2.9/milvus-2.2.9/internal/config/etcd_source.go:147 +0x9f created by github.com/milvus-io/milvus/internal/config.(*EtcdSource).GetConfigurations.func1 /data/jianwang25/roadmap/milvus-2.2.9/milvus-2.2.9/internal/config/etcd_source.go:98 +0x5a

goroutine 208 [select]: google.golang.org/grpc.(*ccBalancerWrapper).watcher(0xc000b0a740) /opt/go/gopath/pkg/mod/google.golang.org/[email protected]/balancer_conn_wrappers.go:112 +0x73 created by google.golang.org/grpc.newCCBalancerWrapper /opt/go/gopath/pkg/mod/google.golang.org/[email protected]/balancer_conn_wrappers.go:73 +0x22a

goroutine 294 [IO wait]: internal/poll.runtime_pollWait(0x7f854e858a88, 0x72) /opt/go/goroot/src/runtime/netpoll.go:302 +0x89 internal/poll.(*pollDesc).wait(0xc000bb8080?, 0xc000aac000?, 0x0) /opt/go/goroot/src/internal/poll/fd_poll_runtime.go:83 +0x32 internal/poll.(*pollDesc).waitRead(...) /opt/go/goroot/src/internal/poll/fd_poll_runtime.go:88 internal/poll.(*FD).Read(0xc000bb8080, {0xc000aac000, 0x8000, 0x8000}) /opt/go/goroot/src/internal/poll/fd_unix.go:167 +0x25a net.(*netFD).Read(0xc000bb8080, {0xc000aac000?, 0x3e081c0?, 0x14efd01?}) /opt/go/goroot/src/net/fd_posix.go:55 +0x29 net.(*conn).Read(0xc000c0e000, {0xc000aac000?, 0x0?, 0x800010601?}) /opt/go/goroot/src/net/net.go:183 +0x45 bufio.(*Reader).Read(0xc0003ee240, {0xc0008963c0, 0x9, 0x18?}) /opt/go/goroot/src/bufio/bufio.go:236 +0x1b4 io.ReadAtLeast({0x442a600, 0xc0003ee240}, {0xc0008963c0, 0x9, 0x9}, 0x9) /opt/go/goroot/src/io/io.go:331 +0x9a io.ReadFull(...) /opt/go/goroot/src/io/io.go:350 golang.org/x/net/http2.readFrameHeader({0xc0008963c0?, 0x9?, 0xdac9e0d?}, {0x442a600?, 0xc0003ee240?}) /opt/go/gopath/pkg/mod/golang.org/x/[email protected]/http2/frame.go:237 +0x6e golang.org/x/net/http2.(*Framer).ReadFrame(0xc000896380) /opt/go/gopath/pkg/mod/golang.org/x/[email protected]/http2/frame.go:498 +0x95 google.golang.org/grpc/internal/transport.(*http2Client).reader(0xc0000005a0) /opt/go/gopath/pkg/mod/google.golang.org/[email protected]/internal/transport/http2_client.go:1498 +0x414 created by google.golang.org/grpc/internal/transport.newHTTP2Client /opt/go/gopath/pkg/mod/google.golang.org/[email protected]/internal/transport/http2_client.go:365 +0x193f

goroutine 292 [select]: google.golang.org/grpc/internal/transport.(*controlBuffer).get(0xc0009bd270, 0x1) /opt/go/gopath/pkg/mod/google.golang.org/[email protected]/internal/transport/controlbuf.go:407 +0x115 google.golang.org/grpc/internal/transport.(*loopyWriter).run(0xc000a32e40) /opt/go/gopath/pkg/mod/google.golang.org/[email protected]/internal/transport/controlbuf.go:534 +0x85 google.golang.org/grpc/internal/transport.newHTTP2Client.func3() /opt/go/gopath/pkg/mod/google.golang.org/[email protected]/internal/transport/http2_client.go:415 +0x65 created by google.golang.org/grpc/internal/transport.newHTTP2Client /opt/go/gopath/pkg/mod/google.golang.org/[email protected]/internal/transport/http2_client.go:413 +0x1f91

goroutine 291 [IO wait]: internal/poll.runtime_pollWait(0x7f854e858d58, 0x72) /opt/go/goroot/src/runtime/netpoll.go:302 +0x89 internal/poll.(*pollDesc).wait(0xc00056e180?, 0xc0013be000?, 0x0) /opt/go/goroot/src/internal/poll/fd_poll_runtime.go:83 +0x32 internal/poll.(*pollDesc).waitRead(...) /opt/go/goroot/src/internal/poll/fd_poll_runtime.go:88 internal/poll.(*FD).Read(0xc00056e180, {0xc0013be000, 0x8000, 0x8000}) /opt/go/goroot/src/internal/poll/fd_unix.go:167 +0x25a net.(*netFD).Read(0xc00056e180, {0xc0013be000?, 0x3e081c0?, 0x1?}) /opt/go/goroot/src/net/fd_posix.go:55 +0x29 net.(*conn).Read(0xc0001b7fa8, {0xc0013be000?, 0x14c5000?, 0x800010601?}) /opt/go/goroot/src/net/net.go:183 +0x45 bufio.(*Reader).Read(0xc000a32de0, {0xc0008962e0, 0x9, 0x18?}) /opt/go/goroot/src/bufio/bufio.go:236 +0x1b4 io.ReadAtLeast({0x442a600, 0xc000a32de0}, {0xc0008962e0, 0x9, 0x9}, 0x9) /opt/go/goroot/src/io/io.go:331 +0x9a io.ReadFull(...) /opt/go/goroot/src/io/io.go:350 golang.org/x/net/http2.readFrameHeader({0xc0008962e0?, 0x9?, 0x6f1cae7?}, {0x442a600?, 0xc000a32de0?}) /opt/go/gopath/pkg/mod/golang.org/x/[email protected]/http2/frame.go:237 +0x6e golang.org/x/net/http2.(*Framer).ReadFrame(0xc0008962a0) /opt/go/gopath/pkg/mod/golang.org/x/[email protected]/http2/frame.go:498 +0x95 google.golang.org/grpc/internal/transport.(*http2Client).reader(0xc0000003c0) /opt/go/gopath/pkg/mod/google.golang.org/[email protected]/internal/transport/http2_client.go:1498 +0x414 created by google.golang.org/grpc/internal/transport.newHTTP2Client /opt/go/gopath/pkg/mod/google.golang.org/[email protected]/internal/transport/http2_client.go:365 +0x193f

goroutine 276 [select]: github.com/uber/jaeger-client-go.(*RemotelyControlledSampler).pollControllerWithTicker(0xc0002e4d00, 0xc0009bd540) /opt/go/gopath/pkg/mod/github.com/uber/[email protected]+incompatible/sampler_remote.go:144 +0x89 github.com/uber/jaeger-client-go.(*RemotelyControlledSampler).pollController(0xc0002e4d00) /opt/go/gopath/pkg/mod/github.com/uber/[email protected]+incompatible/sampler_remote.go:139 +0x6d created by github.com/uber/jaeger-client-go.NewRemotelyControlledSampler /opt/go/gopath/pkg/mod/github.com/uber/[email protected]+incompatible/sampler_remote.go:86 +0x15b

goroutine 279 [select]: github.com/uber/jaeger-client-go/utils.(*reconnectingUDPConn).reconnectLoop(0xc000868070, 0x0?) /opt/go/gopath/pkg/mod/github.com/uber/[email protected]+incompatible/utils/reconnecting_udp_conn.go:70 +0xbc created by github.com/uber/jaeger-client-go/utils.newReconnectingUDPConn /opt/go/gopath/pkg/mod/github.com/uber/[email protected]+incompatible/utils/reconnecting_udp_conn.go:60 +0x205

goroutine 280 [select]: github.com/uber/jaeger-client-go.(*remoteReporter).processQueue(0xc000b8f1a0) /opt/go/gopath/pkg/mod/github.com/uber/[email protected]+incompatible/reporter.go:296 +0xde created by github.com/uber/jaeger-client-go.NewRemoteReporter /opt/go/gopath/pkg/mod/github.com/uber/[email protected]+incompatible/reporter.go:237 +0x245

goroutine 281 [select]: google.golang.org/grpc.(*ccBalancerWrapper).watcher(0xc00052dd80) /opt/go/gopath/pkg/mod/google.golang.org/[email protected]/balancer_conn_wrappers.go:112 +0x73 created by google.golang.org/grpc.newCCBalancerWrapper /opt/go/gopath/pkg/mod/google.golang.org/[email protected]/balancer_conn_wrappers.go:73 +0x22a

goroutine 284 [IO wait]: internal/poll.runtime_pollWait(0x7f854e858c68, 0x72) /opt/go/goroot/src/runtime/netpoll.go:302 +0x89 internal/poll.(*pollDesc).wait(0xc0005eda80?, 0xc000614c00?, 0x0) /opt/go/goroot/src/internal/poll/fd_poll_runtime.go:83 +0x32 internal/poll.(*pollDesc).waitRead(...) /opt/go/goroot/src/internal/poll/fd_poll_runtime.go:88 internal/poll.(*FD).Accept(0xc0005eda80) /opt/go/goroot/src/internal/poll/fd_unix.go:614 +0x22c net.(*netFD).accept(0xc0005eda80) /opt/go/goroot/src/net/fd_unix.go:172 +0x35 net.(*TCPListener).accept(0xc0001ca3c0) /opt/go/goroot/src/net/tcpsock_posix.go:139 +0x28 net.(*TCPListener).Accept(0xc0001ca3c0) /opt/go/goroot/src/net/tcpsock.go:288 +0x3d google.golang.org/grpc.(*Server).Serve(0xc0008c9880, {0x4445be0?, 0xc0001ca3c0}) /opt/go/gopath/pkg/mod/google.golang.org/[email protected]/server.go:780 +0x477 github.com/milvus-io/milvus/internal/distributed/querynode.(*Server).startGrpcLoop(0xc000a32420, 0x5283) /data/jianwang25/roadmap/milvus-2.2.9/milvus-2.2.9/internal/distributed/querynode/service.go:203 +0x8ff created by github.com/milvus-io/milvus/internal/distributed/querynode.(*Server).init /data/jianwang25/roadmap/milvus-2.2.9/milvus-2.2.9/internal/distributed/querynode/service.go:124 +0x5dd

goroutine 307 [select]: google.golang.org/grpc.(*ccBalancerWrapper).watcher(0xc00091b8c0) /opt/go/gopath/pkg/mod/google.golang.org/[email protected]/balancer_conn_wrappers.go:112 +0x73 created by google.golang.org/grpc.newCCBalancerWrapper /opt/go/gopath/pkg/mod/google.golang.org/[email protected]/balancer_conn_wrappers.go:73 +0x22a

goroutine 295 [select]: google.golang.org/grpc/internal/transport.(*controlBuffer).get(0xc0010b2050, 0x1) /opt/go/gopath/pkg/mod/google.golang.org/[email protected]/internal/transport/controlbuf.go:407 +0x115 google.golang.org/grpc/internal/transport.(*loopyWriter).run(0xc0003ee4e0) /opt/go/gopath/pkg/mod/google.golang.org/[email protected]/internal/transport/controlbuf.go:534 +0x85 google.golang.org/grpc/internal/transport.newHTTP2Client.func3() /opt/go/gopath/pkg/mod/google.golang.org/[email protected]/internal/transport/http2_client.go:415 +0x65 created by google.golang.org/grpc/internal/transport.newHTTP2Client /opt/go/gopath/pkg/mod/google.golang.org/[email protected]/internal/transport/http2_client.go:413 +0x1f91

goroutine 310 [IO wait]: internal/poll.runtime_pollWait(0x7f854e858998, 0x72) /opt/go/goroot/src/runtime/netpoll.go:302 +0x89 internal/poll.(*pollDesc).wait(0xc00088c180?, 0xc000976000?, 0x0) /opt/go/goroot/src/internal/poll/fd_poll_runtime.go:83 +0x32 internal/poll.(*pollDesc).waitRead(...) /opt/go/goroot/src/internal/poll/fd_poll_runtime.go:88 internal/poll.(*FD).Read(0xc00088c180, {0xc000976000, 0x8000, 0x8000}) /opt/go/goroot/src/internal/poll/fd_unix.go:167 +0x25a net.(*netFD).Read(0xc00088c180, {0xc000976000?, 0x3e081c0?, 0x1?}) /opt/go/goroot/src/net/fd_posix.go:55 +0x29 net.(*conn).Read(0xc000ba74d8, {0xc000976000?, 0x14c5000?, 0x800010601?}) /opt/go/goroot/src/net/net.go:183 +0x45 bufio.(*Reader).Read(0xc000a33320, {0xc0004fe3c0, 0x9, 0x18?}) /opt/go/goroot/src/bufio/bufio.go:236 +0x1b4 io.ReadAtLeast({0x442a600, 0xc000a33320}, {0xc0004fe3c0, 0x9, 0x9}, 0x9) /opt/go/goroot/src/io/io.go:331 +0x9a io.ReadFull(...) /opt/go/goroot/src/io/io.go:350 golang.org/x/net/http2.readFrameHeader({0xc0004fe3c0?, 0x9?, 0xeda36b9?}, {0x442a600?, 0xc000a33320?}) /opt/go/gopath/pkg/mod/golang.org/x/[email protected]/http2/frame.go:237 +0x6e golang.org/x/net/http2.(*Framer).ReadFrame(0xc0004fe380) /opt/go/gopath/pkg/mod/golang.org/x/[email protected]/http2/frame.go:498 +0x95 google.golang.org/grpc/internal/transport.(*http2Client).reader(0xc0005d9a40) /opt/go/gopath/pkg/mod/google.golang.org/[email protected]/internal/transport/http2_client.go:1498 +0x414 created by google.golang.org/grpc/internal/transport.newHTTP2Client /opt/go/gopath/pkg/mod/google.golang.org/[email protected]/internal/transport/http2_client.go:365 +0x193f

goroutine 296 [select]: github.com/milvus-io/milvus/internal/config.(*EtcdSource).refreshConfigurationsPeriodically(0xc0006f1580) /data/jianwang25/roadmap/milvus-2.2.9/milvus-2.2.9/internal/config/etcd_source.go:147 +0x9f created by github.com/milvus-io/milvus/internal/config.(*EtcdSource).GetConfigurations.func1 /data/jianwang25/roadmap/milvus-2.2.9/milvus-2.2.9/internal/config/etcd_source.go:98 +0x5a

goroutine 304 [chan receive]: github.com/panjf2000/ants/v2.(*Pool).purgePeriodically(0xc0008683f0) /opt/go/gopath/pkg/mod/github.com/panjf2000/ants/[email protected]/pool.go:69 +0x8b created by github.com/panjf2000/ants/v2.NewPool /opt/go/gopath/pkg/mod/github.com/panjf2000/ants/[email protected]/pool.go:137 +0x34a

goroutine 302 [IO wait]: internal/poll.runtime_pollWait(0x7f854e8588a8, 0x72) /opt/go/goroot/src/runtime/netpoll.go:302 +0x89 internal/poll.(*pollDesc).wait(0xc0006f1a00?, 0xc00107f000?, 0x0) /opt/go/goroot/src/internal/poll/fd_poll_runtime.go:83 +0x32 internal/poll.(*pollDesc).waitRead(...) /opt/go/goroot/src/internal/poll/fd_poll_runtime.go:88 internal/poll.(*FD).Read(0xc0006f1a00, {0xc00107f000, 0x1000, 0x1000}) /opt/go/goroot/src/internal/poll/fd_unix.go:167 +0x25a net.(*netFD).Read(0xc0006f1a00, {0xc00107f000?, 0x0?, 0x4?}) /opt/go/goroot/src/net/fd_posix.go:55 +0x29 net.(*conn).Read(0xc000662070, {0xc00107f000?, 0x0?, 0x0?}) /opt/go/goroot/src/net/net.go:183 +0x45 net/http.(*persistConn).Read(0xc001068360, {0xc00107f000?, 0x14c0b80?, 0xc000371ec8?}) /opt/go/goroot/src/net/http/transport.go:1929 +0x4e bufio.(*Reader).fill(0xc0003ef7a0) /opt/go/goroot/src/bufio/bufio.go:106 +0x103 bufio.(*Reader).Peek(0xc0003ef7a0, 0x1) /opt/go/goroot/src/bufio/bufio.go:144 +0x5d net/http.(*persistConn).readLoop(0xc001068360) /opt/go/goroot/src/net/http/transport.go:2093 +0x1ac created by net/http.(*Transport).dialConn /opt/go/goroot/src/net/http/transport.go:1750 +0x173e

goroutine 303 [select]: net/http.(*persistConn).writeLoop(0xc001068360) /opt/go/goroot/src/net/http/transport.go:2392 +0xf5 created by net/http.(*Transport).dialConn /opt/go/goroot/src/net/http/transport.go:1751 +0x1791

goroutine 305 [chan receive]: github.com/panjf2000/ants/v2.(*Pool).purgePeriodically(0xc000868460) /opt/go/gopath/pkg/mod/github.com/panjf2000/ants/[email protected]/pool.go:69 +0x8b created by github.com/panjf2000/ants/v2.NewPool /opt/go/gopath/pkg/mod/github.com/panjf2000/ants/[email protected]/pool.go:137 +0x34a

goroutine 322 [chan receive]: github.com/panjf2000/ants/v2.(*Pool).purgePeriodically(0xc0008684d0) /opt/go/gopath/pkg/mod/github.com/panjf2000/ants/[email protected]/pool.go:69 +0x8b created by github.com/panjf2000/ants/v2.NewPool /opt/go/gopath/pkg/mod/github.com/panjf2000/ants/[email protected]/pool.go:137 +0x34a

goroutine 323 [chan receive]: github.com/panjf2000/ants/v2.(*Pool).purgePeriodically(0xc000868540) /opt/go/gopath/pkg/mod/github.com/panjf2000/ants/[email protected]/pool.go:69 +0x8b created by github.com/panjf2000/ants/v2.NewPool /opt/go/gopath/pkg/mod/github.com/panjf2000/ants/[email protected]/pool.go:137 +0x34a

goroutine 324 [IO wait]: internal/poll.runtime_pollWait(0x7f854e858b78, 0x72) /opt/go/goroot/src/runtime/netpoll.go:302 +0x89 internal/poll.(*pollDesc).wait(0xc00091f300?, 0xc00121a000?, 0x0) /opt/go/goroot/src/internal/poll/fd_poll_runtime.go:83 +0x32 internal/poll.(*pollDesc).waitRead(...) /opt/go/goroot/src/internal/poll/fd_poll_runtime.go:88 internal/poll.(*FD).Read(0xc00091f300, {0xc00121a000, 0x1000, 0x1000}) /opt/go/goroot/src/internal/poll/fd_unix.go:167 +0x25a net.(*netFD).Read(0xc00091f300, {0xc00121a000?, 0xc00150a6e0?, 0x14ef4de?}) /opt/go/goroot/src/net/fd_posix.go:55 +0x29 net.(*conn).Read(0xc000662800, {0xc00121a000?, 0x7f85487294c0?, 0x1483220?}) /opt/go/goroot/src/net/net.go:183 +0x45 net/http.(*connReader).Read(0xc001057c80, {0xc00121a000, 0x1000, 0x1000}) /opt/go/goroot/src/net/http/server.go:780 +0x16d bufio.(*Reader).fill(0xc0003efe00) /opt/go/goroot/src/bufio/bufio.go:106 +0x103 bufio.(*Reader).ReadSlice(0xc0003efe00, 0x0?) /opt/go/goroot/src/bufio/bufio.go:371 +0x2f bufio.(*Reader).ReadLine(0xc0003efe00) /opt/go/goroot/src/bufio/bufio.go:400 +0x27 net/textproto.(*Reader).readLineSlice(0xc0009171d0) /opt/go/goroot/src/net/textproto/reader.go:57 +0x99 net/textproto.(*Reader).ReadLine(...) /opt/go/goroot/src/net/textproto/reader.go:38 net/http.readRequest(0xc000662800?) /opt/go/goroot/src/net/http/request.go:1029 +0x79 net/http.(*conn).readRequest(0xc0010c4d20, {0x4447b10, 0xc0004f5cc0}) /opt/go/goroot/src/net/http/server.go:988 +0x24a net/http.(*conn).serve(0xc0010c4d20, {0x4447bb8, 0xc0010be8a0}) /opt/go/goroot/src/net/http/server.go:1891 +0x32b created by net/http.(*Server).Serve /opt/go/goroot/src/net/http/server.go:3071 +0x4db ` This seems to be a minio connecting issue.
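Because the log above is pasted as one run of text, the two interesting events (the `[ERROR]` entry and the fatal signal) are easy to miss. The following is a minimal sketch for pulling them out, assuming the bracketed timestamp format shown above; `split_entries`, `triage`, and `SAMPLE` are hypothetical names, and `SAMPLE` is an abridged excerpt of the log with the cgo frame's argument list shortened. JSON-formatted entries are not split by this sketch.

```python
import re

# Start of a bracketed entry, e.g. "[2023/06/08 01:46:31.776 +00:00] [ERROR]".
ENTRY_START = re.compile(
    r"\[\d{4}/\d{2}/\d{2} \d{2}:\d{2}:\d{2}\.\d{3} [+-]\d{2}:\d{2}\] "
    r"\[(?:DEBUG|INFO|WARN|ERROR)\]"
)
# Go runtime signal line, e.g. "[signal SIGSEGV: ... addr=0x10 pc=...]".
SIGNAL = re.compile(r"\[signal (\w+): [^\]]*?addr=(0x[0-9a-f]+)[^\]]*\]")
# First cgo call in a stack trace, e.g. "initcore._Cfunc_Foo(".
CGO_FRAME = re.compile(r"\._Cfunc_(\w+)\(")


def split_entries(text):
    """Split a flattened log into one string per bracketed-timestamp entry."""
    starts = [m.start() for m in ENTRY_START.finditer(text)]
    if not starts:
        return []
    bounds = starts + [len(text)]
    return [text[a:b].strip() for a, b in zip(bounds, bounds[1:])]


def triage(text):
    """Collect ERROR entries, the fatal signal (if any), and the first cgo call."""
    sig = SIGNAL.search(text)
    cgo = CGO_FRAME.search(text)
    return {
        "errors": [e for e in split_entries(text) if "] [ERROR] " in e],
        "signal": sig.group(1) if sig else None,
        "fault_addr": sig.group(2) if sig else None,
        "cgo_call": cgo.group(1) if cgo else None,
    }


# Abridged excerpt of the log above (cgo argument list shortened).
SAMPLE = (
    '[2023/06/08 01:46:31.775 +00:00] [INFO] [paramtable/hook_config.go:19] '
    '["hook config"] [hook={}] '
    '[2023/06/08 01:46:31.776 +00:00] [ERROR] [querynode/query_node.go:188] '
    '["load queryhook failed"] [error="fail to set the querynode plugin path"] '
    '[2023/06/08 01:46:31.797 +00:00] [INFO] [config/etcd_source.go:145] '
    '["start refreshing configurations"] '
    'fatal error: unexpected signal during runtime execution '
    '[signal SIGSEGV: segmentation violation code=0x1 addr=0x10 pc=0x372ed75] '
    'github.com/milvus-io/milvus/internal/util/initcore.'
    '_Cfunc_InitRemoteChunkManagerSingleton({0x7f85770ffa20})'
)

if __name__ == "__main__":
    for key, value in triage(SAMPLE).items():
        print(key, "=", value)
```

On this log the sketch surfaces the `load queryhook failed` entry and a SIGSEGV raised inside the cgo call that initializes the remote chunk manager, which is the part of the trace most relevant to the storage settings.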

Status of all components: image
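One way to test the MinIO connectivity guess above is a plain TCP reachability check from inside the querynode pod. The sketch below is a minimal example, not part of Milvus; the helper name `can_connect` is hypothetical, and the host and port in the usage comment are placeholders for whatever `minio.address` and `minio.port` resolve to in the deployment. A successful connect only shows that the port is reachable. It says nothing about credentials, bucket permissions, or TLS.

```python
import socket


def can_connect(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds within `timeout` seconds."""
    try:
        # create_connection resolves the host and attempts a TCP handshake.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and DNS failures.
        return False


# Example usage (placeholder address, replace with the real MinIO endpoint):
#   print(can_connect("localhost", 9000))
```

If the port is reachable but the querynode still panics, the cause is more likely in the storage client itself than in the network path.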

Expected Behavior

No response

Steps To Reproduce

No response

Milvus Log

No response

Anything else?

No response

dzqoo avatar Jun 08 '23 01:06 dzqoo

This is my build environment: image

dzqoo avatar Jun 08 '23 02:06 dzqoo

@dzqoo It looks like something went wrong when initializing the RemoteChunkManager. Could you please provide the configuration for the storage part (with sensitive fields masked)?

congqixia avatar Jun 08 '23 03:06 congqixia
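For anyone sharing a config in response to a request like this, the masking can be done mechanically before pasting. The sketch below is a minimal example that blanks the values of a few key names; the helper `mask_sensitive` and the `SENSITIVE_KEYS` list are hypothetical and should be extended to whatever secrets a given file contains. It works line by line on `key: value` text and does not parse YAML.

```python
import re

# Key names treated as secrets (an assumption; extend as needed).
SENSITIVE_KEYS = ("accessKeyID", "secretAccessKey", "password", "saslPassword")

_KEYS = "|".join(re.escape(k) for k in SENSITIVE_KEYS)
# Group 1: indentation, optional '#', key and colon. Group 2: the value. Group 3: trailing comment.
_PATTERN = re.compile(r"^(\s*#?\s*(?:" + _KEYS + r")\s*:\s*)(\S+)(.*)$")


def mask_sensitive(text):
    """Replace the value of every sensitive `key: value` line with ****."""
    masked_lines = []
    for line in text.splitlines():
        masked_lines.append(_PATTERN.sub(r"\g<1>****\g<3>", line))
    return "\n".join(masked_lines)


sample = (
    "minio:\n"
    "  address: localhost\n"
    "  accessKeyID: minioadmin # accessKeyID of MinIO/S3\n"
    "  secretAccessKey: minioadmin\n"
    '  bucketName: "a-bucket"'
)
print(mask_sensitive(sample))
```

Non-secret fields such as `address` and `bucketName` are left untouched, so the masked file is still useful for debugging.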

These are my MinIO configs:

existingSecret: ""
bucketName: "milvus-bucket"
rootPath: file
useIAM: false
iamEndpoint: ""
podDisruptionBudget:
  enabled: false
resources:
  requests:
    memory: 4Gi
    cpu: 1

gcsgateway:
  enabled: false
  replicas: 1
  gcsKeyJson: "/etc/credentials/gcs_key.json"
  projectId: ""

service:
  type: NodePort
  port: 9000
  nodePort: 31900

persistence:
  enabled: true
  existingClaim: ""
  storageClass:
  accessMode: ReadWriteOnce
  size: 500Gi

livenessProbe:
  enabled: true
  initialDelaySeconds: 5
  periodSeconds: 5
  timeoutSeconds: 5
  successThreshold: 1
  failureThreshold: 5

readinessProbe:
  enabled: true
  initialDelaySeconds: 5
  periodSeconds: 5
  timeoutSeconds: 1
  successThreshold: 1
  failureThreshold: 5

startupProbe:
  enabled: true
  initialDelaySeconds: 0
  periodSeconds: 10
  timeoutSeconds: 5
  successThreshold: 1
  failureThreshold: 60

dzqoo avatar Jun 08 '23 03:06 dzqoo

With these configs, the official Docker images start fine.

dzqoo avatar Jun 08 '23 03:06 dzqoo

Could you log in to any Milvus pod and paste the milvus.yaml under the /milvus/configs directory?

LoveEachDay avatar Jun 08 '23 03:06 LoveEachDay

etcd:
  endpoints:
    - localhost:2379
  rootPath: by-dev # The root path where data is stored in etcd
  metaSubPath: meta # metaRootPath = rootPath + '/' + metaSubPath
  kvSubPath: kv # kvRootPath = rootPath + '/' + kvSubPath
  log:
    # path is one of:
    #  - "default" as os.Stderr,
    #  - "stderr" as os.Stderr,
    #  - "stdout" as os.Stdout,
    #  - file path to append server logs to.
    # please adjust in embedded Milvus: /tmp/milvus/logs/etcd.log
    path: stdout
    level: info # Only supports debug, info, warn, error, panic, or fatal. Default 'info'.
  use:
    # please adjust in embedded Milvus: true
    embed: false # Whether to enable embedded Etcd (an in-process EtcdServer).
  data:
    # Embedded Etcd only.
    # please adjust in embedded Milvus: /tmp/milvus/etcdData/
    dir: default.etcd
  ssl:
    enabled: false # Whether to support ETCD secure connection mode
    tlsCert: /path/to/etcd-client.pem # path to your cert file
    tlsKey: /path/to/etcd-client-key.pem # path to your key file
    tlsCACert: /path/to/ca.pem # path to your CACert file
    # TLS min version
    # Optional values: 1.0, 1.1, 1.2, 1.3.
    # We recommend using version 1.2 and above
    tlsMinVersion: 1.3

# Default value: etcd
# Valid values: [etcd, mysql]
metastore:
  type: etcd

# Related configuration of mysql, used to store Milvus metadata.
mysql:
  username: root
  password: 123456
  address: localhost
  port: 3306
  dbName: milvus_meta
  driverName: mysql
  maxOpenConns: 20
  maxIdleConns: 5

# please adjust in embedded Milvus: /tmp/milvus/data/
localStorage:
  path: /var/lib/milvus/data/

# Related configuration of MinIO/S3/GCS or any other service supports S3 API, which is responsible for data persistence for Milvus.
# We refer to the storage service as MinIO/S3 in the following description for simplicity.
minio:
  address: localhost # Address of MinIO/S3
  port: 9000   # Port of MinIO/S3
  accessKeyID: minioadmin # accessKeyID of MinIO/S3
  secretAccessKey: minioadmin # MinIO/S3 encryption string
  useSSL: false # Access to MinIO/S3 with SSL
  bucketName: "a-bucket" # Bucket name in MinIO/S3
  rootPath: files # The root path where the message is stored in MinIO/S3
  # Whether to use IAM role to access S3/GCS instead of access/secret keys
  # For more information, refer to
  # aws: https://docs.aws.amazon.com/IAM/latest/UserGuide/id_roles_use.html
  # gcp: https://cloud.google.com/storage/docs/access-control/iam
  # aliyun (ack): https://www.alibabacloud.com/help/en/container-service-for-kubernetes/latest/use-rrsa-to-enforce-access-control 
  # aliyun (ecs): https://www.alibabacloud.com/help/en/elastic-compute-service/latest/attach-an-instance-ram-role
  useIAM: false
  # Cloud Provider of S3. Supports: "aws", "gcp", "aliyun". 
  # You can use "aws" for other cloud provider supports S3 API with signature v4, e.g.: minio
  # You can use "gcp" for other cloud provider supports S3 API with signature v2
  # You can use "aliyun" for other cloud provider uses virtual host style bucket 
  # When useIAM enabled, only "aws", "gcp", "aliyun" is supported for now
  cloudProvider: aws
  # Custom endpoint for fetch IAM role credentials. when useIAM is true & cloudProvider is "aws".
  # Leave it empty if you want to use AWS default endpoint
  iamEndpoint: ""

# Milvus supports three MQs: rocksmq (based on RocksDB), Pulsar, and Kafka. Keep only the one you use in the config.
# If multiple MQs are configured in this file, they are enabled in this priority order:
# 1. standalone(local) mode: rocksmq(default) > Pulsar > Kafka
# 2. cluster mode:  Pulsar(default) > Kafka (rocksmq is unsupported)

# Related configuration of pulsar, used to manage Milvus logs of recent mutation operations, output streaming log, and provide log publish-subscribe services.
pulsar:
  address: localhost # Address of pulsar
  port: 6650 # Port of pulsar
  webport: 80 # Web port of pulsar, if you connect directly without proxy, should use 8080
  maxMessageSize: 5242880 # 5 * 1024 * 1024 Bytes, Maximum size of each message in pulsar.
  tenant: public
  namespace: default

# To enable Kafka, comment out the pulsar configs
kafka:
  producer:
    client.id: dc
  consumer:
    client.id: dc1
#  brokerList: localhost1:9092,localhost2:9092,localhost3:9092
#  saslUsername: username
#  saslPassword: password
#  saslMechanisms: PLAIN
#  securityProtocol: SASL_SSL

rocksmq:
  # please adjust in embedded Milvus: /tmp/milvus/rdb_data
  path: /var/lib/milvus/rdb_data # The path where the message is stored in rocksmq
  rocksmqPageSize: 67108864 # 64 MB, 64 * 1024 * 1024 bytes, The size of each page of messages in rocksmq
  retentionTimeInMinutes: 4320 # 3 days, 3 * 24 * 60 minutes, The retention time of the message in rocksmq.
  retentionSizeInMB: 8192 # 8 GB, 8 * 1024 MB, The retention size of the message in rocksmq.
  compactionInterval: 86400 # 1 day, trigger rocksdb compaction every day to remove deleted data
  lrucacheratio: 0.06 # rocksdb cache memory ratio

# Related configuration of rootCoord, used to handle data definition language (DDL) and data control language (DCL) requests
rootCoord:
  address: localhost
  port: 53100
  enableActiveStandby: false  # Enable active-standby

  dmlChannelNum: 16 # The number of dml channels created at system startup
  maxDatabaseNum: 64 # Maximum number of database
  maxPartitionNum: 4096 # Maximum number of partitions in a collection
  minSegmentSizeToEnableIndex: 1024 # It's a threshold. When the segment size is less than this value, the segment will not be indexed

  # (in seconds) Duration after which an import task will expire (be killed). Default 900 seconds (15 minutes).
  # Note: If default value is to be changed, change also the default in: internal/util/paramtable/component_param.go
  importTaskExpiration: 900
  # (in seconds) Milvus will keep the record of import tasks for at least `importTaskRetention` seconds. Default 86400
  # seconds (24 hours).
  # Note: If default value is to be changed, change also the default in: internal/util/paramtable/component_param.go
  importTaskRetention: 86400

# Related configuration of proxy, used to validate client requests and reduce the returned results.
proxy:
  port: 19530
  internalPort: 19529
  http:
    enabled: true # Whether to enable the http server
    debug_mode: false # Whether to enable http server debug mode

  timeTickInterval: 200 # ms, the interval that proxy synchronize the time tick
  msgStream:
    timeTick:
      bufSize: 512
  maxNameLength: 255  # Maximum length of name for a collection or alias
  maxFieldNum: 64     # Maximum number of fields in a collection.
  # As of today (2.2.0 and after) it is strongly DISCOURAGED to set maxFieldNum >= 64.
  # So adjust at your risk!
  maxDimension: 32768 # Maximum dimension of a vector
  # It's strongly DISCOURAGED to set `maxShardNum` > 64.
  maxShardNum: 16 # Maximum number of shards in a collection
  maxTaskNum: 1024 # max task number of proxy task queue
  # please adjust in embedded Milvus: false
  ginLogging: true # Whether to produce gin logs.
  grpc:
    serverMaxRecvSize: 67108864 # 64M
    serverMaxSendSize: 67108864 # 64M
    clientMaxRecvSize: 104857600 # 100 MB, 100 * 1024 * 1024
    clientMaxSendSize: 104857600 # 100 MB, 100 * 1024 * 1024



# Related configuration of queryCoord, used to manage topology and load balancing for the query nodes, and handoff from growing segments to sealed segments.
queryCoord:
  address: localhost
  port: 19531
  autoHandoff: true # Enable auto handoff
  autoBalance: true # Enable auto balance
  balancer: ScoreBasedBalancer # Balancer to use
  globalRowCountFactor: 0.1 # expert parameters, only used by scoreBasedBalancer
  scoreUnbalanceTolerationFactor: 0.05 # expert parameters, only used by scoreBasedBalancer
  reverseUnBalanceTolerationFactor: 1.3 #expert parameters, only used by scoreBasedBalancer
  overloadedMemoryThresholdPercentage: 90 # The threshold percentage that memory overload
  balanceIntervalSeconds: 60
  memoryUsageMaxDifferencePercentage: 30
  checkInterval: 10000
  channelTaskTimeout: 60000 # 1 minute
  segmentTaskTimeout: 120000 # 2 minute
  distPullInterval: 500
  loadTimeoutSeconds: 1800
  checkHandoffInterval: 5000
  taskMergeCap: 8
  taskExecutionCap: 256
  enableActiveStandby: false  # Enable active-standby
  refreshTargetsIntervalSeconds: 300

# Related configuration of queryNode, used to run hybrid search between vector and scalar data.
queryNode:
  cacheSize: 32 # GB, default 32 GB, `cacheSize` is the memory used for caching data for faster query. The `cacheSize` must be less than system memory size.
  port: 21123
  loadMemoryUsageFactor: 3 # The multiply factor of calculating the memory usage while loading segments
  enableDisk: true # enable querynode load disk index, and search on disk index
  maxDiskUsagePercentage: 95
  gracefulStopTimeout: 30

  stats:
    publishInterval: 1000 # Interval for querynode to report node information (milliseconds)
  dataSync:
    flowGraph:
      maxQueueLength: 1024 # Maximum length of task queue in flowgraph
      maxParallelism: 1024 # Maximum number of tasks executed in parallel in the flowgraph
  # Segcore will divide a segment into multiple chunks to enable small index
  segcore:
    chunkRows: 1024 # The number of vectors in a chunk.
    knowhereThreadPoolNumRatio: 4 # Use more threads to make good use of SSD throughput
    # Note: we have disabled segment small index since @2022.05.12. So below related configurations won't work.
    # We won't create small index for growing segments and search on these segments will directly use bruteforce scan.
    smallIndex:
      nlist: 128 # small index nlist, recommend to set sqrt(chunkRows), must be smaller than chunkRows/8
      nprobe: 16 # nprobe to search small index, based on your accuracy requirement, must be smaller than nlist
  cache:
    enabled: true
    memoryLimit: 2147483648 # 2 GB, 2 * 1024 *1024 *1024

  scheduler:
    receiveChanSize: 10240
    unsolvedQueueSize: 10240
    # maxReadConcurrentRatio is the concurrency ratio of read task (search task and query task).
    # Max read concurrency would be the value of `runtime.NumCPU * maxReadConcurrentRatio`.
    # It defaults to 2.0, which means max read concurrency would be the value of runtime.NumCPU * 2.
    # Max read concurrency must be greater than or equal to 1, and less than or equal to runtime.NumCPU * 100.
    maxReadConcurrentRatio: 2.0 # (0, 100]
    cpuRatio: 10.0 # ratio used to estimate read task cpu usage.
    # maxTimestampLag is the max ts lag between serviceable and guarantee timestamp.
    # if the lag is larger than this config, scheduler will return error without waiting.
    # the valid value is [3600, infinite)
    maxTimestampLag: 86400
    # read task schedule policy: fifo(by default), user-task-polling.
    scheduleReadPolicy: 
      # fifo: A FIFO queue support the schedule.
      # user-task-polling: 
      #     The user's tasks will be polled one by one and scheduled. 
      #     Scheduling is fair on task granularity.
      #     The policy is based on the username for authentication.
      #     And an empty username is considered the same user. 
      #     When there are no multi-users, the policy decay into FIFO
      name: fifo
      # user-task-polling configure:
      taskQueueExpire: 60 # 1 min by default, expire time of inner user task queue since queue is empty.

  grouping:
    enabled: true
    maxNQ: 50000
    topKMergeRatio: 10.0

indexCoord:
  address: localhost
  port: 31000
  enableActiveStandby: false  # Enable active-standby

  minSegmentNumRowsToEnableIndex: 1024 # It's a threshold. When the segment num rows is less than this value, the segment will not be indexed

  bindIndexNodeMode:
    enable: false
    address: "localhost:22930"
    withCred: false
    nodeID: 0

  gc:
    interval: 600 # gc interval in seconds

  scheduler:
    interval: 1000 # scheduler interval in Millisecond

indexNode:
  port: 21121
  enableDisk: true # enable index node build disk vector index
  maxDiskUsagePercentage: 95
  gracefulStopTimeout: 30

  scheduler:
    buildParallel: 1

dataCoord:
  address: localhost
  port: 13333
  enableCompaction: true # Enable data segment compaction
  enableGarbageCollection: true
  enableActiveStandby: false  # Enable active-standby

  channel:
    watchTimeoutInterval: 120 # Timeout on watching channels (in seconds). Datanode tickler update watch progress will reset timeout timer.
    balanceSilentDuration: 300 # The duration before the channelBalancer on datacoord to run
    balanceInterval: 360 #The interval for the channelBalancer on datacoord to check balance status

  segment:
    maxSize: 512 # Maximum size of a segment in MB
    diskSegmentMaxSize: 2048 # Maximum size of a segment in MB for collection which has Disk index
    # Minimum proportion for a segment which can be sealed.
    # Sealing early can prevent producing large growing segments in case these segments might slow down our search/query.
    # Segments that sealed early will be compacted into a larger segment (within maxSize) eventually.
    sealProportion: 0.23
    assignmentExpiration: 2000 # The time of the assignment expiration in ms
    maxLife: 86400 # The max lifetime of segment in seconds, 24*60*60
    # If a segment didn't accept dml records in `maxIdleTime` and the size of segment is greater than
    # `minSizeFromIdleToSealed`, Milvus will automatically seal it.
    maxIdleTime: 600 # The max idle time of segment in seconds, 10*60.
    minSizeFromIdleToSealed: 16 # The min size in MB of a segment that can be sealed after being idle.
    # The max number of binlog file for one segment, the segment will be sealed if
    # the number of binlog file reaches to max value.
    maxBinlogFileNumber: 32
    smallProportion: 0.5 # The segment is considered as "small segment" when its # of rows is smaller than
    # (smallProportion * segment max # of rows).
    compactableProportion: 0.85 # A compaction will happen on small segments if the segment after compaction will have
    # over (compactableProportion * segment max # of rows) rows.
    # MUST BE GREATER THAN OR EQUAL TO <smallProportion>!!!
    expansionRate: 1.25 # During compaction, the size of segment # of rows is able to exceed segment max # of rows by (expansionRate-1) * 100%.

  compaction:
    enableAutoCompaction: true

  gc:
    interval: 3600 # gc interval in seconds
    missingTolerance: 86400 # file meta missing tolerance duration in seconds, 24*60*60
    dropTolerance: 3600 # file belongs to dropped entity tolerance duration in seconds


dataNode:
  port: 21124

  dataSync:
    flowGraph:
      maxQueueLength: 1024 # Maximum length of task queue in flowgraph
      maxParallelism: 1024 # Maximum number of tasks executed in parallel in the flowgraph
  segment:
    # Max buffer size to flush for a single segment.
    insertBufSize: 16777216 # Bytes, 16 MB
    # Max buffer size to flush del for a single channel
    deleteBufBytes: 67108864 # Bytes, 64MB
    # The period to sync segments if buffer is not empty.
    syncPeriod: 600 # Seconds, 10min

  memory:
    forceSyncEnable: true # `true` to force sync if memory usage is too high
    forceSyncSegmentNum: 1 # number of segments to sync, segments with top largest buffer will be synced.
    watermarkStandalone: 0.2 # memory watermark for standalone, upon reaching this watermark, segments will be synced.
    watermarkCluster: 0.5 # memory watermark for cluster, upon reaching this watermark, segments will be synced.

# Configures the system log output.
log:
  level: debug # Only supports debug, info, warn, error, panic, or fatal. Default 'info'.
  stdout: "true" # default true, print log to stdout
  file:
    # please adjust in embedded Milvus: /tmp/milvus/logs
    rootPath: "" # root dir path to put logs, default "" means no log file will print
    maxSize: 300 # MB
    maxAge: 10 # Maximum time for log retention in day.
    maxBackups: 20
  format: text # text/json

grpc:
  log:
    level: WARNING

  serverMaxRecvSize: 536870912 # 512MB
  serverMaxSendSize: 536870912 # 512MB
  clientMaxRecvSize: 104857600 # 100 MB, 100 * 1024 * 1024
  clientMaxSendSize: 104857600 # 100 MB, 100 * 1024 * 1024

  client:
    dialTimeout: 200
    keepAliveTime: 10000
    keepAliveTimeout: 20000
    maxMaxAttempts: 5
    initialBackOff: 1.0
    maxBackoff: 60.0
    backoffMultiplier: 2.0
  server:
    retryTimes: 5 # retry times when receiving a grpc return value with a failure and retryable state code

# Configure the proxy tls enable.
tls:
  serverPemPath: configs/cert/server.pem
  serverKeyPath: configs/cert/server.key
  caPemPath: configs/cert/ca.pem


common:
  # Channel name generation rule: ${namePrefix}-${ChannelIdx}
  chanNamePrefix:
    cluster: "by-dev"
    rootCoordTimeTick: "rootcoord-timetick"
    rootCoordStatistics: "rootcoord-statistics"
    rootCoordDml: "rootcoord-dml"
    rootCoordDelta: "rootcoord-delta"
    search: "search"
    searchResult: "searchResult"
    queryTimeTick: "queryTimeTick"
    queryNodeStats: "query-node-stats"
    # Cmd for loadIndex, flush, etc...
    cmd: "cmd"
    dataCoordStatistic: "datacoord-statistics-channel"
    dataCoordTimeTick: "datacoord-timetick-channel"
    dataCoordSegmentInfo: "segment-info-channel"

  # Sub name generation rule: ${subNamePrefix}-${NodeID}
  subNamePrefix:
    rootCoordSubNamePrefix: "rootCoord"
    proxySubNamePrefix: "proxy"
    queryNodeSubNamePrefix: "queryNode"
    dataNodeSubNamePrefix: "dataNode"
    dataCoordSubNamePrefix: "dataCoord"

  defaultPartitionName: "_default"  # default partition name for a collection
  defaultIndexName: "_default_idx"  # default index name
  retentionDuration: 0     # time travel reserved time, insert/delete will not be cleaned in this period. disable it by default
  entityExpiration: -1     # Entity expiration in seconds, CAUTION make sure entityExpiration >= retentionDuration and -1 means never expire

  gracefulTime: 5000 # milliseconds. it represents the interval (in ms) by which the request arrival time needs to be subtracted in the case of Bounded Consistency.
  gracefulStopTimeout: 30 # seconds. it will force quit the server if the graceful stop process is not completed during this time.

  # Default value: auto
  # Valid values: [auto, avx512, avx2, avx, sse4_2]
  # This configuration is only used by querynode and indexnode, it selects CPU instruction set for Searching and Index-building.
  simdType: auto
  indexSliceSize: 16 # MB
  DiskIndex:
    MaxDegree: 56
    SearchListSize: 100
    PQCodeBudgetGBRatio: 0.125
    BuildNumThreadsRatio: 1.0
    SearchCacheBudgetGBRatio: 0.10
    LoadNumThreadRatio: 8.0
    BeamWidthRatio: 4.0
  # This parameter specify how many times the number of threads is the number of cores
  threadCoreCoefficient : 10

  # please adjust in embedded Milvus: local
  storageType: minio

  security:
    authorizationEnabled: false
    # The superusers will ignore some system check processes,
    # like the old password verification when updating the credential
    # superUsers:
    #  - "root"
    # tls mode values [0, 1, 2]
    # 0 is close, 1 is one-way authentication, 2 is two-way authentication.
    tlsMode: 0

  session:
    ttl: 20 # ttl value when session granting a lease to register service
    retryTimes: 30 # retry times when session sending etcd requests

  ImportMaxFileSize: 17179869184  # 16 * 1024 * 1024 * 1024
  # max file size to import for bulkInsert

# QuotaConfig, configurations of Milvus quota and limits.
# By default, we enable:
#   1. TT protection;
#   2. Memory protection.
#   3. Disk quota protection.
# You can enable:
#   1. DML throughput limitation;
#   2. DDL, DQL qps/rps limitation;
#   3. DQL Queue length/latency protection;
#   4. DQL result rate protection;
# If necessary, you can also manually force to deny RW requests.
quotaAndLimits:
  enabled: true # `true` to enable quota and limits, `false` to disable.
  limits:
    maxCollectionNum: 65536
    maxCollectionNumPerDB: 65536
  # quotaCenterCollectInterval is the time interval that quotaCenter
  # collects metrics from Proxies, Query cluster and Data cluster.
  quotaCenterCollectInterval: 3 # seconds, (0 ~ 65536)

  ddl: # ddl limit rates, default no limit.
    enabled: false
    collectionRate: -1 # qps, default no limit, rate for CreateCollection, DropCollection, LoadCollection, ReleaseCollection
    partitionRate: -1 # qps, default no limit, rate for CreatePartition, DropPartition, LoadPartition, ReleasePartition

  indexRate:
    enabled: false
    max: -1 # qps, default no limit, rate for CreateIndex, DropIndex
  flushRate:
    enabled: false
    max: -1 # qps, default no limit, rate for flush
  compactionRate:
    enabled: false
    max: -1 # qps, default no limit, rate for manualCompaction

  # dml limit rates, default no limit.
  # The maximum rate will not be greater than `max`.
  dml:
    enabled: false
    insertRate:
      collection:
        max: -1 # MB/s, default no limit
      max: -1 # MB/s, default no limit
    deleteRate:
      collection:
        max: -1 # MB/s, default no limit
      max: -1 # MB/s, default no limit
    bulkLoadRate: # not support yet. TODO: limit bulkLoad rate
      collection:
        max: -1 # MB/s, default no limit
      max: -1 # MB/s, default no limit

  # dql limit rates, default no limit.
  # The maximum rate will not be greater than `max`.
  dql:
    enabled: false
    searchRate:
      collection:
        max: -1 # vps (vectors per second), default no limit
      max: -1 # vps (vectors per second), default no limit
    queryRate:
      collection:
        max: -1 # qps, default no limit
      max: -1 # qps, default no limit

  # limitWriting decides whether dml requests are allowed.
  limitWriting:
    # forceDeny `false` means dml requests are allowed (except for some
    # specific conditions, such as memory of nodes to water marker), `true` means always reject all dml requests.
    forceDeny: false
    ttProtection:
      enabled: false
      # maxTimeTickDelay indicates the backpressure for DML Operations.
      # DML rates would be reduced according to the ratio of time tick delay to maxTimeTickDelay,
      # if time tick delay is greater than maxTimeTickDelay, all DML requests would be rejected.
      maxTimeTickDelay: 300 # in seconds
    memProtection:
      enabled: true
      # When memory usage > memoryHighWaterLevel, all dml requests would be rejected;
      # When memoryLowWaterLevel < memory usage < memoryHighWaterLevel, reduce the dml rate;
      # When memory usage < memoryLowWaterLevel, no action.
      # memoryLowWaterLevel should be less than memoryHighWaterLevel.
      dataNodeMemoryLowWaterLevel: 0.85 # (0, 1], memoryLowWaterLevel in DataNodes
      dataNodeMemoryHighWaterLevel: 0.95 # (0, 1], memoryHighWaterLevel in DataNodes
      queryNodeMemoryLowWaterLevel: 0.85 # (0, 1], memoryLowWaterLevel in QueryNodes
      queryNodeMemoryHighWaterLevel: 0.95 # (0, 1], memoryHighWaterLevel in QueryNodes
    growingSegmentsSizeProtection:
      # 1. No action will be taken if the ratio of growing segments size is less than the low water level.
      # 2. The DML rate will be reduced if the ratio of growing segments size is greater than the low water level and less than the high water level.
      # 3. All DML requests will be rejected if the ratio of growing segments size is greater than the high water level.
      enabled: false
      lowWaterLevel: 0.2
      highWaterLevel: 0.4
    diskProtection:
      # When the total file size of object storage is greater than `diskQuota`, all dml requests would be rejected;
      enabled: true
      diskQuota: -1 # MB, (0, +inf), default no limit
      diskQuotaPerCollection: -1 # MB, (0, +inf), default no limit

  # limitReading decides whether dql requests are allowed.
  limitReading:
    # forceDeny `false` means dql requests are allowed (except for some
    # specific conditions, such as collection has been dropped), `true` means always reject all dql requests.
    forceDeny: false
    queueProtection:
      enabled: false
      # nqInQueueThreshold indicated that the system was under backpressure for Search/Query path.
      # If NQ in any QueryNode's queue is greater than nqInQueueThreshold, search&query rates would gradually cool off
      # until the NQ in queue no longer exceeds nqInQueueThreshold. We think of the NQ of query request as 1.
      nqInQueueThreshold: -1 # int, default no limit

      # queueLatencyThreshold indicated that the system was under backpressure for Search/Query path.
      # If dql latency of queuing is greater than queueLatencyThreshold, search&query rates would gradually cool off
      # until the latency of queuing no longer exceeds queueLatencyThreshold.
      # The latency here refers to the averaged latency over a period of time.
      queueLatencyThreshold: -1 # milliseconds, default no limit
    resultProtection:
      enabled: false
      # maxReadResultRate indicated that the system was under backpressure for Search/Query path.
      # If dql result rate is greater than maxReadResultRate, search&query rates would gradually cool off
      # until the read result rate no longer exceeds maxReadResultRate.
      maxReadResultRate: -1 # MB/s, default no limit
    # coolOffSpeed is the speed of search&query rates cool off.
    coolOffSpeed: 0.9 # (0, 1]

autoIndex:
  params:
    build: '{"M": 30,"efConstruction": 360,"index_type": "HNSW", "metric_type": "IP"}'

dzqoo avatar Jun 08 '23 03:06 dzqoo
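Reading the two pastes side by side, the storage values differ: the Helm values above set `bucketName: "milvus-bucket"` and `rootPath: file`, while the pasted milvus.yaml has `bucketName: "a-bucket"`, `rootPath: files`, and `address: localhost`. The thread does not confirm which file the self-built querynode actually loads, so this is only something to check, not a diagnosis. A minimal sketch of such a comparison, with the values copied from the two pastes and a hypothetical helper `diff_settings`:

```python
def diff_settings(expected, actual):
    """Return {key: (expected_value, actual_value)} for shared keys whose values differ."""
    shared = sorted(expected.keys() & actual.keys())
    return {k: (expected[k], actual[k]) for k in shared if expected[k] != actual[k]}


# Values copied from the Helm values pasted earlier in the thread.
helm_values = {
    "bucketName": "milvus-bucket",
    "rootPath": "file",
    "useIAM": False,
    "iamEndpoint": "",
}

# Values copied from the `minio:` section of the pasted milvus.yaml.
pod_config = {
    "address": "localhost",
    "port": 9000,
    "bucketName": "a-bucket",
    "rootPath": "files",
    "useIAM": False,
    "iamEndpoint": "",
}

mismatches = diff_settings(helm_values, pod_config)
for key, (want, got) in mismatches.items():
    print(f"{key}: helm={want!r} pod={got!r}")
```

If the running pod really uses the defaults shown in milvus.yaml, the bucket name and root path would not match what the Helm release created.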

/assign @congqixia /unassign

yanliang567 avatar Jun 08 '23 05:06 yanliang567

I also hit this panic, also on a CentOS-based image. Is there any progress on this issue?

  • Milvus version: 2.2.9
  • Deployment mode(standalone or cluster):cluster
  • MQ type(rocksmq, pulsar or kafka): pulsar
  • SDK version(e.g. pymilvus v2.0.0rc2):
  • OS(Ubuntu or CentOS): Centos
  • CPU/Memory: 4c8g
  • GPU: no
  • Others:

@yanliang567

mrrtree avatar Jun 16 '23 07:06 mrrtree

image

mrrtree avatar Jun 16 '23 09:06 mrrtree

@congqixia any ideas?

yanliang567 avatar Jun 16 '23 10:06 yanliang567

image

mrrtree avatar Jun 27 '23 05:06 mrrtree

@mrrtree Sorry for the late reply. Quick question: which OSS service were you using when you encountered this problem?

congqixia avatar Jun 27 '23 07:06 congqixia

MinIO. I guess the problem is the OpenSSL version, which is 1.0.2 on CentOS 7.

mrrtree avatar Jul 04 '23 04:07 mrrtree
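To check the OpenSSL guess on a given host, the banner printed by `openssl version` can be compared against a minimum version. The sketch below parses such a banner; the helper names are hypothetical, the sample banner is only an illustration of the 1.0.2 series mentioned above, and the `(1, 1, 1)` threshold is an assumption, since the thread does not state which OpenSSL version the Milvus build needs.

```python
import re


def parse_openssl_version(banner):
    """Extract (major, minor, patch) from an `openssl version` banner line."""
    match = re.search(r"OpenSSL\s+(\d+)\.(\d+)\.(\d+)", banner)
    if match is None:
        raise ValueError(f"unrecognized OpenSSL banner: {banner!r}")
    return tuple(int(part) for part in match.groups())


def is_older_than(banner, minimum):
    """Return True if the banner's version is lower than the `minimum` tuple."""
    return parse_openssl_version(banner) < minimum


# Illustrative banner for the 1.0.2 series; paste the real output of `openssl version` here.
sample_banner = "OpenSSL 1.0.2k-fips  26 Jan 2017"
print(parse_openssl_version(sample_banner), is_older_than(sample_banner, (1, 1, 1)))
```

A result of `True` here would be consistent with the guess above, but it would not by itself prove that OpenSSL is what makes the querynode fail.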

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions. Rotten issues close after 30d of inactivity. Reopen the issue with /reopen.

stale[bot] avatar Aug 03 '23 04:08 stale[bot]

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions. Rotten issues close after 30d of inactivity. Reopen the issue with /reopen.

stale[bot] avatar Sep 05 '23 00:09 stale[bot]