[Bug]: Can only use one data node, index node
Is there an existing issue for this?
- [X] I have searched the existing issues
Environment
- Milvus version: 2.3.6
- Deployment mode(standalone or cluster): cluster
- MQ type(rocksmq, pulsar or kafka): kafka
- SDK version(e.g. pymilvus v2.0.0rc2): v2.3.7
- OS(Ubuntu or CentOS): CentOS
- CPU/Memory: 4C16G
- GPU: None
- Others: None
Current Behavior
When I use one datanode + indexnode + querynode in one container, it works well. But after I expanded to two instances, it stops working.
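For context, a minimal pymilvus sketch of the call where the failure surfaces (collection name and address are placeholders, not taken from the report):

```python
from pymilvus import connections, Collection

# Placeholder connection details, not from the report.
connections.connect(host="milvus-proxy", port="19530")

collection = Collection("example_collection")  # hypothetical collection name
# With a single datanode/indexnode/querynode container this succeeds; after
# scaling to two containers it fails with errors like
# "invalid local path: .../index_files/.../SLICE_META" (see the logs below).
collection.load()
```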
Expected Behavior
No response
Steps To Reproduce
No response
Milvus Log
The container A [2024/03/18 11:44:12.878 +08:00] [INFO] [segments/segment_loader.go:584] ["load fields..."] [traceID=ad5cd64c8b92161c262edb4d39ec3779] [collectionID=448141181348513817] [partitionID=448141181348513818] [shard=milvus_t0-rootcoord-dml_8_448141181348513817v0] [segmentID=448141181348714298] [indexedFields="[104]"] [2024/03/18 11:44:12.878 +08:00] [WARN] [segments/cgo_util.go:58] ["CStatus returns err"] [errorName=UnknownError] [errorMsg="invalid local path:/home/service/var/data/milvus/data/index_files/448141181349917446/7/448141181348513818/448141181348714298/SLICE_META"] [2024/03/18 11:44:12.883 +08:00] [INFO] [gc/gc_tuner.go:90] ["GC Tune done"] ["previous GOGC"=200] ["heapuse "=28] ["total memory"=72] ["next GC"=61] ["new GOGC"=200] [gc-pause=25.762µs] [gc-pause-end=1710733452881997060] [2024/03/18 11:44:12.889 +08:00] [WARN] [kafka/kafka_consumer.go:138] ["consume msg failed"] [topic=milvus_t0-rootcoord-dml_5] [groupID=datanode-38-milvus_t0-rootcoord-dml_5_448141181342745720v0-true] [error="Local: Timed out"] [2024/03/18 11:44:12.891 +08:00] [WARN] [segments/segment_loader.go:261] ["load segment failed when load data into memory"] [traceID=ad5cd64c8b92161c262edb4d39ec3779] [collectionID=448141181348513817] [segmentType=Sealed] [partitionID=448141181348513818] [segmentID=448141181348714298] [error="invalid local path:/home/service/var/data/milvus/data/index_files/448141181349917446/7/448141181348513818/448141181348714298/SLICE_META: service internal error: UnknownError"] [errorVerbose="invalid local path:/home/service/var/data/milvus/data/index_files/448141181349917446/7/448141181348513818/448141181348714298/SLICE_META: service internal error: UnknownError\n(1) attached stack trace\n -- stack trace:\n | github.com/milvus-io/milvus/pkg/util/merr.WrapErrServiceInternal\n | \t/var/lib/docker/columbus/repo/pkg_repo/2024011912/138704/37407404/1705639174568/src_repo/recall-milvus/pkg/util/merr/utils.go:343\n | github.com/milvus-io/milvus/internal/querynodev2/segments.HandleCStatus\n | \t/var/lib/docker/columbus/repo/pkg_repo/2024011912/138704/37407404/1705639174568/src_repo/recall-milvus/internal/querynodev2/segments/cgo_util.go:59\n | github.com/milvus-io/milvus/internal/querynodev2/segments.(*LoadIndexInfo).appendIndexData\n | \t/var/lib/docker/columbus/repo/pkg_repo/2024011912/138704/37407404/1705639174568/src_repo/recall-milvus/internal/querynodev2/segments/load_index_info.go:159\n | github.com/milvus-io/milvus/internal/querynodev2/segments.(*LoadIndexInfo).appendLoadIndexInfo\n | \t/var/lib/docker/columbus/repo/pkg_repo/2024011912/138704/37407404/1705639174568/src_repo/recall-milvus/internal/querynodev2/segments/load_index_info.go:100\n | github.com/milvus-io/milvus/internal/querynodev2/segments.(*LocalSegment).LoadIndex\n | \t/var/lib/docker/columbus/repo/pkg_repo/2024011912/138704/37407404/1705639174568/src_repo/recall-milvus/internal/querynodev2/segments/segment.go:855\n | github.com/milvus-io/milvus/internal/querynodev2/segments.(*segmentLoader).loadFieldIndex\n | \t/var/lib/docker/columbus/repo/pkg_repo/2024011912/138704/37407404/1705639174568/src_repo/recall-milvus/internal/querynodev2/segments/segment_loader.go:725\n | github.com/milvus-io/milvus/internal/querynodev2/segments.(*segmentLoader).loadFieldsIndex\n | \t/var/lib/docker/columbus/repo/pkg_repo/2024011912/138704/37407404/1705639174568/src_repo/recall-milvus/internal/querynodev2/segments/segment_loader.go:678\n | github.com/milvus-io/milvus/internal/querynodev2/segments.(*segmentLoader).loadSegment\n | 
\t/var/lib/docker/columbus/repo/pkg_repo/2024011912/138704/37407404/1705639174568/src_repo/recall-milvus/internal/querynodev2/segments/segment_loader.go:587\n | github.com/milvus-io/milvus/internal/querynodev2/segments.(*segmentLoader).Load.func4\n | \t/var/lib/docker/columbus/repo/pkg_repo/2024011912/138704/37407404/1705639174568/src_repo/recall-milvus/internal/querynodev2/segments/segment_loader.go:259\n | github.com/milvus-io/milvus/pkg/util/funcutil.ProcessFuncParallel.func3\n | \t/var/lib/docker/columbus/repo/pkg_repo/2024011912/138704/37407404/1705639174568/src_repo/recall-milvus/pkg/util/funcutil/parallel.go:86\n | runtime.goexit\n | \t/usr/local/go/src/runtime/asm_amd64.s:1650\nWraps: (2) invalid local path:/home/service/var/data/milvus/data/index_files/448141181349917446/7/448141181348513818/448141181348714298/SLICE_META\nWraps: (3) service internal error: UnknownError\nError types: (1) *withstack.withStack (2) *errutil.withPrefix (3) merr.milvusError"]
The container B [2024/03/18 11:55:16.852 +08:00] [WARN] [cluster/worker.go:75] ["failed to call LoadSegments, worker return error"] [traceID=71903ba5ba8d7ad4a9073048197f0496] [workerID=38] [errorCode=UnexpectedError] [reason="invalid local path:/home/service/var/data/milvus/data/index_files/448141181349917817/1/448141181348513818/448141181348514393/SLICE_META: service internal error: UnknownError"] [2024/03/18 11:55:16.852 +08:00] [WARN] [delegator/delegator_data.go:395] ["worker failed to load segments"] [traceID=71903ba5ba8d7ad4a9073048197f0496] [collectionID=448141181348513817] [channel=milvus_t0-rootcoord-dml_8_448141181348513817v0] [replicaID=448141181500522500] [workID=38] [segments="[448141181348514393]"] [error="invalid local path:/home/service/var/data/milvus/data/index_files/448141181349917817/1/448141181348513818/448141181348514393/SLICE_META: service internal error: UnknownError"] [2024/03/18 11:55:16.852 +08:00] [WARN] [querynodev2/services.go:454] ["delegator failed to load segments"] [traceID=71903ba5ba8d7ad4a9073048197f0496] [collectionID=448141181348513817] [partitionID=448141181348513818] [shard=milvus_t0-rootcoord-dml_8_448141181348513817v0] [segmentID=448141181348514393] [currentNodeID=39] [error="invalid local path:/home/service/var/data/milvus/data/index_files/448141181349917817/1/448141181348513818/448141181348514393/SLICE_META: service internal error: UnknownError"] [2024/03/18 11:55:17.787 +08:00] [INFO] [querynodev2/services.go:420] ["received load segments request"] [traceID=4e91aa4208c61f437d96f5d4541b3030] [collectionID=448141181348513817] [partitionID=448141181348513818] [shard=milvus_t0-rootcoord-dml_8_448141181348513817v0] [segmentID=448141181348514393] [currentNodeID=39] [version=1710734117786478706] [needTransfer=true] [2024/03/18 11:55:17.787 +08:00] [INFO] [segments/segment_loader.go:496] ["start loading remote..."] [traceID=4e91aa4208c61f437d96f5d4541b3030] [collectionID=448141181348513817] [segmentIDs="[448141181348514393]"] [segmentNum=1] [2024/03/18 11:55:17.787 +08:00] [INFO] [segments/segment_loader.go:506] ["loading bloom filter for remote..."] [traceID=4e91aa4208c61f437d96f5d4541b3030] [collectionID=448141181348513817] [segmentIDs="[448141181348514393]"] [2024/03/18 11:55:17.819 +08:00] [INFO] [segments/segment_loader.go:774] ["Successfully load pk stats"] [traceID=4e91aa4208c61f437d96f5d4541b3030] [segmentID=448141181348514393] [time=32.175293ms] [size=6779191] [2024/03/18 11:55:17.861 +08:00] [WARN] [cluster/worker.go:75] ["failed to call LoadSegments, worker return error"] [traceID=4e91aa4208c61f437d96f5d4541b3030] [workerID=38] [errorCode=UnexpectedError] [reason="invalid local path:/home/service/var/data/milvus/data/index_files/448141181349917817/1/448141181348513818/448141181348514393/SLICE_META: service internal error: UnknownError"] [2024/03/18 11:55:17.861 +08:00] [WARN] [delegator/delegator_data.go:395] ["worker failed to load segments"] [traceID=4e91aa4208c61f437d96f5d4541b3030] [collectionID=448141181348513817] [channel=milvus_t0-rootcoord-dml_8_448141181348513817v0] [replicaID=448141181500522500] [workID=38] [segments="[448141181348514393]"] [error="invalid local path:/home/service/var/data/milvus/data/index_files/448141181349917817/1/448141181348513818/448141181348514393/SLICE_META: service internal error: UnknownError"] [2024/03/18 11:55:17.861 +08:00] [WARN] [querynodev2/services.go:454] ["delegator failed to load segments"] [traceID=4e91aa4208c61f437d96f5d4541b3030] [collectionID=448141181348513817] 
[partitionID=448141181348513818] [shard=milvus_t0-rootcoord-dml_8_448141181348513817v0] [segmentID=448141181348514393] [currentNodeID=39] [error="invalid local path:/home/service/var/data/milvus/data/index_files/448141181349917817/1/448141181348513818/448141181348514393/SLICE_META: service internal error: UnknownError"]
The coord [2024/03/18 11:48:12.915 +08:00] [WARN] [task/executor.go:238] ["failed to load segment"] [taskID=1709522940657] [collectionID=448141181348513817] [replicaID=448141181500522500] [segmentID=448141181349315666] [node=38] [source=segment_checker] [shardLeader=39] [error="invalid local path:/home/service/var/data/milvus/data/index_files/448141181349917451/4/448141181348513818/448141181349315666/SLICE_META: service internalerror: UnknownError"] [2024/03/18 11:48:12.915 +08:00] [INFO] [task/executor.go:119] ["execute action done, remove it"] [taskID=1709522940657] [step=0] [error="invalid local path:/home/service/var/data/milvus/data/index_files/448141181349917451/4/448141181348513818/448141181349315666/SLICE_META: service internal error: UnknownError"][2024/03/18 11:48:13.051 +08:00] [WARN] [task/scheduler.go:727] ["task scheduler recordSegmentTaskError"] [taskID=1709522940655] [collectionID=448141181348513817] [replicaID=448141181500522500] [segmentID=448141181348714298] [status=failed] [error="invalid local path:/home/service/var/data/milvus/data/index_files/448141181349917446/7/448141181348513818/448141181348714298/SLICE_META: service internal error: UnknownError"] [2024/03/18 11:48:13.051 +08:00] [WARN] [meta/failed_load_cache.go:97] ["FailedLoadCache put failed record"] [collectionID=448141181348513817] [error="invalid local path:/home/service/var/data/milvus/data/index_files/448141181349917446/7/448141181348513818/448141181348714298/SLICE_META: service internal error: UnknownError"] [2024/03/18 11:48:13.051 +08:00] [INFO] [task/scheduler.go:768] ["task removed"] [taskID=1709522940655] [collectionID=448141181348513817] [replicaID=448141181500522500] [status=failed] [segmentID=448141181348714298][2024/03/18 11:48:13.051 +08:00] [WARN] [task/scheduler.go:727] ["task scheduler recordSegmentTaskError"] [taskID=1709522940656] [collectionID=448141181348513817] [replicaID=448141181500522500] [segmentID=448141181348914752] [status=failed] [error="invalid local path:/home/service/var/data/milvus/data/index_files/448141181349917450/11/448141181348513818/448141181348914752/SLICE_META: service internal error: UnknownError"] [2024/03/18 11:48:13.051 +08:00] [WARN] [meta/failed_load_cache.go:97] ["FailedLoadCache put failed record"] [collectionID=448141181348513817] [error="invalid local path:/home/service/var/data/milvus/data/index_files/448141181349917450/11/448141181348513818/448141181348914752/SLICE_META: service internal error: UnknownError"] [2024/03/18 11:48:13.051 +08:00] [INFO] [task/scheduler.go:768] ["task removed"] [taskID=1709522940656] [collectionID=448141181348513817] [replicaID=448141181500522500] [status=failed] [segmentID=448141181348914752]
Anything else?
No response
- if you have one collection and one shard, then yes, you can only use one datanode (see the sketch below).
- for index, it depends on how many segments you have; each indexnode takes only one index job at a time.
For a smaller deployment, scaling up might be more efficient than scaling out.
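If ingestion needs to be spread across more datanodes, one option is to create the collection with more than one shard. A minimal sketch with pymilvus (the schema, collection name, and address are illustrative, not from this issue):

```python
from pymilvus import connections, FieldSchema, CollectionSchema, DataType, Collection

connections.connect(host="localhost", port="19530")  # placeholder address

fields = [
    FieldSchema(name="pk", dtype=DataType.INT64, is_primary=True),
    FieldSchema(name="vec", dtype=DataType.FLOAT_VECTOR, dim=128),
]
schema = CollectionSchema(fields, description="illustrative schema")

# shards_num controls the number of DML channels; each channel is consumed by
# exactly one datanode, so two shards let two datanodes share the write load.
collection = Collection(name="example_collection", schema=schema, shards_num=2)
```

Note that the shard count is fixed at collection creation time, so an existing single-shard collection cannot be spread over more datanodes after the fact.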
@didalaviva how did you deploy the Milvus cluster? it sounds like you are not using the official deployments: https://milvus.io/docs/install_cluster-milvusoperator.md
/assign @didalaviva /unassign
This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.
Rotten issues close after 30d of inactivity. Reopen the issue with /reopen.
I have this issue too. As the data grows, it happens again.
please attach the full milvus logs for investigation, thx
each shard is assigned to only one datanode; this is by design.
in most cases one datanode with more cpu cores would be good enough for your use case.
for index, each build task takes one node.
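As a rough way to watch those per-segment index tasks progress, a hedged pymilvus sketch (collection name and address are illustrative):

```python
from pymilvus import connections, utility

connections.connect(host="localhost", port="19530")  # placeholder address

# Index building is scheduled per sealed segment, and each indexnode takes one
# job at a time, so this counter advances in segment-sized steps.
print(utility.index_building_progress("example_collection"))
# e.g. {'total_rows': 1000000, 'indexed_rows': 250000}
```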
This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.
Rotten issues close after 30d of inactivity. Reopen the issue with /reopen.