loadCollection keeps retrying and logging errors in milvus-sdk-java 2.4.0
Problem Description
When I call loadCollection after creating a collection with milvus-sdk-java 2.4.0, MilvusServiceClient keeps retrying the load and logging errors. Oddly, milvus-attu shows the collection as loaded, yet the console log keeps repeating the same errors.
Error Log
2024-04-26T15:31:33.347+08:00 RID-b04b984f-fb0f-44d6-9ca7-780db24edb53 WARN 34720 --- [nio-8443-exec-1] i.m.client.AbstractMilvusGrpcClient : Retry(6) with interval 2430ms. Reason: CANCELLED: Failed to read message.
2024-04-26T15:31:35.806+08:00 RID-b04b984f-fb0f-44d6-9ca7-780db24edb53 ERROR 34720 --- [nio-8443-exec-1] i.m.client.AbstractMilvusGrpcClient : LoadCollectionRequest collectionName:Entity_100000001_Multi_Vector_3cf4e5916b0549b7ab79d6c0b71be4ce RPC failed! Exception:{}
io.grpc.StatusRuntimeException: CANCELLED: Failed to read message.
at io.grpc.stub.ClientCalls.toStatusRuntimeException(ClientCalls.java:275) ~[grpc-stub-1.57.2.jar:1.57.2]
at io.grpc.stub.ClientCalls.getUnchecked(ClientCalls.java:256) ~[grpc-stub-1.57.2.jar:1.57.2]
at io.grpc.stub.ClientCalls.blockingUnaryCall(ClientCalls.java:169) ~[grpc-stub-1.57.2.jar:1.57.2]
at io.milvus.grpc.MilvusServiceGrpc$MilvusServiceBlockingStub.showCollections(MilvusServiceGrpc.java:4073) ~[milvus-sdk-java-2.4.0.jar:na]
at io.milvus.client.AbstractMilvusGrpcClient.waitForLoadingCollection(AbstractMilvusGrpcClient.java:94) ~[milvus-sdk-java-2.4.0.jar:na]
at io.milvus.client.AbstractMilvusGrpcClient.loadCollection(AbstractMilvusGrpcClient.java:565) ~[milvus-sdk-java-2.4.0.jar:na]
at io.milvus.client.MilvusServiceClient.lambda$loadCollection$8(MilvusServiceClient.java:454) ~[milvus-sdk-java-2.4.0.jar:na]
at io.milvus.client.MilvusServiceClient.retry(MilvusServiceClient.java:290) ~[milvus-sdk-java-2.4.0.jar:na]
at io.milvus.client.MilvusServiceClient.loadCollection(MilvusServiceClient.java:454) ~[milvus-sdk-java-2.4.0.jar:na]
at com.ot.ais.service.search.data.impl.MilvusDatabaseServiceImpl.loadCollection(MilvusDatabaseServiceImpl.java:250) ~[classes/:na]
at com.ot.ais.service.search.data.impl.MilvusDatabaseServiceImpl.createIndexesAndLoadCollection(MilvusDatabaseServiceImpl.java:151) ~[classes/:na]
at com.ot.ais.service.search.data.impl.MilvusDatabaseServiceImpl.createCollection(MilvusDatabaseServiceImpl.java:131) ~[classes/:na]
Environment
- milvus-sdk-java version: 2.4.0
- JDK version: 17
- Operating System: Windows
Steps to Reproduce
- Define the method loadCollection:
    public R<RpcStatus> loadCollection(String collectionName) {
        return milvusServiceClient.loadCollection(
                LoadCollectionParam.newBuilder()
                        .withCollectionName(collectionName)
                        .build()
        );
    }
- Invoke loadCollection().
Expected Behavior
The collection should be loaded successfully without looping errors.
Additional Information
Hope someone can help me resolve this issue.
Additionally, I noticed that the MilvusServiceClient has a default retry mechanism for almost every database interaction with private int maxRetryTimes = 75. Why is the retry count set to 75? Is there any specific reason behind this number?
The retry mechanism is by design and is consistent with the Milvus Python SDK: https://github.com/milvus-io/pymilvus/blob/1081c49fcc21039300fec22e7b19805be8f198f0/pymilvus/decorators.py#L42
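To make the retry behaviour concrete: the 2430ms interval at Retry(6) in the log above is consistent with a capped exponential backoff, since 10 × 3^5 = 2430. The following is a minimal sketch of such a schedule, not the SDK's actual code. It assumes an initial interval of 10 ms, a multiplier of 3, and a 3000 ms cap, which is my reading of the defaults in the linked pymilvus decorator and has not been confirmed here for the Java SDK. The names RetryBackoffSketch and schedule are hypothetical.

```java
import java.util.Arrays;

// Sketch of a capped exponential backoff schedule. The parameter values used
// in main() are assumptions, not values read from the SDK at runtime.
class RetryBackoffSketch {

    // Returns the wait in milliseconds applied after attempts 1..attempts.
    static long[] schedule(int attempts, long initialMs, long multiplier, long maxMs) {
        long[] waits = new long[attempts];
        long interval = initialMs;
        for (int k = 0; k < attempts; k++) {
            waits[k] = interval;
            // Grow the interval after each attempt, but never beyond the cap.
            interval = Math.min(interval * multiplier, maxMs);
        }
        return waits;
    }

    public static void main(String[] args) {
        // Assumed defaults: 10 ms initial interval, multiplier 3, 3000 ms cap.
        long[] waits = schedule(8, 10, 3, 3000);
        System.out.println(Arrays.toString(waits));
        // prints [10, 30, 90, 270, 810, 2430, 3000, 3000]
    }
}
```

Under these assumptions the interval reaches the cap at the seventh retry, so 75 attempts against an unreachable server would keep logging for roughly three and a half minutes before giving up, which would explain why the console appears to loop.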
loadCollection() calls showCollections() to check the loading progress. It seems the showCollections() RPC failed.
"CANCELLED: Failed to read message" is a gRPC error. It indicates that the connection is broken or closed.
Yes, it seems the gRPC connection has broken. I launched the Milvus standalone cluster in a local Ubuntu environment; the infrastructure is as below:
I think I have nearly found the problem: my Milvus Helm chart installation failed, and the query-node pod is missing. Reinstalling the Milvus cluster may make it work normally. Please help me confirm whether the cluster status is correct.
The querycoord failed to initialize. We need the full log to know what the error is.