
[Bug]: inserting 1000 documents, milvus crashes and becomes unavailable.

Open Gy1900 opened this issue 9 months ago • 9 comments

Is there an existing issue for this?

  • [X] I have searched the existing issues

Environment

- Milvus version: 2.3.3
- Deployment mode(standalone or cluster): cluster-k8s
- MQ type(rocksmq, pulsar or kafka):    pulsar
- SDK version(e.g. pymilvus v2.0.0rc2): 2.2.5
- OS(Ubuntu or CentOS):  kylin v10+x86
- CPU/Memory: master 8c8g*3, node 8c12g*3
- GPU: no
- Others:

Current Behavior

During the stress test, the streaming insertion of the first 1000 vectorized PDFs into collection_1 succeeded. However, while inserting the second batch of 1000 documents, Milvus crashed and became unavailable, and the data preview in Attu timed out. I can still check whether other collections exist, but inserting into collection_1 returns an error.

error info:

    Traceback (most recent call last):
      File "python3.9/site-packages/pymilvus/decorators.py", line 50, in handler
        return func(self, *args, **kwargs)
      File "python3.9/site-packages/pymilvus/client/grpc_handler.py", line 399, in batch_insert
        raise err
      File "python3.9/site-packages/pymilvus/client/grpc_handler.py", line 389, in batch_insert
        response = rf.result()
      File "python3.9/site-packages/grpc/_channel.py", line 797, in result
        raise self
    grpc._channel._MultiThreadedRendezvous: <_MultiThreadedRendezvous of RPC that terminated with:
        status = StatusCode.DEADLINE_EXCEEDED
        details = "Deadline Exceeded"
        debug_error_string = "UNKNOWN:Deadline Exceeded {grpc_status:4}"

The above exception was the direct cause of the following exception:

    Traceback (most recent call last):
      File "milvus_store.py", line 259, in _add_documents
        res = self.col.insert(
      File "python3.9/site-packages/pymilvus/orm/collection.py", line 430, in insert
        res = conn.batch_insert(self._name, entities, partition_name,
      File "python3.9/site-packages/pymilvus/decorators.py", line 109, in handler
        raise e
      File "python3.9/site-packages/pymilvus/decorators.py", line 105, in handler
        return func(*args, **kwargs)
      File "python3.9/site-packages/pymilvus/decorators.py", line 136, in handler
        ret = func(self, *args, **kwargs)
      File "python3.9/site-packages/pymilvus/decorators.py", line 64, in handler
        raise MilvusException(message=f"rpc deadline exceeded: {timeout_msg}") from e
    pymilvus.exceptions.MilvusException: <MilvusException: (code=1, message=rpc deadline exceeded: Retry timeout: 30s)>

code:

    for xxxx in documents:
        self.col.insert()
        self.col.flush()
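For reference, one common way to stay under the 30s retry deadline seen in the traceback is to insert in batches and flush once at the end rather than calling flush() after every document. The snippet below is only a minimal sketch, not the reporter's actual code: the connection details, column layout, batch size, timeouts, and the helper name insert_in_batches are all assumptions to adapt.

    from pymilvus import connections, Collection

    # Placeholder connection details for this sketch.
    connections.connect(host="127.0.0.1", port="19530")
    col = Collection("collection_1")

    BATCH_SIZE = 100  # illustrative; tune to row size and vector dimension

    def insert_in_batches(entities, batch_size=BATCH_SIZE):
        # `entities` is assumed to be column-based data matching the schema,
        # e.g. [[pk, ...], [text, ...], [embedding, ...]].
        num_rows = len(entities[0])
        for start in range(0, num_rows, batch_size):
            batch = [field[start:start + batch_size] for field in entities]
            # Explicit per-call timeout, longer than the 30s retry window
            # reported in the traceback above.
            col.insert(batch, timeout=120)
        # Flush once after all batches instead of once per document.
        col.flush(timeout=120)

Calling flush() after every single insert tends to create many small segments and extra load on the message queue; Milvus also flushes automatically in the background, so one explicit flush at the end of a run is usually enough.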

Expected Behavior

How much data should my configured cluster deployment (3 masters and 3 nodes) be able to hold? If capacity is insufficient, I would like the error message to say so more directly. At the very least, it should be possible to insert 3000 documents.

k8s-pods

Steps To Reproduce

No response

Milvus Log

pod_log.zip

Anything else?

No response

Gy1900 avatar May 09 '24 07:05 Gy1900

How should this problem be solved? I see this in the log (the bookie hit its 64 MiB direct-memory limit while trying to allocate another 16 MiB):

    [bookkeeper-io-3-13] ERROR org.apache.bookkeeper.common.allocator.impl.ByteBufAllocatorImpl - Unable to allocate memory
    io.netty.util.internal.OutOfDirectMemoryError: failed to allocate 16777216 byte(s) of direct memory (used: 67108864, max: 67108864)

Gy1900 avatar May 09 '24 07:05 Gy1900

After milvus crashes, restart all nodes and it can be used normally.

Gy1900 avatar May 09 '24 07:05 Gy1900

/assign @congqixia /unassign

yanliang567 avatar May 10 '24 01:05 yanliang567

Could you tell me roughly where the problem is? This is a production system and I can't wait much longer, thank you. @yanliang567

Gy1900 avatar May 15 '24 07:05 Gy1900

@Gy1900 from the log you provided, we could not find why Milvus crashed. It looks like some input streams could not receive any messages. This could be a known issue with the Pulsar client: after Pulsar goes into read-only mode because it ran out of disk space, the Milvus pods must be restarted.

congqixia avatar May 15 '24 08:05 congqixia

So what should I do about Pulsar? Switch to rocksmq, or is there a simpler way? Can Pulsar running out of disk space cause Milvus to crash?

Gy1900 avatar May 15 '24 09:05 Gy1900

@congqixia

Gy1900 avatar May 15 '24 09:05 Gy1900

@Gy1900 Could you double-check the bookie memory settings? It's recommended to set the heap to 4G and direct memory to 8G in the pulsar-bookie configmap, like this:

  PULSAR_MEM: |
    -Xms4096m -Xmx4096m -XX:MaxDirectMemorySize=8192m

You can change the configmap then restart the pulsar bookie pods one by one.

LoveEachDay avatar May 15 '24 09:05 LoveEachDay

Good idea, I will try it, thanks.

Gy1900 avatar May 15 '24 09:05 Gy1900

I'll close this issue; please feel free to file a new one.

yanliang567 avatar Jun 26 '24 07:06 yanliang567