milvus
milvus copied to clipboard
[Bug]: [benchmark][standalone]Milvus hangs during the insertion of 50m of data
Is there an existing issue for this?
- [X] I have searched the existing issues
Environment
- Milvus version:master-20220920-5141e05c
- Deployment mode(standalone or cluster): standalone
- SDK version(e.g. pymilvus v2.0.0rc2):2.2.0dev30
- OS(Ubuntu or CentOS):
- CPU/Memory:
- GPU:
- Others:
Current Behavior
server-instance fouram-cron-1663689600-6 server-configmap server-single-16c64m client-configmap client-search-sift50m-ivf-flat-2048
server
NAME READY STATUS RESTARTS AGE IP NOD
E NOMINATED NODE READINESS GATES
fouram-cron-1663689600-6-etcd-0 1/1 Running 0 2m20s 10.104.9.248 4am
-node14 <none> <none>
fouram-cron-1663689600-6-milvus-standalone-6d6bf8985d-vwlrw 1/1 Running 0 2m21s 10.104.1.111 4am
-node10 <none> <none>
fouram-cron-1663689600-6-minio-785dc8d76c-w5485 1/1 Running 0 2m20s 10.104.9.230 4am
-node14 <none> <none>
client:
[2022-09-20 16:32:30,712] [ DEBUG] - 38300000 (milvus_benchmark.runners.base:90)
[2022-09-20 16:32:31,085] [ DEBUG] - Start id: 38350000, end id: 38400000 (milvus_benchmark.runners.base:76)
[2022-09-20 16:32:33,200] [ DEBUG] - Milvus insert run in 2.1139s (milvus_benchmark.client:57)
[2022-09-20 16:32:33,203] [ DEBUG] - Row count: 38300000 in collection: <sift_50m_128_l2> (milvus_benchmark.client:520)
[2022-09-20 16:32:33,203] [ DEBUG] - 38300000 (milvus_benchmark.runners.base:90)
[2022-09-20 16:32:33,850] [ DEBUG] - Start id: 38400000, end id: 38450000 (milvus_benchmark.runners.base:76)
[2022-09-20 16:32:36,410] [ DEBUG] - Milvus insert run in 2.5582s (milvus_benchmark.client:57)
[2022-09-20 16:32:36,414] [ DEBUG] - Row count: 38400000 in collection: <sift_50m_128_l2> (milvus_benchmark.client:520)
[2022-09-20 16:32:36,414] [ DEBUG] - 38400000 (milvus_benchmark.runners.base:90)
[2022-09-20 16:32:37,115] [ DEBUG] - Start id: 38450000, end id: 38500000 (milvus_benchmark.runners.base:76)
[2022-09-20 16:32:38,912] [ DEBUG] - Milvus insert run in 1.7955s (milvus_benchmark.client:57)
[2022-09-20 16:32:38,980] [ DEBUG] - Row count: 38400000 in collection: <sift_50m_128_l2> (milvus_benchmark.client:520)
[2022-09-20 16:32:38,980] [ DEBUG] - 38400000 (milvus_benchmark.runners.base:90)
[2022-09-20 16:32:39,684] [ DEBUG] - Start id: 38500000, end id: 38550000 (milvus_benchmark.runners.base:76)
[2022-09-20 16:32:41,365] [ DEBUG] - Milvus insert run in 1.6787s (milvus_benchmark.client:57)
[2022-09-20 16:32:41,386] [ DEBUG] - Row count: 38450135 in collection: <sift_50m_128_l2> (milvus_benchmark.client:520)
[2022-09-20 16:32:41,386] [ DEBUG] - 38450135 (milvus_benchmark.runners.base:90)
[2022-09-20 16:32:42,043] [ DEBUG] - Start id: 38550000, end id: 38600000 (milvus_benchmark.runners.base:76)
[2022-09-20 16:32:43,910] [ DEBUG] - Milvus insert run in 1.8652s (milvus_benchmark.client:57)
[2022-09-20 16:32:43,912] [ DEBUG] - Row count: 38500000 in collection: <sift_50m_128_l2> (milvus_benchmark.client:520)
[2022-09-20 16:32:43,912] [ DEBUG] - 38500000 (milvus_benchmark.runners.base:90)
[2022-09-20 16:32:44,647] [ DEBUG] - Start id: 38600000, end id: 38650000 (milvus_benchmark.runners.base:76)
[2022-09-20 16:32:46,709] [ DEBUG] - Milvus insert run in 2.0598s (milvus_benchmark.client:57)
[2022-09-20 16:32:46,779] [ DEBUG] - Row count: 38574862 in collection: <sift_50m_128_l2> (milvus_benchmark.client:520)
[2022-09-20 16:32:46,780] [ DEBUG] - 38574862 (milvus_benchmark.runners.base:90)
[2022-09-20 16:32:47,529] [ DEBUG] - Start id: 38650000, end id: 38700000 (milvus_benchmark.runners.base:76)
[2022-09-20 16:32:49,316] [ DEBUG] - Milvus insert run in 1.7846s (milvus_benchmark.client:57)
[2022-09-20 16:32:49,590] [ DEBUG] - Row count: 38574862 in collection: <sift_50m_128_l2> (milvus_benchmark.client:520)
[2022-09-20 16:32:49,590] [ DEBUG] - 38574862 (milvus_benchmark.runners.base:90)
[2022-09-20 16:32:50,168] [ DEBUG] - Start id: 38700000, end id: 38750000 (milvus_benchmark.runners.base:76)
Expected Behavior
No response
Steps To Reproduce
No response
Milvus Log
No response
Anything else?
No response
/assign @soothing-rain /unassign
From the milvus log, it seems that milvus hangs in AssignSegmentID
.
[2022/09/20 16:32:51.724 +00:00] [INFO] [datacoord/services.go:146] ["handle assign segment request"] [collectionID=436126291330272580] [partitionID=436126291330272581] [channelName=by-dev-rootcoord-dml_1_436126291330272580v1] [count=25017] [isImport=false] ["import task ID"=0]
[2022/09/20 16:32:51.724 +00:00] [DEBUG] [proxy/segment.go:300] ["Proxy segIDAssigner reduceSegReqs after reduce"] [len(segReqs)=1] [BeforeCnt=25017] [AfterCnt=25017]
[2022/09/20 16:32:51.724 +00:00] [DEBUG] [proxy/segment.go:270] ["Proxy segIDAssigner reduceSegReqs"] [len(segReqs)=1]
[2022/09/20 16:32:51.724 +00:00] [DEBUG] [proxy/segment.go:230] ["Proxy segIDAssigner pickCanDoFunc"] [records="{\"436126291330272580\":{\"436126291330272581\":{\"by-dev-rootcoord-dml_1_436126291330272580v1\":25017}}}"] [len(newTodoReqs)=1] [len(CanDoReqs)=0]
[2022/09/20 16:32:51.715 +00:00] [INFO] [proxy/task_insert.go:447] ["send insert request to virtual channels"] [collection=sift_50m_128_l2] [partition=_default] [collection_id=436126291330272580] [partition_id=436126291330272581] [virtual_channels="[by-dev-rootcoord-dml_0_436126291330272580v0,by-dev-rootcoord-dml_1_436126291330272580v1]"] [task_id=436126291381634258]
[2022/09/20 16:32:51.715 +00:00] [DEBUG] [timerecord/time_recorder.go:91] ["proxy execute insert 436126291381634258: get used message stream (0ms)"]
[2022/09/20 16:32:51.715 +00:00] [DEBUG] [timerecord/time_recorder.go:91] ["proxy execute insert 436126291381634258: get collection id & partition id from cache (0ms)"]
[2022/09/20 16:32:51.715 +00:00] [DEBUG] [proxy/task_insert.go:247] ["Proxy Insert PreExecute done"] [msgID=436126291381634258] ["collection name"=sift_50m_128_l2]
[2022/09/20 16:32:51.714 +00:00] [DEBUG] [allocator/id.go:140] ["IDAllocator pickCanDoFunc"] [need=50000] [total=200000] [remainReqCnt=0]
[2022/09/20 16:32:51.714 +00:00] [DEBUG] [allocator/id.go:140] ["IDAllocator pickCanDoFunc"] [need=50000] [total=49943] [remainReqCnt=1]
[2022/09/20 16:32:51.714 +00:00] [DEBUG] [proxy/impl.go:2350] ["Detail of insert request in Proxy"] [role=proxy] [msgID=436126291381634258] [BeginTS=436126763365695489] [EndTS=436126763365695489] [db=] [collection=sift_50m_128_l2] [partition=_default] [NumRows=50000] [traceID=19ab52d1de83fce1]
[2022/09/20 16:32:51.714 +00:00] [DEBUG] [allocator/id.go:140] ["IDAllocator pickCanDoFunc"] [need=1] [total=49944] [remainReqCnt=0]
[2022/09/20 16:32:51.714 +00:00] [DEBUG] [proxy/impl.go:2333] ["Enqueue insert request in Proxy"] [role=proxy] [db=] [collection=sift_50m_128_l2] [partition=_default] [len(FieldsData)=2] [len(HashKeys)=0] [NumRows=50000] [traceID=19ab52d1de83fce1]
[2022/09/20 16:32:51.714 +00:00] [INFO] [proxy/impl.go:2266] ["Start processing insert request in Proxy"] [traceID=19ab52d1de83fce1]
server-instance fouram-cron-1663776000-8 server-configmap server-single-8c16m client-configmap client-insert-bp-batch servre
NAME
READY STATUS RESTARTS AGE IP NODE
NOMINATED NODE READINESS GATES
fouram-cron-1663776000-8-etcd-0 1/1 Running 0 2m24s 10.104.9.154 4am-nod
e14 <none> <none>
fouram-cron-1663776000-8-milvus-standalone-5b6f4768fd-xzw5m 1/1 Running 0 2m24s 10.104.5.215 4am-nod
e12 <none> <none>
fouram-cron-1663776000-8-minio-78cc8b68dd-86d64 1/1 Running 0 2m24s 10.104.6.209 4am-nod
e13 <none> <none>
[2022-09-21 16:10:38,545] [ DEBUG] - {'type': 'bp_insert_performance', 'value': {'total_time': 17.5799, 'rps': 56883.1125, 'ni_tim
e': 3.516, 'flush_time': 1.5455, 'build_time': 0.0}} (milvus_benchmark.main:107)
[2022-09-21 16:10:38,545] [ DEBUG] - {'_version': '0.1', '_type': 'case', 'run_id': 1663780562, 'mode': 'local', 'server': <milvus
_benchmark.metrics.models.server.Server object at 0x7f358a346fd0>, 'hardware': <milvus_benchmark.metrics.models.hardware.Hardware ob
ject at 0x7f358a346f10>, 'env': <milvus_benchmark.metrics.models.env.Env object at 0x7f358a346e80>, 'status': 'RUN_SUCC', 'err_messa
ge': '', 'collection': {'dimension': 128, 'metric_type': 'l2', 'dataset_name': 'sift_1m_128_l2', 'collection_size': 1000000, 'other_
fields': None, 'ni_per': 200000, 'shards_num': None}, 'index': None, 'search': None, 'run_params': None, 'metrics': {'type': 'bp_ins
ert_performance', 'value': {'total_time': 17.5799, 'rps': 56883.1125, 'ni_time': 3.516, 'flush_time': 1.5455, 'build_time': 0.0}}, '
datetime': '2022-09-21 16:04:20.375321', 'type': 'metric'} (milvus_benchmark.metric.api:30)
[2022-09-21 16:10:38,553] [ DEBUG] - Milvus get_info run in 0.0021s (milvus_benchmark.client:57)
[2022-09-21 16:10:40,460] [ DEBUG] - Start id: 0, end id: 200000 (milvus_benchmark.runners.base:76)
[2022-09-21 16:10:44,031] [ DEBUG] - Milvus insert run in 3.563s (milvus_benchmark.client:57)
[2022-09-21 16:10:44,033] [ DEBUG] - Row count: 11800000 in collection: <sift_1m_128_l2> (milvus_benchmark.client:520)
[2022-09-21 16:10:44,033] [ DEBUG] - 11800000 (milvus_benchmark.runners.base:90)
[2022-09-21 16:10:46,236] [ DEBUG] - Start id: 200000, end id: 400000 (milvus_benchmark.runners.base:76)
[2022-09-21 16:10:49,851] [ DEBUG] - Milvus insert run in 3.6067s (milvus_benchmark.client:57)
[2022-09-21 16:10:49,853] [ DEBUG] - Row count: 12000000 in collection: <sift_1m_128_l2> (milvus_benchmark.client:520)
[2022-09-21 16:10:49,853] [ DEBUG] - 12000000 (milvus_benchmark.runners.base:90)
[2022-09-21 16:10:51,905] [ DEBUG] - Start id: 400000, end id: 600000 (milvus_benchmark.runners.base:76)
server-instance fouram-xwwjv-1 server-configmap server-single-8c16m client-configmap client-insert-bp-batch
milvus:master-20220929-64662919 pymilvus :2.2.0dev32
This problem should be occasional and was verified yesterday that there was no problem with the change