milvus icon indicating copy to clipboard operation
milvus copied to clipboard

[Bug]: [benchmark][standalone]Milvus hangs during the insertion of 50m of data

Open jingkl opened this issue 2 years ago • 3 comments

Is there an existing issue for this?

  • [X] I have searched the existing issues

Environment

- Milvus version:master-20220920-5141e05c
- Deployment mode(standalone or cluster): standalone
- SDK version(e.g. pymilvus v2.0.0rc2):2.2.0dev30
- OS(Ubuntu or CentOS): 
- CPU/Memory: 
- GPU: 
- Others:

Current Behavior

server-instance fouram-cron-1663689600-6 server-configmap server-single-16c64m client-configmap client-search-sift50m-ivf-flat-2048

server

NAME                                                              READY   STATUS            RESTARTS      AGE     IP             NOD
E         NOMINATED NODE   READINESS GATES
fouram-cron-1663689600-6-etcd-0                                   1/1     Running           0             2m20s   10.104.9.248   4am
-node14   <none>           <none>
fouram-cron-1663689600-6-milvus-standalone-6d6bf8985d-vwlrw       1/1     Running           0             2m21s   10.104.1.111   4am
-node10   <none>           <none>
fouram-cron-1663689600-6-minio-785dc8d76c-w5485                   1/1     Running           0             2m20s   10.104.9.230   4am
-node14   <none>           <none>

client:

[2022-09-20 16:32:30,712] [   DEBUG] - 38300000 (milvus_benchmark.runners.base:90)
[2022-09-20 16:32:31,085] [   DEBUG] - Start id: 38350000, end id: 38400000 (milvus_benchmark.runners.base:76)
[2022-09-20 16:32:33,200] [   DEBUG] - Milvus insert run in 2.1139s (milvus_benchmark.client:57)
[2022-09-20 16:32:33,203] [   DEBUG] - Row count: 38300000 in collection: <sift_50m_128_l2> (milvus_benchmark.client:520)
[2022-09-20 16:32:33,203] [   DEBUG] - 38300000 (milvus_benchmark.runners.base:90)
[2022-09-20 16:32:33,850] [   DEBUG] - Start id: 38400000, end id: 38450000 (milvus_benchmark.runners.base:76)
[2022-09-20 16:32:36,410] [   DEBUG] - Milvus insert run in 2.5582s (milvus_benchmark.client:57)
[2022-09-20 16:32:36,414] [   DEBUG] - Row count: 38400000 in collection: <sift_50m_128_l2> (milvus_benchmark.client:520)
[2022-09-20 16:32:36,414] [   DEBUG] - 38400000 (milvus_benchmark.runners.base:90)
[2022-09-20 16:32:37,115] [   DEBUG] - Start id: 38450000, end id: 38500000 (milvus_benchmark.runners.base:76)
[2022-09-20 16:32:38,912] [   DEBUG] - Milvus insert run in 1.7955s (milvus_benchmark.client:57)
[2022-09-20 16:32:38,980] [   DEBUG] - Row count: 38400000 in collection: <sift_50m_128_l2> (milvus_benchmark.client:520)
[2022-09-20 16:32:38,980] [   DEBUG] - 38400000 (milvus_benchmark.runners.base:90)
[2022-09-20 16:32:39,684] [   DEBUG] - Start id: 38500000, end id: 38550000 (milvus_benchmark.runners.base:76)
[2022-09-20 16:32:41,365] [   DEBUG] - Milvus insert run in 1.6787s (milvus_benchmark.client:57)
[2022-09-20 16:32:41,386] [   DEBUG] - Row count: 38450135 in collection: <sift_50m_128_l2> (milvus_benchmark.client:520)
[2022-09-20 16:32:41,386] [   DEBUG] - 38450135 (milvus_benchmark.runners.base:90)
[2022-09-20 16:32:42,043] [   DEBUG] - Start id: 38550000, end id: 38600000 (milvus_benchmark.runners.base:76)
[2022-09-20 16:32:43,910] [   DEBUG] - Milvus insert run in 1.8652s (milvus_benchmark.client:57)
[2022-09-20 16:32:43,912] [   DEBUG] - Row count: 38500000 in collection: <sift_50m_128_l2> (milvus_benchmark.client:520)
[2022-09-20 16:32:43,912] [   DEBUG] - 38500000 (milvus_benchmark.runners.base:90)
[2022-09-20 16:32:44,647] [   DEBUG] - Start id: 38600000, end id: 38650000 (milvus_benchmark.runners.base:76)
[2022-09-20 16:32:46,709] [   DEBUG] - Milvus insert run in 2.0598s (milvus_benchmark.client:57)
[2022-09-20 16:32:46,779] [   DEBUG] - Row count: 38574862 in collection: <sift_50m_128_l2> (milvus_benchmark.client:520)
[2022-09-20 16:32:46,780] [   DEBUG] - 38574862 (milvus_benchmark.runners.base:90)
[2022-09-20 16:32:47,529] [   DEBUG] - Start id: 38650000, end id: 38700000 (milvus_benchmark.runners.base:76)
[2022-09-20 16:32:49,316] [   DEBUG] - Milvus insert run in 1.7846s (milvus_benchmark.client:57)
[2022-09-20 16:32:49,590] [   DEBUG] - Row count: 38574862 in collection: <sift_50m_128_l2> (milvus_benchmark.client:520)
[2022-09-20 16:32:49,590] [   DEBUG] - 38574862 (milvus_benchmark.runners.base:90)
[2022-09-20 16:32:50,168] [   DEBUG] - Start id: 38700000, end id: 38750000 (milvus_benchmark.runners.base:76)

Expected Behavior

No response

Steps To Reproduce

No response

Milvus Log

No response

Anything else?

No response

jingkl avatar Sep 21 '22 03:09 jingkl

/assign @soothing-rain /unassign

yanliang567 avatar Sep 21 '22 04:09 yanliang567

From the milvus log, it seems that milvus hangs in AssignSegmentID.

[2022/09/20 16:32:51.724 +00:00] [INFO] [datacoord/services.go:146] ["handle assign segment request"] [collectionID=436126291330272580] [partitionID=436126291330272581] [channelName=by-dev-rootcoord-dml_1_436126291330272580v1] [count=25017] [isImport=false] ["import task ID"=0]
[2022/09/20 16:32:51.724 +00:00] [DEBUG] [proxy/segment.go:300] ["Proxy segIDAssigner reduceSegReqs after reduce"] [len(segReqs)=1] [BeforeCnt=25017] [AfterCnt=25017]
[2022/09/20 16:32:51.724 +00:00] [DEBUG] [proxy/segment.go:270] ["Proxy segIDAssigner reduceSegReqs"] [len(segReqs)=1]
[2022/09/20 16:32:51.724 +00:00] [DEBUG] [proxy/segment.go:230] ["Proxy segIDAssigner pickCanDoFunc"] [records="{\"436126291330272580\":{\"436126291330272581\":{\"by-dev-rootcoord-dml_1_436126291330272580v1\":25017}}}"] [len(newTodoReqs)=1] [len(CanDoReqs)=0]
[2022/09/20 16:32:51.715 +00:00] [INFO] [proxy/task_insert.go:447] ["send insert request to virtual channels"] [collection=sift_50m_128_l2] [partition=_default] [collection_id=436126291330272580] [partition_id=436126291330272581] [virtual_channels="[by-dev-rootcoord-dml_0_436126291330272580v0,by-dev-rootcoord-dml_1_436126291330272580v1]"] [task_id=436126291381634258]
[2022/09/20 16:32:51.715 +00:00] [DEBUG] [timerecord/time_recorder.go:91] ["proxy execute insert 436126291381634258: get used message stream (0ms)"]
[2022/09/20 16:32:51.715 +00:00] [DEBUG] [timerecord/time_recorder.go:91] ["proxy execute insert 436126291381634258: get collection id & partition id from cache (0ms)"]
[2022/09/20 16:32:51.715 +00:00] [DEBUG] [proxy/task_insert.go:247] ["Proxy Insert PreExecute done"] [msgID=436126291381634258] ["collection name"=sift_50m_128_l2]
[2022/09/20 16:32:51.714 +00:00] [DEBUG] [allocator/id.go:140] ["IDAllocator pickCanDoFunc"] [need=50000] [total=200000] [remainReqCnt=0]
[2022/09/20 16:32:51.714 +00:00] [DEBUG] [allocator/id.go:140] ["IDAllocator pickCanDoFunc"] [need=50000] [total=49943] [remainReqCnt=1]
[2022/09/20 16:32:51.714 +00:00] [DEBUG] [proxy/impl.go:2350] ["Detail of insert request in Proxy"] [role=proxy] [msgID=436126291381634258] [BeginTS=436126763365695489] [EndTS=436126763365695489] [db=] [collection=sift_50m_128_l2] [partition=_default] [NumRows=50000] [traceID=19ab52d1de83fce1]
[2022/09/20 16:32:51.714 +00:00] [DEBUG] [allocator/id.go:140] ["IDAllocator pickCanDoFunc"] [need=1] [total=49944] [remainReqCnt=0]
[2022/09/20 16:32:51.714 +00:00] [DEBUG] [proxy/impl.go:2333] ["Enqueue insert request in Proxy"] [role=proxy] [db=] [collection=sift_50m_128_l2] [partition=_default] [len(FieldsData)=2] [len(HashKeys)=0] [NumRows=50000] [traceID=19ab52d1de83fce1]
[2022/09/20 16:32:51.714 +00:00] [INFO] [proxy/impl.go:2266] ["Start processing insert request in Proxy"] [traceID=19ab52d1de83fce1]

LoveEachDay avatar Sep 21 '22 06:09 LoveEachDay

server-instance fouram-cron-1663776000-8 server-configmap server-single-8c16m client-configmap client-insert-bp-batch servre

NAME             
                                               READY   STATUS      RESTARTS        AGE     IP             NODE
     NOMINATED NODE   READINESS GATES
fouram-cron-1663776000-8-etcd-0                                   1/1     Running     0               2m24s   10.104.9.154   4am-nod
e14   <none>           <none>
fouram-cron-1663776000-8-milvus-standalone-5b6f4768fd-xzw5m       1/1     Running     0               2m24s   10.104.5.215   4am-nod
e12   <none>           <none>
fouram-cron-1663776000-8-minio-78cc8b68dd-86d64                   1/1     Running     0               2m24s   10.104.6.209   4am-nod
e13   <none>           <none>
[2022-09-21 16:10:38,545] [   DEBUG] - {'type': 'bp_insert_performance', 'value': {'total_time': 17.5799, 'rps': 56883.1125, 'ni_tim
e': 3.516, 'flush_time': 1.5455, 'build_time': 0.0}} (milvus_benchmark.main:107)
[2022-09-21 16:10:38,545] [   DEBUG] - {'_version': '0.1', '_type': 'case', 'run_id': 1663780562, 'mode': 'local', 'server': <milvus
_benchmark.metrics.models.server.Server object at 0x7f358a346fd0>, 'hardware': <milvus_benchmark.metrics.models.hardware.Hardware ob
ject at 0x7f358a346f10>, 'env': <milvus_benchmark.metrics.models.env.Env object at 0x7f358a346e80>, 'status': 'RUN_SUCC', 'err_messa
ge': '', 'collection': {'dimension': 128, 'metric_type': 'l2', 'dataset_name': 'sift_1m_128_l2', 'collection_size': 1000000, 'other_
fields': None, 'ni_per': 200000, 'shards_num': None}, 'index': None, 'search': None, 'run_params': None, 'metrics': {'type': 'bp_ins
ert_performance', 'value': {'total_time': 17.5799, 'rps': 56883.1125, 'ni_time': 3.516, 'flush_time': 1.5455, 'build_time': 0.0}}, '
datetime': '2022-09-21 16:04:20.375321', 'type': 'metric'} (milvus_benchmark.metric.api:30)
[2022-09-21 16:10:38,553] [   DEBUG] - Milvus get_info run in 0.0021s (milvus_benchmark.client:57)
[2022-09-21 16:10:40,460] [   DEBUG] - Start id: 0, end id: 200000 (milvus_benchmark.runners.base:76)
[2022-09-21 16:10:44,031] [   DEBUG] - Milvus insert run in 3.563s (milvus_benchmark.client:57)
[2022-09-21 16:10:44,033] [   DEBUG] - Row count: 11800000 in collection: <sift_1m_128_l2> (milvus_benchmark.client:520)
[2022-09-21 16:10:44,033] [   DEBUG] - 11800000 (milvus_benchmark.runners.base:90)
[2022-09-21 16:10:46,236] [   DEBUG] - Start id: 200000, end id: 400000 (milvus_benchmark.runners.base:76)
[2022-09-21 16:10:49,851] [   DEBUG] - Milvus insert run in 3.6067s (milvus_benchmark.client:57)
[2022-09-21 16:10:49,853] [   DEBUG] - Row count: 12000000 in collection: <sift_1m_128_l2> (milvus_benchmark.client:520)
[2022-09-21 16:10:49,853] [   DEBUG] - 12000000 (milvus_benchmark.runners.base:90)
[2022-09-21 16:10:51,905] [   DEBUG] - Start id: 400000, end id: 600000 (milvus_benchmark.runners.base:76)

jingkl avatar Sep 22 '22 02:09 jingkl

server-instance fouram-xwwjv-1 server-configmap server-single-8c16m client-configmap client-insert-bp-batch

milvus:master-20220929-64662919 pymilvus :2.2.0dev32

This problem should be occasional and was verified yesterday that there was no problem with the change

jingkl avatar Sep 30 '22 01:09 jingkl