[Bug]: [benchmark][cluster][LRU] search and query failed in dml & dql scene

Open wangting0128 opened this issue 9 months ago • 3 comments

Is there an existing issue for this?

  • [X] I have searched the existing issues

Environment

- Milvus version: milvus-io-lru-dev-9234a94-20240506
- Deployment mode(standalone or cluster): cluster
- MQ type(rocksmq, pulsar or kafka):  pulsar  
- SDK version(e.g. pymilvus v2.0.0rc2): 2.4.0rc66
- OS(Ubuntu or CentOS): 
- CPU/Memory: 
- GPU: 
- Others:

Current Behavior

argo task: lru-fouramf-d5jpz

server:

NAME                                                              READY   STATUS        RESTARTS        AGE     IP              NODE         NOMINATED NODE   READINESS GATES
lru-verify-32135-cluster-etcd-0                                   1/1     Running       0               5m      10.104.19.173   4am-node28   <none>           <none>
lru-verify-32135-cluster-etcd-1                                   1/1     Running       0               5m      10.104.15.190   4am-node20   <none>           <none>
lru-verify-32135-cluster-etcd-2                                   1/1     Running       0               5m      10.104.18.232   4am-node25   <none>           <none>
lru-verify-32135-cluster-milvus-datacoord-5789ccf96f-4xk7b        1/1     Running       0               5m1s    10.104.24.132   4am-node29   <none>           <none>
lru-verify-32135-cluster-milvus-datanode-7447559b8b-2zgc9         1/1     Running       0               5m1s    10.104.24.135   4am-node29   <none>           <none>
lru-verify-32135-cluster-milvus-indexcoord-5d69fb9db8-2hgfl       1/1     Running       0               5m1s    10.104.24.136   4am-node29   <none>           <none>
lru-verify-32135-cluster-milvus-indexnode-6647c68789-5xb82        1/1     Running       0               5m1s    10.104.30.107   4am-node38   <none>           <none>
lru-verify-32135-cluster-milvus-indexnode-6647c68789-9tbh9        1/1     Running       0               5m1s    10.104.17.37    4am-node23   <none>           <none>
lru-verify-32135-cluster-milvus-proxy-767bf99dd-7flpv             1/1     Running       0               5m1s    10.104.24.131   4am-node29   <none>           <none>
lru-verify-32135-cluster-milvus-querycoord-8477cbc647-jvmrt       1/1     Running       0               5m1s    10.104.24.134   4am-node29   <none>           <none>
lru-verify-32135-cluster-milvus-querynode-856df586fb-tf77v        1/1     Running       0               5m1s    10.104.1.169    4am-node10   <none>           <none>
lru-verify-32135-cluster-milvus-rootcoord-7c78f55b49-f9qcv        1/1     Running       0               5m1s    10.104.24.130   4am-node29   <none>           <none>
lru-verify-32135-cluster-minio-0                                  1/1     Running       0               5m      10.104.19.172   4am-node28   <none>           <none>
lru-verify-32135-cluster-minio-1                                  1/1     Running       0               5m      10.104.15.191   4am-node20   <none>           <none>
lru-verify-32135-cluster-minio-2                                  1/1     Running       0               5m      10.104.24.142   4am-node29   <none>           <none>
lru-verify-32135-cluster-minio-3                                  1/1     Running       0               5m      10.104.18.233   4am-node25   <none>           <none>
lru-verify-32135-cluster-pulsar-bookie-0                          1/1     Running       0               5m      10.104.18.230   4am-node25   <none>           <none>
lru-verify-32135-cluster-pulsar-bookie-1                          1/1     Running       0               5m      10.104.19.174   4am-node28   <none>           <none>
lru-verify-32135-cluster-pulsar-bookie-2                          1/1     Running       0               5m      10.104.15.194   4am-node20   <none>           <none>
lru-verify-32135-cluster-pulsar-bookie-init-49mgf                 0/1     Completed     0               5m1s    10.104.24.133   4am-node29   <none>           <none>
lru-verify-32135-cluster-pulsar-broker-0                          1/1     Running       0               5m1s    10.104.6.6      4am-node13   <none>           <none>
lru-verify-32135-cluster-pulsar-proxy-0                           1/1     Running       0               5m1s    10.104.19.167   4am-node28   <none>           <none>
lru-verify-32135-cluster-pulsar-pulsar-init-njn79                 0/1     Completed     0               5m1s    10.104.24.140   4am-node29   <none>           <none>
lru-verify-32135-cluster-pulsar-recovery-0                        1/1     Running       0               5m1s    10.104.5.168    4am-node12   <none>           <none>
lru-verify-32135-cluster-pulsar-zookeeper-0                       1/1     Running       0               5m1s    10.104.18.229   4am-node25   <none>           <none>
lru-verify-32135-cluster-pulsar-zookeeper-1                       1/1     Running       0               4m22s   10.104.24.144   4am-node29   <none>           <none>
lru-verify-32135-cluster-pulsar-zookeeper-2                       1/1     Running       0               3m46s   10.104.15.196   4am-node20   <none>           <none> (base.py:257)
[2024-05-06 18:22:08,058 -  INFO - fouram]: [Cmd Exe]  kubectl get pods  -n qa-milvus  -o wide | grep -E 'NAME|lru-verify-32135-cluster-milvus|lru-verify-32135-cluster-minio|lru-verify-32135-cluster-etcd|lru-verify-32135-cluster-pulsar|lru-verify-32135-cluster-zookeeper|lru-verify-32135-cluster-kafka|lru-verify-32135-cluster-log|lru-verify-32135-cluster-tikv'  (util_cmd.py:14)
[2024-05-06 18:22:19,029 -  INFO - fouram]: [CliClient] pod details of release(lru-verify-32135-cluster): 
 I0506 18:22:09.704805     517 request.go:665] Waited for 1.19818961s due to client-side throttling, not priority and fairness, request: GET:https://kubernetes.default.svc.cluster.local/apis/batch/v1beta1?timeout=32s
NAME                                                              READY   STATUS        RESTARTS        AGE     IP              NODE         NOMINATED NODE   READINESS GATES
lru-verify-32135-cluster-etcd-0                                   1/1     Running       0               12h     10.104.19.173   4am-node28   <none>           <none>
lru-verify-32135-cluster-etcd-1                                   1/1     Running       0               12h     10.104.15.190   4am-node20   <none>           <none>
lru-verify-32135-cluster-etcd-2                                   1/1     Running       0               12h     10.104.18.232   4am-node25   <none>           <none>
lru-verify-32135-cluster-milvus-datacoord-5789ccf96f-4xk7b        1/1     Running       0               12h     10.104.24.132   4am-node29   <none>           <none>
lru-verify-32135-cluster-milvus-datanode-7447559b8b-2zgc9         1/1     Running       0               12h     10.104.24.135   4am-node29   <none>           <none>
lru-verify-32135-cluster-milvus-indexcoord-5d69fb9db8-2hgfl       1/1     Running       0               12h     10.104.24.136   4am-node29   <none>           <none>
lru-verify-32135-cluster-milvus-indexnode-6647c68789-5xb82        1/1     Running       0               12h     10.104.30.107   4am-node38   <none>           <none>
lru-verify-32135-cluster-milvus-indexnode-6647c68789-9tbh9        1/1     Running       0               12h     10.104.17.37    4am-node23   <none>           <none>
lru-verify-32135-cluster-milvus-proxy-767bf99dd-7flpv             1/1     Running       0               12h     10.104.24.131   4am-node29   <none>           <none>
lru-verify-32135-cluster-milvus-querycoord-8477cbc647-jvmrt       1/1     Running       0               12h     10.104.24.134   4am-node29   <none>           <none>
lru-verify-32135-cluster-milvus-querynode-856df586fb-tf77v        1/1     Running       20 (25m ago)    12h     10.104.1.169    4am-node10   <none>           <none>
lru-verify-32135-cluster-milvus-rootcoord-7c78f55b49-f9qcv        1/1     Running       0               12h     10.104.24.130   4am-node29   <none>           <none>
lru-verify-32135-cluster-minio-0                                  1/1     Running       0               12h     10.104.19.172   4am-node28   <none>           <none>
lru-verify-32135-cluster-minio-1                                  1/1     Running       0               12h     10.104.15.191   4am-node20   <none>           <none>
lru-verify-32135-cluster-minio-2                                  1/1     Running       0               12h     10.104.24.142   4am-node29   <none>           <none>
lru-verify-32135-cluster-minio-3                                  1/1     Running       0               12h     10.104.18.233   4am-node25   <none>           <none>
lru-verify-32135-cluster-pulsar-bookie-0                          1/1     Running       0               12h     10.104.18.230   4am-node25   <none>           <none>
lru-verify-32135-cluster-pulsar-bookie-1                          1/1     Running       0               12h     10.104.19.174   4am-node28   <none>           <none>
lru-verify-32135-cluster-pulsar-bookie-2                          1/1     Running       0               12h     10.104.15.194   4am-node20   <none>           <none>
lru-verify-32135-cluster-pulsar-bookie-init-49mgf                 0/1     Completed     0               12h     10.104.24.133   4am-node29   <none>           <none>
lru-verify-32135-cluster-pulsar-broker-0                          1/1     Running       0               12h     10.104.6.6      4am-node13   <none>           <none>
lru-verify-32135-cluster-pulsar-proxy-0                           1/1     Running       0               12h     10.104.19.167   4am-node28   <none>           <none>
lru-verify-32135-cluster-pulsar-pulsar-init-njn79                 0/1     Completed     0               12h     10.104.24.140   4am-node29   <none>           <none>
lru-verify-32135-cluster-pulsar-recovery-0                        1/1     Running       0               12h     10.104.5.168    4am-node12   <none>           <none>
lru-verify-32135-cluster-pulsar-zookeeper-0                       1/1     Running       0               12h     10.104.18.229   4am-node25   <none>           <none>
lru-verify-32135-cluster-pulsar-zookeeper-1                       1/1     Running       0               12h     10.104.24.144   4am-node29   <none>           <none>
lru-verify-32135-cluster-pulsar-zookeeper-2                       1/1     Running       0               12h     10.104.15.196   4am-node20   <none>           <none>

client pod name: lru-fouramf-d5jpz-3796633683 client log:

[2024-05-06 13:05:44,194 - ERROR - fouram]: RPC error: [query], <MilvusException: (code=65535, message=fail to Query on QueryNode 21: worker(21) query failed: Assert "is_system_field_ready()" at /go/src/github.com/milvus-io/milvus/internal/core/src/segcore/SegmentSealedImpl.cpp:1030
[2024-05-06 13:06:02,560 - ERROR - fouram]: RPC error: [search], <MilvusException: (code=65535, message=fail to search on QueryNode 21: worker(21) query failed:  => failed to load row ID or timestamp, potential missing bin logs or empty segments. Segment ID = 449570705034082594)>, <Time:{'RPC start': '2024-05-06 13:06:01.943796', 'RPC error': '2024-05-06 13:06:02.560093'}> (decorators.py:146)

test result:

[2024-05-06 18:20:48,147 -  INFO - fouram]: Type     Name                                                                          # reqs      # fails |    Avg     Min     Max    Med |   req/s  failures/s (stats.py:789)
[2024-05-06 18:20:48,147 -  INFO - fouram]: --------|----------------------------------------------------------------------------|-------|-------------|-------|-------|-------|-------|--------|----------- (stats.py:789)
[2024-05-06 18:20:48,147 -  INFO - fouram]: grpc     delete                                                                        562232   279(0.05%) |     40       1    5274     16 |   13.01        0.01 (stats.py:789)
[2024-05-06 18:20:48,147 -  INFO - fouram]: grpc     insert                                                                        561829   281(0.05%) |    219      23   12583    180 |   13.01        0.01 (stats.py:789)
[2024-05-06 18:20:48,147 -  INFO - fouram]: grpc     load                                                                          561432     0(0.00%) |     49       4    6101     25 |   13.00        0.00 (stats.py:789)
[2024-05-06 18:20:48,147 -  INFO - fouram]: grpc     query                                                                         560719 57974(10.34%) |    230       1  162124    110 |   12.98        1.34 (stats.py:789)
[2024-05-06 18:20:48,147 -  INFO - fouram]: grpc     search                                                                        561121 57551(10.26%) |    218       5  162326     91 |   12.99        1.33 (stats.py:789)
[2024-05-06 18:20:48,147 -  INFO - fouram]: --------|----------------------------------------------------------------------------|-------|-------------|-------|-------|-------|-------|--------|----------- (stats.py:789)
[2024-05-06 18:20:48,148 -  INFO - fouram]:          Aggregated                                                                   2807333 116085(4.14%) |    151       1  162326     82 |   64.98        2.69 (stats.py:789)
[2024-05-06 18:20:48,148 -  INFO - fouram]:  (stats.py:790)
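
The percentages in the table above can be recomputed from the raw counts. The following is a minimal sketch, with the numbers copied by hand from the table; the variable names are illustrative and not part of fouram or Locust. It also shows how much of the total failure count comes from search and query.

```python
# Request and failure counts copied from the Locust summary table above.
stats = {
    "delete": (562232, 279),
    "insert": (561829, 281),
    "load": (561432, 0),
    "query": (560719, 57974),
    "search": (561121, 57551),
}

total_reqs = sum(reqs for reqs, _ in stats.values())
total_fails = sum(fails for _, fails in stats.values())

# Per-operation failure percentage, rounded to two decimals like the table.
fail_pct = {
    name: round(100 * fails / reqs, 2) for name, (reqs, fails) in stats.items()
}

# Share of all failures that come from the read path (search + query).
dql_fails = stats["query"][1] + stats["search"][1]
dql_share = dql_fails / total_fails

for name, pct in fail_pct.items():
    print(f"{name:<8}{pct:6.2f}%")
print(f"aggregated {100 * total_fails / total_reqs:.2f}%")
print(f"search+query share of failures: {dql_share:.1%}")
```

The per-operation totals add up to the Aggregated row, and search plus query account for nearly all failures, which matches the issue title.
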
[2024-05-06 18:20:48,150 -  INFO - fouram]: [PerfTemplate] Report data: 
{'server': {'deploy_tool': 'helm',
            'deploy_mode': 'cluster',
            'config_name': 'cluster_8c16m',
            'config': {'queryNode': {'resources': {'limits': {'cpu': '2',
                                                              'memory': '8Gi',
                                                              'ephemeral-storage': '70Gi'},
                                                   'requests': {'cpu': '2',
                                                                'memory': '8Gi'}},
                                     'replicas': 1,
                                     'extraEnv': [{'name': 'LOCAL_STORAGE_SIZE',
                                                   'value': '70'}]},
                       'indexNode': {'resources': {'limits': {'cpu': '8.0',
                                                              'memory': '8Gi'},
                                                   'requests': {'cpu': '5.0',
                                                                'memory': '5Gi'}},
                                     'replicas': 2},
                       'dataNode': {'resources': {'limits': {'cpu': '2.0',
                                                             'memory': '8Gi'},
                                                  'requests': {'cpu': '2.0',
                                                               'memory': '8Gi'}},
                                    'replicas': 1},
                       'cluster': {'enabled': True},
                       'pulsar': {},
                       'kafka': {},
                       'minio': {'metrics': {'podMonitor': {'enabled': True}},
                                 'persistence': {'size': '320Gi'}},
                       'etcd': {'metrics': {'enabled': True,
                                            'podMonitor': {'enabled': True}}},
                       'metrics': {'serviceMonitor': {'enabled': True}},
                       'log': {'level': 'debug'},
                       'extraConfigFiles': {'user.yaml': 'queryNode:\n'
                                                         '  '
                                                         'diskCacheCapacityLimit: '
                                                         '51539607552\n'
                                                         '  mmap:\n'
                                                         '    mmapEnabled: '
                                                         'true\n'
                                                         '  lazyloadEnabled: '
                                                         'true\n'
                                                         '  '
                                                         'useStreamComputing: '
                                                         'true\n'
                                                         '  cache:\n'
                                                         '    warmup: sync\n'
                                                         '  '
                                                         'lazyloadWaitTimeout: '
                                                         '300000\n'},
                       'image': {'all': {'repository': 'harbor.milvus.io/milvus/milvus',
                                         'tag': 'milvus-io-lru-dev-9234a94-20240506'}}},
            'host': 'lru-verify-32135-cluster-milvus.qa-milvus.svc.cluster.local',
            'port': '19530',
            'uri': ''},
 'client': {'test_case_type': 'ConcurrentClientBase',
            'test_case_name': 'test_concurrent_locust_custom_parameters',
            'test_case_params': {'dataset_params': {'metric_type': 'L2',
                                                    'column_name': 'float32_vector',
                                                    'dim': 768,
                                                    'scalars_params': {'int64_1': {'params': {'is_partition_key': True}}},
                                                    'dataset_name': 'laion1b_nolang',
                                                    'dataset_size': '10w',
                                                    'ni_per': 10000},
                                 'collection_params': {'other_fields': ['int64_1'],
                                                       'num_partitions': 64},
                                 'index_params': {'index_type': 'HNSW',
                                                  'index_param': {'M': 30,
                                                                  'efConstruction': 360}},
                                 'concurrent_params': {'concurrent_number': 10,
                                                       'during_time': '12h',
                                                       'interval': 20,
                                                       'spawn_rate': None},
                                 'concurrent_tasks': [{'type': 'insert',
                                                       'weight': 1,
                                                       'params': {'nb': 64,
                                                                  'timeout': 3000,
                                                                  'random_vector': True}},
                                                      {'type': 'delete',
                                                       'weight': 1,
                                                       'params': {'delete_length': 64,
                                                                  'timeout': 3000}},
                                                      {'type': 'flush',
                                                       'weight': 0,
                                                       'params': {'timeout': 3000}},
                                                      {'type': 'load',
                                                       'weight': 1,
                                                       'params': {'timeout': 3000}},
                                                      {'type': 'search',
                                                       'weight': 1,
                                                       'params': {'top_k': 1,
                                                                  'nq': 10,
                                                                  'search_param': {'ef': 64},
                                                                  'expr': 'int64_1 '
                                                                          '>= '
                                                                          '0 '
                                                                          '&& '
                                                                          'int64_1 '
                                                                          '<= '
                                                                          '4',
                                                                  'timeout': 3000,
                                                                  'random_data': True}},
                                                      {'type': 'query',
                                                       'weight': 1,
                                                       'params': {'expr': 'int64_1 '
                                                                          '!= '
                                                                          '2',
                                                                  'timeout': 3000,
                                                                  'offset': 0,
                                                                  'limit': 10}}]},
            'run_id': 2024050660597437,
            'datetime': '2024-05-06 06:14:19.839669',
            'client_version': '2.2'},
 'result': {'test_result': {'index': {'RT': 35.7288},
                            'insert': {'total_time': 25.8054,
                                       'VPS': 3875.1579,
                                       'batch_time': 2.5805,
                                       'batch': 10000},
                            'flush': {'RT': 5.1085},
                            'load': {'RT': 4.6199},
                            'Locust': {'Aggregated': {'Requests': 2807333,
                                                      'Fails': 116085,
                                                      'RPS': 64.98,
                                                      'fail_s': 0.04,
                                                      'RT_max': 162326.53,
                                                      'RT_avg': 151.49,
                                                      'TP50': 82,
                                                      'TP99': 800.0},
                                       'delete': {'Requests': 562232,
                                                  'Fails': 279,
                                                  'RPS': 13.01,
                                                  'fail_s': 0.0,
                                                  'RT_max': 5274.16,
                                                  'RT_avg': 40.11,
                                                  'TP50': 16,
                                                  'TP99': 370.0},
                                       'insert': {'Requests': 561829,
                                                  'Fails': 281,
                                                  'RPS': 13.01,
                                                  'fail_s': 0.0,
                                                  'RT_max': 12583.25,
                                                  'RT_avg': 219.05,
                                                  'TP50': 180.0,
                                                  'TP99': 1100.0},
                                       'load': {'Requests': 561432,
                                                'Fails': 0,
                                                'RPS': 13.0,
                                                'fail_s': 0.0,
                                                'RT_max': 6101.14,
                                                'RT_avg': 49.95,
                                                'TP50': 25,
                                                'TP99': 220.0},
                                       'query': {'Requests': 560719,
                                                 'Fails': 57974,
                                                 'RPS': 12.98,
                                                 'fail_s': 0.1,
                                                 'RT_max': 162124.56,
                                                 'RT_avg': 230.17,
                                                 'TP50': 110.0,
                                                 'TP99': 1100.0},
                                       'search': {'Requests': 561121,
                                                  'Fails': 57551,
                                                  'RPS': 12.99,
                                                  'fail_s': 0.1,
                                                  'RT_max': 162326.53,
                                                  'RT_avg': 218.43,
                                                  'TP50': 91,
                                                  'TP99': 1100.0}}}}}
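
The `user.yaml` override in the report above is printed as a run of adjacent string fragments, which makes it hard to read. The sketch below joins those fragments (regrouped one per YAML line, contents unchanged) and prints the resulting file. The unit of `diskCacheCapacityLimit` is an assumption here (bytes); the report itself does not state it.

```python
# The 'user.yaml' value from the report data, one string literal per YAML line.
user_yaml = (
    "queryNode:\n"
    "  diskCacheCapacityLimit: 51539607552\n"
    "  mmap:\n"
    "    mmapEnabled: true\n"
    "  lazyloadEnabled: true\n"
    "  useStreamComputing: true\n"
    "  cache:\n"
    "    warmup: sync\n"
    "  lazyloadWaitTimeout: 300000\n"
)
print(user_yaml)

# Assumption: the limit is expressed in bytes; the report does not say so.
limit_gib = 51539607552 / 1024**3
print(f"diskCacheCapacityLimit = {limit_gib:g} GiB (if the unit is bytes)")
```

Under that assumption the disk cache limit is smaller than the query node's 70Gi `ephemeral-storage` limit listed in the same report.
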

Expected Behavior

No response

Steps To Reproduce

1. create a collection with 3 fields: id (primary key, autoID), float_vector (768 dim), int64_1 (partition key, 64 partitions)
2. build an HNSW index
3. insert 10w (100,000) rows
4. flush the collection
5. build the index again with the same params
6. load the collection
7. concurrent requests:
   - insert
   - delete
   - load
   - search
   - query
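
The steps above can be sketched with the pymilvus ORM API roughly as follows. This is an illustration under assumptions, not the fouram test itself: the collection name, the range of partition-key values, the random data (the real run used the laion1b_nolang dataset), and the choice to delete the rows that were just inserted are all hypothetical. The field name, index parameters, and filter expressions are taken from the report data. The mixed workload is run serially here, whereas the benchmark used 10 concurrent Locust users for 12h.

```python
import random

COLLECTION = "lru_repro"  # hypothetical collection name
DIM = 768
SEARCH_EXPR = "int64_1 >= 0 && int64_1 <= 4"
QUERY_EXPR = "int64_1 != 2"


def make_batch(nb, dim=DIM):
    """Column-based batch: random vectors plus one partition-key value per row."""
    vectors = [[random.random() for _ in range(dim)] for _ in range(nb)]
    keys = [random.randint(0, 63) for _ in range(nb)]  # assumed key range
    return [vectors, keys]


def reproduce(host="localhost", port="19530", rounds=100):
    # Imported here so the helper above works without pymilvus installed.
    from pymilvus import (Collection, CollectionSchema, DataType,
                          FieldSchema, connections)

    connections.connect(host=host, port=port)
    schema = CollectionSchema([
        FieldSchema("id", DataType.INT64, is_primary=True, auto_id=True),
        FieldSchema("float32_vector", DataType.FLOAT_VECTOR, dim=DIM),
        FieldSchema("int64_1", DataType.INT64, is_partition_key=True),
    ])
    coll = Collection(COLLECTION, schema, num_partitions=64)  # step 1
    index = {"index_type": "HNSW", "metric_type": "L2",
             "params": {"M": 30, "efConstruction": 360}}
    coll.create_index("float32_vector", index)  # step 2
    for _ in range(10):  # step 3: 10 batches of 10,000 rows
        coll.insert(make_batch(10000))
    coll.flush()  # step 4
    coll.create_index("float32_vector", index)  # step 5
    coll.load()  # step 6

    for _ in range(rounds):  # step 7, serialized for simplicity
        pks = coll.insert(make_batch(64)).primary_keys
        coll.delete(f"id in {list(pks)}")
        coll.load()
        coll.search(make_batch(10)[0], "float32_vector",
                    {"metric_type": "L2", "params": {"ef": 64}},
                    limit=1, expr=SEARCH_EXPR)
        coll.query(QUERY_EXPR, offset=0, limit=10)
```

Calling `reproduce()` requires a reachable Milvus instance deployed with the lazy-load settings from the report; a serial loop like this may not trigger the failure that the concurrent run hit.
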

Milvus Log

No response

Anything else?

No response

wangting0128 avatar May 07 '24 03:05 wangting0128

@lblblong We could add a gpt-4 option, but we have no way to test it; we are still on the waitlist for now.

mkdir700 avatar Apr 06 '23 09:04 mkdir700

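@lblblong We could add a gpt-4 option, but we have no way to test it; we are still on the waitlist for now.

Sure, but I don't know what the gpt-4 model is currently called.

Also, the model field is a text input rather than a dropdown, so anyone who gets gpt-4 access can type the model name in themselves.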

lblblong avatar Apr 06 '23 09:04 lblblong

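@lblblong We could add a gpt-4 option, but we have no way to test it; we are still on the waitlist for now.

Sure, but I don't know what the gpt-4 model is currently called.

Also, the model field is a text input rather than a dropdown, so anyone who gets gpt-4 access can type the model name in themselves.

The dropdown could instead list every supported model fetched via a network request, with the commonly used 3.5 models pinned to the top and labeled [Recommended].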

by123456by avatar May 03 '23 15:05 by123456by