[Bug]: [benchmark][cluster] Replace compacted segment has been unsuccessful

Open wangting0128 opened this issue 11 months ago • 4 comments

Is there an existing issue for this?

  • [X] I have searched the existing issues

Environment

- Milvus version:2.4-20240320-7abebf81-amd64
- Deployment mode(standalone or cluster): cluster
- MQ type(rocksmq, pulsar or kafka): pulsar    
- SDK version(e.g. pymilvus v2.0.0rc2): 2.4.0rc66
- OS(Ubuntu or CentOS): 
- CPU/Memory: 
- GPU: 
- Others:

Current Behavior

argo task: inverted-corn-dn7xt
test case name: test_inverted_locust_varchar_dql_cluster

server:

NAME                                                              READY   STATUS             RESTARTS         AGE     IP              NODE         NOMINATED NODE   READINESS GATES
inverted-corn-dn7xt-4-40-2574-etcd-0                              1/1     Running            0                7h48m   10.104.34.52    4am-node37   <none>           <none>
inverted-corn-dn7xt-4-40-2574-etcd-1                              1/1     Running            0                7h48m   10.104.32.68    4am-node39   <none>           <none>
inverted-corn-dn7xt-4-40-2574-etcd-2                              1/1     Running            0                7h48m   10.104.33.253   4am-node36   <none>           <none>
inverted-corn-dn7xt-4-40-2574-milvus-datacoord-77c799c69c-p8dxt   1/1     Running            0                7h48m   10.104.19.72    4am-node28   <none>           <none>
inverted-corn-dn7xt-4-40-2574-milvus-datanode-7888f649fc-mjxs4    1/1     Running            1 (7h44m ago)    7h48m   10.104.19.70    4am-node28   <none>           <none>
inverted-corn-dn7xt-4-40-2574-milvus-indexcoord-79694548-pljnm    1/1     Running            0                7h48m   10.104.18.159   4am-node25   <none>           <none>
inverted-corn-dn7xt-4-40-2574-milvus-indexnode-6c79487db7-rwr7v   1/1     Running            0                7h48m   10.104.34.30    4am-node37   <none>           <none>
inverted-corn-dn7xt-4-40-2574-milvus-proxy-658bfd6f4b-wzvwm       1/1     Running            1 (7h44m ago)    7h48m   10.104.19.71    4am-node28   <none>           <none>
inverted-corn-dn7xt-4-40-2574-milvus-querycoord-5dcf95d959m7mkc   1/1     Running            1 (7h44m ago)    7h48m   10.104.18.161   4am-node25   <none>           <none>
inverted-corn-dn7xt-4-40-2574-milvus-querynode-64c859bfb8-q645n   1/1     Running            0                7h48m   10.104.27.152   4am-node31   <none>           <none>
inverted-corn-dn7xt-4-40-2574-milvus-rootcoord-7bdb59f749-ldj7h   1/1     Running            1 (7h44m ago)    7h48m   10.104.18.160   4am-node25   <none>           <none>
inverted-corn-dn7xt-4-40-2574-minio-0                             1/1     Running            0                7h48m   10.104.34.51    4am-node37   <none>           <none>
inverted-corn-dn7xt-4-40-2574-minio-1                             1/1     Running            0                7h48m   10.104.33.248   4am-node36   <none>           <none>
inverted-corn-dn7xt-4-40-2574-minio-2                             1/1     Running            0                7h48m   10.104.32.73    4am-node39   <none>           <none>
inverted-corn-dn7xt-4-40-2574-minio-3                             1/1     Running            0                7h48m   10.104.21.28    4am-node24   <none>           <none>
inverted-corn-dn7xt-4-40-2574-pulsar-bookie-0                     1/1     Running            0                7h48m   10.104.21.27    4am-node24   <none>           <none>
inverted-corn-dn7xt-4-40-2574-pulsar-bookie-1                     1/1     Running            0                7h48m   10.104.33.254   4am-node36   <none>           <none>
inverted-corn-dn7xt-4-40-2574-pulsar-bookie-2                     1/1     Running            0                7h48m   10.104.32.74    4am-node39   <none>           <none>
inverted-corn-dn7xt-4-40-2574-pulsar-bookie-init-24fz6            0/1     Completed          0                7h48m   10.104.30.115   4am-node38   <none>           <none>
inverted-corn-dn7xt-4-40-2574-pulsar-broker-0                     1/1     Running            0                7h48m   10.104.30.117   4am-node38   <none>           <none>
inverted-corn-dn7xt-4-40-2574-pulsar-proxy-0                      1/1     Running            0                7h48m   10.104.1.243    4am-node10   <none>           <none>
inverted-corn-dn7xt-4-40-2574-pulsar-pulsar-init-856ld            0/1     Completed          0                7h48m   10.104.30.114   4am-node38   <none>           <none>
inverted-corn-dn7xt-4-40-2574-pulsar-recovery-0                   1/1     Running            0                7h48m   10.104.32.59    4am-node39   <none>           <none>
inverted-corn-dn7xt-4-40-2574-pulsar-zookeeper-0                  1/1     Running            0                7h48m   10.104.33.252   4am-node36   <none>           <none>
inverted-corn-dn7xt-4-40-2574-pulsar-zookeeper-1                  1/1     Running            0                7h47m   10.104.30.134   4am-node38   <none>           <none>
inverted-corn-dn7xt-4-40-2574-pulsar-zookeeper-2                  1/1     Running            0                7h46m   10.104.32.85    4am-node39   <none>           <none>

After creating the index, the new segment cannot be loaded on the queryNode, even though the queryNode has 64G of memory.

image
273819cd-2eb5-41ef-96ce-9890f372d7a5
Screenshot 2024-03-21 17 03 56

client pod name: inverted-corn-dn7xt-291792770
client logs:

image

Expected Behavior

No response

Steps To Reproduce

concurrent test and calculation of RT and QPS

        :purpose:  `varchar: different max_length`
            verify a concurrent DQL scenario with 3 VARCHAR scalar fields, each with an INVERTED index

        :test steps:
            1. create collection with fields:
                'float_vector': 3dim,
                'varchar_1': max_length=256, varchar_filled=True
                'varchar_2': max_length=32768, varchar_filled=True
                'varchar_3': max_length=65535, varchar_filled=True
            2. build indexes:
                IVF_FLAT: 'float_vector'
                INVERTED: 'varchar_1', 'varchar_2', 'varchar_3'
            3. insert 300k data
            4. flush collection
            5. build indexes again using the same params
            6. load collection
            7. concurrent request:
                - search
                - query
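The test steps above could be reproduced roughly as follows. This is a minimal sketch assuming pymilvus 2.4 and a reachable cluster; the collection name, index names, helper names, and the small default row count are illustrative choices, not part of the original test. The query filter in the report ends in a dangling `&&`, so only its visible part is used here, and the `limit=10` on the query is an assumption to keep the response small.

```python
import random
import string

# Field layout taken from the test steps; helper names are hypothetical.
VARCHAR_FIELDS = {"varchar_1": 256, "varchar_2": 32768, "varchar_3": 65535}
DIM = 3


def random_varchar(max_length: int) -> str:
    """Return an ASCII string of exactly max_length characters.

    This is one reading of varchar_filled=True; the real generator may differ.
    """
    return "".join(random.choices(string.ascii_letters + string.digits, k=max_length))


def search_expr() -> str:
    """Filter expression used by the search task in the report."""
    return 'varchar_1 like "a%" && varchar_2 like "A%" && varchar_3 like "0%" && id > 0'


def run_scenario(host: str, port: str = "19530", rows: int = 1000, batch: int = 50) -> None:
    """Run steps 1-7 once against a live cluster (rows defaults far below 300k)."""
    # Imported here so the helpers above work without pymilvus installed.
    from pymilvus import Collection, CollectionSchema, DataType, FieldSchema, connections

    connections.connect(host=host, port=port)

    # Step 1: create the collection.
    fields = [
        FieldSchema("id", DataType.INT64, is_primary=True),
        FieldSchema("float_vector", DataType.FLOAT_VECTOR, dim=DIM),
    ] + [FieldSchema(n, DataType.VARCHAR, max_length=m) for n, m in VARCHAR_FIELDS.items()]
    coll = Collection("inverted_varchar_demo", CollectionSchema(fields), shards_num=2)

    def build_indexes() -> None:
        coll.create_index(
            "float_vector",
            {"index_type": "IVF_FLAT", "metric_type": "L2", "params": {"nlist": 1024}},
            index_name="idx_float_vector",
        )
        for name in VARCHAR_FIELDS:
            coll.create_index(name, {"index_type": "INVERTED"}, index_name=f"idx_{name}")

    build_indexes()  # step 2

    # Step 3: column-based insert, one list per field in schema order.
    for start in range(0, rows, batch):
        ids = list(range(start, min(start + batch, rows)))
        columns = [ids, [[random.random() for _ in range(DIM)] for _ in ids]]
        columns += [[random_varchar(m) for _ in ids] for m in VARCHAR_FIELDS.values()]
        coll.insert(columns)

    coll.flush()     # step 4
    build_indexes()  # step 5: same params again
    coll.load()      # step 6

    # Step 7: a single search and a single query (the real test runs them concurrently).
    coll.search(
        data=[[random.random() for _ in range(DIM)]],
        anns_field="float_vector",
        param={"metric_type": "L2", "params": {"nprobe": 32}},
        limit=10,
        expr=search_expr(),
        timeout=60,
    )
    coll.query(expr="id > -1", output_fields=["float_vector"], limit=10, timeout=60)
```

Calling `run_scenario("<milvus-host>")` needs a running Milvus deployment; the module itself does not connect on import.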

Milvus Log

No response

Anything else?

test result:

[2024-03-20 17:44:58,699 -  INFO - fouram]: Print locust final stats. (locust_runner.py:56)
[2024-03-20 17:44:58,700 -  INFO - fouram]: Type     Name                                                                          # reqs      # fails |    Avg     Min     Max    Med |   req/s  failures/s (stats.py:789)
[2024-03-20 17:44:58,700 -  INFO - fouram]: --------|----------------------------------------------------------------------------|-------|-------------|-------|-------|-------|-------|--------|----------- (stats.py:789)
[2024-03-20 17:44:58,700 -  INFO - fouram]: grpc     query                                                                           8416     0(0.00%) |  16985    2463   38727  15000 |    2.34        0.00 (stats.py:789)
[2024-03-20 17:44:58,700 -  INFO - fouram]: grpc     search                                                                          8524     0(0.00%) |   4259    2157   12464   3700 |    2.37        0.00 (stats.py:789)
[2024-03-20 17:44:58,700 -  INFO - fouram]: --------|----------------------------------------------------------------------------|-------|-------------|-------|-------|-------|-------|--------|----------- (stats.py:789)
[2024-03-20 17:44:58,700 -  INFO - fouram]:          Aggregated                                                                     16940     0(0.00%) |  10581    2157   38727   6900 |    4.71        0.00 (stats.py:789)
[2024-03-20 17:44:58,700 -  INFO - fouram]:  (stats.py:790)
[2024-03-20 17:44:58,702 -  INFO - fouram]: [PerfTemplate] Report data: 
{'server': {'deploy_tool': 'helm',
            'deploy_mode': 'cluster',
            'config_name': 'cluster_2c4m',
            'config': {'queryNode': {'resources': {'limits': {'cpu': '8',
                                                              'memory': '64Gi'},
                                                   'requests': {'cpu': '8',
                                                                'memory': '32Gi'}},
                                     'replicas': 1},
                       'indexNode': {'resources': {'limits': {'cpu': '4.0',
                                                              'memory': '16Gi'},
                                                   'requests': {'cpu': '3.0',
                                                                'memory': '9Gi'}},
                                     'replicas': 1},
                       'dataNode': {'resources': {'limits': {'cpu': '2.0',
                                                             'memory': '4Gi'},
                                                  'requests': {'cpu': '2.0',
                                                               'memory': '3Gi'}}},
                       'cluster': {'enabled': True},
                       'pulsar': {},
                       'kafka': {},
                       'minio': {'metrics': {'podMonitor': {'enabled': True}}},
                       'etcd': {'metrics': {'enabled': True,
                                            'podMonitor': {'enabled': True}}},
                       'metrics': {'serviceMonitor': {'enabled': True}},
                       'log': {'level': 'debug'},
                       'image': {'all': {'repository': 'harbor.milvus.io/milvus/milvus',
                                         'tag': '2.4-20240320-7abebf81-amd64'}}},
            'host': 'inverted-corn-dn7xt-4-40-2574-milvus.qa-milvus.svc.cluster.local',
            'port': '19530',
            'uri': ''},
 'client': {'test_case_type': 'ConcurrentClientBase',
            'test_case_name': 'test_inverted_locust_varchar_dql_cluster',
            'test_case_params': {'dataset_params': {'metric_type': 'L2',
                                                    'dim': 3,
                                                    'scalars_index': {'varchar_1': {'index_type': 'INVERTED'},
                                                                      'varchar_2': {'index_type': 'INVERTED'},
                                                                      'varchar_3': {'index_type': 'INVERTED'}},
                                                    'scalars_params': {'varchar_1': {'params': {'max_length': 256},
                                                                                     'other_params': {'varchar_filled': True}},
                                                                       'varchar_2': {'params': {'max_length': 32768},
                                                                                     'other_params': {'varchar_filled': True}},
                                                                       'varchar_3': {'params': {'max_length': 65535},
                                                                                     'other_params': {'varchar_filled': True}}},
                                                    'dataset_name': 'local',
                                                    'dataset_size': 300000,
                                                    'ni_per': 50},
                                 'collection_params': {'other_fields': ['varchar_1',
                                                                        'varchar_2',
                                                                        'varchar_3'],
                                                       'shards_num': 2},
                                 'resource_groups_params': {'reset': False},
                                 'database_user_params': {'reset_rbac': False,
                                                          'reset_db': False},
                                 'index_params': {'index_type': 'IVF_FLAT',
                                                  'index_param': {'nlist': 1024}},
                                 'concurrent_params': {'concurrent_number': 50,
                                                       'during_time': '1h',
                                                       'interval': 20,
                                                       'spawn_rate': None},
                                 'concurrent_tasks': [{'type': 'search',
                                                       'weight': 1,
                                                       'params': {'nq': 1000,
                                                                  'top_k': 10,
                                                                  'search_param': {'nprobe': 32},
                                                                  'expr': 'varchar_1 '
                                                                          'like '
                                                                          '"a%" '
                                                                          '&& '
                                                                          'varchar_2 '
                                                                          'like '
                                                                          '"A%" '
                                                                          '&& '
                                                                          'varchar_3 '
                                                                          'like '
                                                                          '"0%" '
                                                                          '&& '
                                                                          'id '
                                                                          '> 0',
                                                                  'guarantee_timestamp': None,
                                                                  'partition_names': None,
                                                                  'output_fields': None,
                                                                  'ignore_growing': False,
                                                                  'group_by_field': None,
                                                                  'timeout': 60,
                                                                  'random_data': True}},
                                                      {'type': 'query',
                                                       'weight': 1,
                                                       'params': {'ids': None,
                                                                  'expr': 'id '
                                                                          '> '
                                                                          '-1 '
                                                                          '&&',
                                                                  'output_fields': ['float_vector'],
                                                                  'offset': None,
                                                                  'limit': None,
                                                                  'ignore_growing': False,
                                                                  'partition_names': None,
                                                                  'timeout': 60,
                                                                  'random_data': True,
                                                                  'random_count': 10,
                                                                  'random_range': [0,
                                                                                   150000.0],
                                                                  'field_name': 'id',
                                                                  'field_type': 'int64'}}]},
            'run_id': 2024032085773619,
            'datetime': '2024-03-20 09:56:17.332135',
            'client_version': '2.4.0'},
 'result': {'test_result': {'index': {'RT': 3893.4345,
                                      'varchar_1': {'RT': 3843.9328},
                                      'varchar_2': {'RT': 2913.0892},
                                      'varchar_3': {'RT': 1893.9922}},
                            'insert': {'total_time': 740.3529,
                                       'VPS': 405.2122,
                                       'batch_time': 0.1234,
                                       'batch': 50},
                            'flush': {'RT': 3.525},
                            'load': {'RT': 78.4088},
                            'Locust': {'Aggregated': {'Requests': 16940,
                                                      'Fails': 0,
                                                      'RPS': 4.71,
                                                      'fail_s': 0.0,
                                                      'RT_max': 38727.92,
                                                      'RT_avg': 10581.97,
                                                      'TP50': 6900.0,
                                                      'TP99': 33000.0},
                                       'query': {'Requests': 8416,
                                                 'Fails': 0,
                                                 'RPS': 2.34,
                                                 'fail_s': 0.0,
                                                 'RT_max': 38727.92,
                                                 'RT_avg': 16985.16,
                                                 'TP50': 15000.0,
                                                 'TP99': 35000.0},
                                       'search': {'Requests': 8524,
                                                  'Fails': 0,
                                                  'RPS': 2.37,
                                                  'fail_s': 0.0,
                                                  'RT_max': 12464.69,
                                                  'RT_avg': 4259.9,
                                                  'TP50': 3700.0,
                                                  'TP99': 7100.0}}}}}
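As a sanity check on the report above, the aggregated numbers can be recomputed from the per-task rows. The sketch below uses only values from the report; the Little's law estimate additionally assumes that the 50 concurrent users issue requests back to back with no think time, which the report does not state.

```python
# Values copied from the 'Locust' section of the report.
stats = {
    "query": {"requests": 8416, "rt_avg_ms": 16985.16},
    "search": {"requests": 8524, "rt_avg_ms": 4259.9},
}
DURATION_S = 3600  # 'during_time': '1h'
CONCURRENCY = 50   # 'concurrent_number': 50

total_requests = sum(s["requests"] for s in stats.values())

# Request-weighted mean response time across both task types.
weighted_rt_ms = (
    sum(s["requests"] * s["rt_avg_ms"] for s in stats.values()) / total_requests
)

# Throughput from the run length, and from Little's law (N = throughput * latency).
rps_from_duration = total_requests / DURATION_S
rps_from_littles_law = CONCURRENCY / (weighted_rt_ms / 1000)

print(f"requests={total_requests}")
print(f"weighted RT_avg={weighted_rt_ms:.2f} ms")
print(f"RPS from duration={rps_from_duration:.2f}")
print(f"RPS from Little's law={rps_from_littles_law:.2f}")
```

Both throughput estimates land close to the reported aggregated RPS of 4.71, which suggests the clients were saturated: throughput is bounded by the roughly 10.6 s average latency rather than by the load generator.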

wangting0128 avatar Mar 21 '24 09:03 wangting0128

/unassign

yanliang567 avatar Mar 21 '24 10:03 yanliang567

No compaction actually happens here. Two fields (103, 104) are building inverted indexes, and when the builds succeed, the indexes are reloaded into the segment to replace the original data.

image

When the first load happens, segment 448509732647565654 does not load the indexes for fields 103 and 104 because they have not finished building, as the figure above shows. Searches that filter on fields 103 and 104 then run against the raw data. Once the two inverted indexes finish building, the index load happens.

image

As the figure above shows, at 17:07:11 the segment reloads the new index.

So why does the latency become so large? The inverted index may not be suitable for this situation. The test could be modified to drop this index and compare the performance with and without it.
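One way to run the suggested comparison is a small timing harness that issues the same request against two setups. The sketch below is generic Python, not part of the fouram test framework; `measure` and `compare` are hypothetical helper names, and the two callables are assumed to wrap the same search or query against a collection with and without the INVERTED indexes.

```python
import statistics
import time
from typing import Callable, Dict


def measure(request: Callable[[], object], repeats: int = 20) -> Dict[str, float]:
    """Call request() repeatedly and summarize wall-clock latency in milliseconds."""
    samples = []
    for _ in range(repeats):
        start = time.perf_counter()
        request()
        samples.append((time.perf_counter() - start) * 1000)
    return {
        "avg_ms": statistics.fmean(samples),
        "p50_ms": statistics.median(samples),
        "max_ms": max(samples),
    }


def compare(
    with_index: Callable[[], object],
    without_index: Callable[[], object],
    repeats: int = 20,
) -> Dict[str, Dict[str, float]]:
    """Measure both variants with the same number of repeats."""
    return {
        "with_index": measure(with_index, repeats),
        "without_index": measure(without_index, repeats),
    }
```

For a fair comparison, both variants should use the same data, the same filter expression, and a loaded collection, so that the only difference is whether the scalar indexes exist.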

zhagnlu avatar Mar 22 '24 08:03 zhagnlu

@longjiquan please check whether the inverted index is suitable for this situation.

zhagnlu avatar Mar 22 '24 09:03 zhagnlu

image

This is the segcore latency from a test of the normal case, where no delayed index loading happens.

image

This is the segcore latency for this issue. Both are 1.7s, which shows that a request using the inverted index needs 1.7s. This supports the conclusion above.
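To rule out delayed index loading when rerunning the test, the client could wait until every index reports all rows as indexed before calling load. The sketch below is an assumption about how that might be done: `index_ready`, `wait_for_indexes`, and `milvus_progress` are hypothetical helpers, and the progress dictionary is assumed to carry the `total_rows` and `indexed_rows` keys returned by pymilvus `utility.index_building_progress`.

```python
import time
from typing import Callable, Dict, Iterable


def index_ready(progress: Dict[str, int]) -> bool:
    """True once every row counted by the server has been indexed."""
    return progress["indexed_rows"] >= progress["total_rows"]


def wait_for_indexes(
    get_progress: Callable[[str], Dict[str, int]],
    index_names: Iterable[str],
    timeout_s: float = 600.0,
    poll_s: float = 2.0,
) -> bool:
    """Poll until all named indexes are ready; return False on timeout."""
    deadline = time.monotonic() + timeout_s
    pending = set(index_names)
    while pending:
        # Keep only the indexes that are still building.
        pending = {name for name in pending if not index_ready(get_progress(name))}
        if not pending:
            return True
        if time.monotonic() >= deadline:
            return False
        time.sleep(poll_s)
    return True


def milvus_progress(collection_name: str) -> Callable[[str], Dict[str, int]]:
    """Adapter around pymilvus; imported lazily so the helpers above stay standalone."""
    from pymilvus import utility

    return lambda index_name: utility.index_building_progress(
        collection_name, index_name=index_name
    )
```

With a live connection, `wait_for_indexes(milvus_progress("<collection>"), ["<index names>"])` would be called after the second index build and before `load()`.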

zhagnlu avatar Mar 22 '24 09:03 zhagnlu

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions. Rotten issues close after 30d of inactivity. Reopen the issue with /reopen.

stale[bot] avatar Jun 11 '24 03:06 stale[bot]