[Bug]: [benchmark][cluster] Replacing compacted segment has been unsuccessful
Is there an existing issue for this?
- [X] I have searched the existing issues
Environment
- Milvus version:2.4-20240320-7abebf81-amd64
- Deployment mode(standalone or cluster): cluster
- MQ type(rocksmq, pulsar or kafka): pulsar
- SDK version(e.g. pymilvus v2.0.0rc2): 2.4.0rc66
- OS(Ubuntu or CentOS):
- CPU/Memory:
- GPU:
- Others:
Current Behavior
argo task: inverted-corn-dn7xt test case name: test_inverted_locust_varchar_dql_cluster
server:
NAME READY STATUS RESTARTS AGE IP NODE NOMINATED NODE READINESS GATES
inverted-corn-dn7xt-4-40-2574-etcd-0 1/1 Running 0 7h48m 10.104.34.52 4am-node37 <none> <none>
inverted-corn-dn7xt-4-40-2574-etcd-1 1/1 Running 0 7h48m 10.104.32.68 4am-node39 <none> <none>
inverted-corn-dn7xt-4-40-2574-etcd-2 1/1 Running 0 7h48m 10.104.33.253 4am-node36 <none> <none>
inverted-corn-dn7xt-4-40-2574-milvus-datacoord-77c799c69c-p8dxt 1/1 Running 0 7h48m 10.104.19.72 4am-node28 <none> <none>
inverted-corn-dn7xt-4-40-2574-milvus-datanode-7888f649fc-mjxs4 1/1 Running 1 (7h44m ago) 7h48m 10.104.19.70 4am-node28 <none> <none>
inverted-corn-dn7xt-4-40-2574-milvus-indexcoord-79694548-pljnm 1/1 Running 0 7h48m 10.104.18.159 4am-node25 <none> <none>
inverted-corn-dn7xt-4-40-2574-milvus-indexnode-6c79487db7-rwr7v 1/1 Running 0 7h48m 10.104.34.30 4am-node37 <none> <none>
inverted-corn-dn7xt-4-40-2574-milvus-proxy-658bfd6f4b-wzvwm 1/1 Running 1 (7h44m ago) 7h48m 10.104.19.71 4am-node28 <none> <none>
inverted-corn-dn7xt-4-40-2574-milvus-querycoord-5dcf95d959m7mkc 1/1 Running 1 (7h44m ago) 7h48m 10.104.18.161 4am-node25 <none> <none>
inverted-corn-dn7xt-4-40-2574-milvus-querynode-64c859bfb8-q645n 1/1 Running 0 7h48m 10.104.27.152 4am-node31 <none> <none>
inverted-corn-dn7xt-4-40-2574-milvus-rootcoord-7bdb59f749-ldj7h 1/1 Running 1 (7h44m ago) 7h48m 10.104.18.160 4am-node25 <none> <none>
inverted-corn-dn7xt-4-40-2574-minio-0 1/1 Running 0 7h48m 10.104.34.51 4am-node37 <none> <none>
inverted-corn-dn7xt-4-40-2574-minio-1 1/1 Running 0 7h48m 10.104.33.248 4am-node36 <none> <none>
inverted-corn-dn7xt-4-40-2574-minio-2 1/1 Running 0 7h48m 10.104.32.73 4am-node39 <none> <none>
inverted-corn-dn7xt-4-40-2574-minio-3 1/1 Running 0 7h48m 10.104.21.28 4am-node24 <none> <none>
inverted-corn-dn7xt-4-40-2574-pulsar-bookie-0 1/1 Running 0 7h48m 10.104.21.27 4am-node24 <none> <none>
inverted-corn-dn7xt-4-40-2574-pulsar-bookie-1 1/1 Running 0 7h48m 10.104.33.254 4am-node36 <none> <none>
inverted-corn-dn7xt-4-40-2574-pulsar-bookie-2 1/1 Running 0 7h48m 10.104.32.74 4am-node39 <none> <none>
inverted-corn-dn7xt-4-40-2574-pulsar-bookie-init-24fz6 0/1 Completed 0 7h48m 10.104.30.115 4am-node38 <none> <none>
inverted-corn-dn7xt-4-40-2574-pulsar-broker-0 1/1 Running 0 7h48m 10.104.30.117 4am-node38 <none> <none>
inverted-corn-dn7xt-4-40-2574-pulsar-proxy-0 1/1 Running 0 7h48m 10.104.1.243 4am-node10 <none> <none>
inverted-corn-dn7xt-4-40-2574-pulsar-pulsar-init-856ld 0/1 Completed 0 7h48m 10.104.30.114 4am-node38 <none> <none>
inverted-corn-dn7xt-4-40-2574-pulsar-recovery-0 1/1 Running 0 7h48m 10.104.32.59 4am-node39 <none> <none>
inverted-corn-dn7xt-4-40-2574-pulsar-zookeeper-0 1/1 Running 0 7h48m 10.104.33.252 4am-node36 <none> <none>
inverted-corn-dn7xt-4-40-2574-pulsar-zookeeper-1 1/1 Running 0 7h47m 10.104.30.134 4am-node38 <none> <none>
inverted-corn-dn7xt-4-40-2574-pulsar-zookeeper-2 1/1 Running 0 7h46m 10.104.32.85 4am-node39 <none> <none>
After creating the index, the new segment cannot be loaded on the queryNode; the queryNode has 64Gi of memory.
client pod name: inverted-corn-dn7xt-291792770 client logs:
Expected Behavior
No response
Steps To Reproduce
concurrent test and calculation of RT and QPS
:purpose: `varchar: different max_length`
verify a concurrent DQL scenario which has 3 VARCHAR scalar fields with INVERTED indexes (see the pymilvus sketch after the steps below)
:test steps:
1. create collection with fields:
'float_vector': 3dim,
'varchar_1': max_length=256, varchar_filled=True
'varchar_2': max_length=32768, varchar_filled=True
'varchar_3': max_length=65535, varchar_filled=True
2. build indexes:
IVF_FLAT: 'float_vector'
INVERTED: 'varchar_1', 'varchar_2', 'varchar_3'
3. insert 300k data
4. flush collection
5. build indexes again using the same params
6. load collection
7. concurrent request:
- search
- query
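A minimal pymilvus sketch of the steps above, assuming a local endpoint and an illustrative collection name; the actual benchmark drives these steps through the fouram/locust harness and fills the varchar fields up to their max_length:

```python
import random
import string

from pymilvus import connections, Collection, CollectionSchema, FieldSchema, DataType

connections.connect(host="127.0.0.1", port="19530")  # assumed endpoint

# 1. collection with a 3-dim vector and three VARCHAR scalar fields
fields = [
    FieldSchema("id", DataType.INT64, is_primary=True, auto_id=False),
    FieldSchema("float_vector", DataType.FLOAT_VECTOR, dim=3),
    FieldSchema("varchar_1", DataType.VARCHAR, max_length=256),
    FieldSchema("varchar_2", DataType.VARCHAR, max_length=32768),
    FieldSchema("varchar_3", DataType.VARCHAR, max_length=65535),
]
coll = Collection("inverted_varchar_dql", CollectionSchema(fields), shards_num=2)

# 2. build indexes (called again after flush, step 5)
def build_indexes(c):
    c.create_index("float_vector",
                   {"index_type": "IVF_FLAT", "metric_type": "L2",
                    "params": {"nlist": 1024}})
    for f in ("varchar_1", "varchar_2", "varchar_3"):
        c.create_index(f, {"index_type": "INVERTED"})

build_indexes(coll)

# 3. insert 300k rows in batches of 50 (ni_per=50)
def rand_str(n):
    return "".join(random.choices(string.ascii_letters + string.digits, k=n))

for start in range(0, 300_000, 50):
    ids = list(range(start, start + 50))
    coll.insert([
        ids,
        [[random.random() for _ in range(3)] for _ in ids],
        [rand_str(256) for _ in ids],
        [rand_str(1024) for _ in ids],  # shortened filler; benchmark fills to max_length
        [rand_str(1024) for _ in ids],
    ])

coll.flush()          # 4. flush collection
build_indexes(coll)   # 5. build indexes again with the same params
coll.load()           # 6. load collection

# 7. the concurrent DQL requests issued by the locust tasks
coll.search(
    data=[[random.random() for _ in range(3)]],
    anns_field="float_vector",
    param={"metric_type": "L2", "params": {"nprobe": 32}},
    limit=10,
    expr='varchar_1 like "a%" && varchar_2 like "A%" && varchar_3 like "0%" && id > 0',
    timeout=60,
)
coll.query(expr="id > -1", output_fields=["float_vector"], timeout=60)
```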
Milvus Log
No response
Anything else?
test result:
[2024-03-20 17:44:58,699 - INFO - fouram]: Print locust final stats. (locust_runner.py:56)
[2024-03-20 17:44:58,700 - INFO - fouram]: Type Name # reqs # fails | Avg Min Max Med | req/s failures/s (stats.py:789)
[2024-03-20 17:44:58,700 - INFO - fouram]: --------|----------------------------------------------------------------------------|-------|-------------|-------|-------|-------|-------|--------|----------- (stats.py:789)
[2024-03-20 17:44:58,700 - INFO - fouram]: grpc query 8416 0(0.00%) | 16985 2463 38727 15000 | 2.34 0.00 (stats.py:789)
[2024-03-20 17:44:58,700 - INFO - fouram]: grpc search 8524 0(0.00%) | 4259 2157 12464 3700 | 2.37 0.00 (stats.py:789)
[2024-03-20 17:44:58,700 - INFO - fouram]: --------|----------------------------------------------------------------------------|-------|-------------|-------|-------|-------|-------|--------|----------- (stats.py:789)
[2024-03-20 17:44:58,700 - INFO - fouram]: Aggregated 16940 0(0.00%) | 10581 2157 38727 6900 | 4.71 0.00 (stats.py:789)
[2024-03-20 17:44:58,700 - INFO - fouram]: (stats.py:790)
[2024-03-20 17:44:58,702 - INFO - fouram]: [PerfTemplate] Report data:
{'server': {'deploy_tool': 'helm',
'deploy_mode': 'cluster',
'config_name': 'cluster_2c4m',
'config': {'queryNode': {'resources': {'limits': {'cpu': '8',
'memory': '64Gi'},
'requests': {'cpu': '8',
'memory': '32Gi'}},
'replicas': 1},
'indexNode': {'resources': {'limits': {'cpu': '4.0',
'memory': '16Gi'},
'requests': {'cpu': '3.0',
'memory': '9Gi'}},
'replicas': 1},
'dataNode': {'resources': {'limits': {'cpu': '2.0',
'memory': '4Gi'},
'requests': {'cpu': '2.0',
'memory': '3Gi'}}},
'cluster': {'enabled': True},
'pulsar': {},
'kafka': {},
'minio': {'metrics': {'podMonitor': {'enabled': True}}},
'etcd': {'metrics': {'enabled': True,
'podMonitor': {'enabled': True}}},
'metrics': {'serviceMonitor': {'enabled': True}},
'log': {'level': 'debug'},
'image': {'all': {'repository': 'harbor.milvus.io/milvus/milvus',
'tag': '2.4-20240320-7abebf81-amd64'}}},
'host': 'inverted-corn-dn7xt-4-40-2574-milvus.qa-milvus.svc.cluster.local',
'port': '19530',
'uri': ''},
'client': {'test_case_type': 'ConcurrentClientBase',
'test_case_name': 'test_inverted_locust_varchar_dql_cluster',
'test_case_params': {'dataset_params': {'metric_type': 'L2',
'dim': 3,
'scalars_index': {'varchar_1': {'index_type': 'INVERTED'},
'varchar_2': {'index_type': 'INVERTED'},
'varchar_3': {'index_type': 'INVERTED'}},
'scalars_params': {'varchar_1': {'params': {'max_length': 256},
'other_params': {'varchar_filled': True}},
'varchar_2': {'params': {'max_length': 32768},
'other_params': {'varchar_filled': True}},
'varchar_3': {'params': {'max_length': 65535},
'other_params': {'varchar_filled': True}}},
'dataset_name': 'local',
'dataset_size': 300000,
'ni_per': 50},
'collection_params': {'other_fields': ['varchar_1',
'varchar_2',
'varchar_3'],
'shards_num': 2},
'resource_groups_params': {'reset': False},
'database_user_params': {'reset_rbac': False,
'reset_db': False},
'index_params': {'index_type': 'IVF_FLAT',
'index_param': {'nlist': 1024}},
'concurrent_params': {'concurrent_number': 50,
'during_time': '1h',
'interval': 20,
'spawn_rate': None},
'concurrent_tasks': [{'type': 'search',
'weight': 1,
'params': {'nq': 1000,
'top_k': 10,
'search_param': {'nprobe': 32},
'expr': 'varchar_1 '
'like '
'"a%" '
'&& '
'varchar_2 '
'like '
'"A%" '
'&& '
'varchar_3 '
'like '
'"0%" '
'&& '
'id '
'> 0',
'guarantee_timestamp': None,
'partition_names': None,
'output_fields': None,
'ignore_growing': False,
'group_by_field': None,
'timeout': 60,
'random_data': True}},
{'type': 'query',
'weight': 1,
'params': {'ids': None,
'expr': 'id '
'> '
'-1 '
'&&',
'output_fields': ['float_vector'],
'offset': None,
'limit': None,
'ignore_growing': False,
'partition_names': None,
'timeout': 60,
'random_data': True,
'random_count': 10,
'random_range': [0,
150000.0],
'field_name': 'id',
'field_type': 'int64'}}]},
'run_id': 2024032085773619,
'datetime': '2024-03-20 09:56:17.332135',
'client_version': '2.4.0'},
'result': {'test_result': {'index': {'RT': 3893.4345,
'varchar_1': {'RT': 3843.9328},
'varchar_2': {'RT': 2913.0892},
'varchar_3': {'RT': 1893.9922}},
'insert': {'total_time': 740.3529,
'VPS': 405.2122,
'batch_time': 0.1234,
'batch': 50},
'flush': {'RT': 3.525},
'load': {'RT': 78.4088},
'Locust': {'Aggregated': {'Requests': 16940,
'Fails': 0,
'RPS': 4.71,
'fail_s': 0.0,
'RT_max': 38727.92,
'RT_avg': 10581.97,
'TP50': 6900.0,
'TP99': 33000.0},
'query': {'Requests': 8416,
'Fails': 0,
'RPS': 2.34,
'fail_s': 0.0,
'RT_max': 38727.92,
'RT_avg': 16985.16,
'TP50': 15000.0,
'TP99': 35000.0},
'search': {'Requests': 8524,
'Fails': 0,
'RPS': 2.37,
'fail_s': 0.0,
'RT_max': 12464.69,
'RT_avg': 4259.9,
'TP50': 3700.0,
'TP99': 7100.0}}}}}
/unassign
No compaction actually happens. Two fields (103, 104) build inverted indexes, and once the builds succeed they are reloaded into the segment to replace the original raw data.
On the first load, segment 448509732647565654 does not load the indexes for fields 103 and 104 because they have not been built successfully yet, as the figure above shows.
Searches filtering on fields 103 and 104 therefore run against the raw data.
Then, when the two inverted index builds succeed, the index load happens.
As the figure above shows, at 17:07:11 the segment reloads the new indexes.
That is why latency becomes large. The inverted index may not be suitable for this situation; this test could be modified to drop the index and compare performance with and without it.
@longjiquan please check whether this situation is suitable for an inverted index.
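A minimal sketch of that comparison, assuming the collection from the sketch above and the standard pymilvus Collection API; names and endpoint are illustrative, not taken from the benchmark config:

```python
import time
from pymilvus import Collection, connections

connections.connect(host="127.0.0.1", port="19530")  # assumed endpoint
coll = Collection("inverted_varchar_dql")             # assumed collection name

EXPR = 'varchar_1 like "a%" && varchar_2 like "A%" && varchar_3 like "0%" && id > 0'

def avg_search_latency(c, runs=10):
    latencies = []
    for _ in range(runs):
        t0 = time.perf_counter()
        c.search(data=[[0.1, 0.2, 0.3]], anns_field="float_vector",
                 param={"metric_type": "L2", "params": {"nprobe": 32}},
                 limit=10, expr=EXPR, timeout=60)
        latencies.append(time.perf_counter() - t0)
    return sum(latencies) / len(latencies)

coll.load()
with_index = avg_search_latency(coll)

# Drop the INVERTED indexes (collection must be released first), then reload
# so the varchar filters fall back to scanning the raw data.
coll.release()
for idx in list(coll.indexes):
    if idx.field_name in ("varchar_1", "varchar_2", "varchar_3"):
        coll.drop_index(index_name=idx.index_name)
coll.load()
without_index = avg_search_latency(coll)

print(f"avg search latency with INVERTED: {with_index:.3f}s, without: {without_index:.3f}s")
```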
This is the segcore latency of a normal test case where no delayed index loading happens:
This is the segcore latency of this issue:
Both are about 1.7s: it shows that searches using the inverted index need 1.7s, which supports the conclusion above.
This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.
Rotten issues close after 30d of inactivity. Reopen the issue with /reopen.