vearch: Question about the performance impact of partitions and replicas


Open Alsace-lee opened this issue 3 years ago • 1 comments

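I have been testing partition and multi-replica performance recently and keep running into questions, so I would appreciate some help.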

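Background: on two machines I created a table with 2 partitions and 2 replicas per partition, and loaded a base dataset of roughly 30 million vectors.

1. With all four PS instances running, single-threaded top-500 search latency (each query uses a random vector) starts at around 500 ms and then gradually drops to the 200 ms range. The drop is probably related to the operating system cache and the query cache, but the result still falls short of the performance I expected.
2. After shutting down one node in each partition, performance improved: top-500 latency is now around 100 ms.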

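Questions:

1. Is there such a thing as "warming up the index" in search? How should I interpret search performance gradually improving over continued calls?
2. In theory, the number of nodes per partition should not affect overall search performance, yet my measurements show that it does.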
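The warm-up effect described above can be made visible by timing every call and averaging the latencies per window of consecutive queries. The following is only a minimal sketch: `search_fn` is a hypothetical placeholder for whatever function sends one query to the cluster, and `window` is an arbitrary choice, neither comes from the original post.

```python
import time
from statistics import mean


def time_queries(search_fn, queries, window=100):
    """Run search_fn once per query and return the mean latency (ms) of each window.

    search_fn is a hypothetical callable that performs one search; it is not
    part of any Vearch client API.
    """
    latencies_ms = []
    for query in queries:
        start = time.perf_counter()
        search_fn(query)
        latencies_ms.append((time.perf_counter() - start) * 1000.0)
    # One mean per consecutive window; a falling sequence suggests warm-up.
    return [
        mean(latencies_ms[i:i + window])
        for i in range(0, len(latencies_ms), window)
    ]
```

If the per-window means fall and then level off, reporting only the plateau values separates steady-state latency from the cold-start phase.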

Finally, the table creation statement and the search statement are attached:

{
  "name": "coll1",
  "partition_num": 2,
  "replica_num": 2,
  "engine": {
    "name": "gamma",
    "index_size": 100000,
    "id_type": "Long",
    "retrieval_type": "IVFPQ",
    "retrieval_param": {
      "metric_type": "InnerProduct",
      "ncentroids": 16384,
      "nsubvector": 32
    }
  },
  "properties": {
    "com_id": { "type": "string", "index": true },
    "main_prod_value": { "type": "integer", "index": true },
    "prod_id": { "type": "string", "index": true },
    "membership": { "type": "string", "index": true },
    "display_order": { "type": "integer", "index": true },
    "cs_level": { "type": "integer", "index": true },
    "cat_code": { "type": "string", "index": true },
    "app_sensitive": { "type": "integer", "index": true },
    "is_ecigar": { "type": "integer", "index": true },
    "main_color_code": { "type": "integer", "index": true },
    "is_trade_prod": { "type": "integer", "index": true },
    "is_video_prod": { "type": "integer", "index": true },
    "star_level": { "type": "integer", "index": true },
    "vector": {
      "type": "vector",
      "dimension": 256,
      "format": "normalization",
      "store_type": "MemoryOnly",
      "store_param": {
        "cache_size": 4096,
        "compress": { "rate": 16 }
      }
    }
  }
}

Search statement:

{
  "query": {
    "sum": [
      { "field": "vector", "feature": [0.038894109427928925.....] }
    ]
  },
  "retrieval_param": { "nprobe": 32 }
}
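For reproducing the random-vector test, a request body with the same shape as the search statement can be generated in code. This is a minimal sketch under stated assumptions: the function name `build_query` and the `seed` parameter are made up for illustration, the vector is L2-normalized because the table declares `"format": "normalization"`, and how the body is sent to the cluster is deliberately left out.

```python
import math
import random


def build_query(dim=256, nprobe=32, seed=None):
    """Build a search body shaped like the one in the post, with a random unit vector.

    build_query and seed are illustrative names, not part of Vearch.
    """
    rng = random.Random(seed)
    raw = [rng.gauss(0.0, 1.0) for _ in range(dim)]
    norm = math.sqrt(sum(x * x for x in raw))
    feature = [x / norm for x in raw]  # scale to unit length
    return {
        "query": {"sum": [{"field": "vector", "feature": feature}]},
        "retrieval_param": {"nprobe": nprobe},
    }
```

Passing a fixed `seed` makes a benchmark run repeatable, which helps when comparing the four-instance and two-instance setups on identical queries.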

Jun 19 '21 07:06 Alsace-lee

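1. The gradual improvement over continued calls is most likely cache-related: data in memory is not reclaimed right away, and having it cached makes searches faster.
2. This is probably because each machine hosts two partitions, so both partitions on the same machine are searched at the same time. It is best to place one partition per machine node. It is also best to use three replicas, since only an odd number of nodes ensures that leader election proceeds normally.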
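The arithmetic behind the second point can be sketched as follows. This is only an illustration: it assumes PS instances are spread evenly across machines and that leader election is majority-based (as in Raft-style protocols); the function name `placement_summary` and the returned keys are hypothetical.

```python
def placement_summary(machines, partition_num, replica_num):
    """Summarize instance placement and election tolerance.

    Assumes an even spread of instances and majority-based leader election;
    both are assumptions, not confirmed Vearch behavior.
    """
    instances = partition_num * replica_num
    per_machine = -(-instances // machines)  # ceiling division
    majority = replica_num // 2 + 1          # votes needed to elect a leader
    return {
        "instances": instances,
        "instances_per_machine": per_machine,
        "majority": majority,
        "tolerated_failures": replica_num - majority,
    }
```

Under these assumptions, the setup in the question (2 machines, 2 partitions, 2 replicas) gives 4 instances, 2 per machine, and tolerates no replica failure, while 3 replicas per partition would tolerate one.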

Jun 22 '21 08:06 zcdb
