vector-db-benchmark
Qdrant benchmark reports extremely high latencies for an on-disk index with 140M vectors
Testing Qdrant on an r6i.2xlarge instance (8 cores, 64 GB of memory).
Here is the collection's configuration:
{
  "params": {
    "vectors": {
      "size": 96,
      "distance": "Euclid"
    },
    "shard_number": 1,
    "replication_factor": 1,
    "write_consistency_factor": 1,
    "on_disk_payload": true
  },
  "hnsw_config": {
    "m": 16,
    "ef_construct": 128,
    "full_scan_threshold": 10000,
    "max_indexing_threads": 0,
    "on_disk": true
  },
  "optimizer_config": {
    "deleted_threshold": 0.2,
    "vacuum_min_vector_number": 1000,
    "default_segment_number": 0,
    "max_segment_size": null,
    "memmap_threshold": null,
    "indexing_threshold": 20000,
    "flush_interval_sec": 5,
    "max_optimization_threads": 0
  },
  "wal_config": {
    "wal_capacity_mb": 32,
    "wal_segments_ahead": 0
  },
  "quantization_config": null
}
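For context on the on-disk setting, here is a rough estimate of the raw vector data size against the instance's memory. It is a sketch only: it assumes each component is stored as a 4-byte float32 (not stated in the configuration above) and it ignores the HNSW graph links, payload, and OS overhead, so the real footprint is larger.

```python
# Back-of-the-envelope size of the raw vectors versus available RAM.
# ASSUMPTION: 4-byte float32 components; HNSW links and payload are NOT counted.
NUM_VECTORS = 140_000_000   # approximate count from this report
DIM = 96                    # "size" in the collection config
BYTES_PER_COMPONENT = 4     # assumed float32
RAM_GIB = 64                # r6i.2xlarge memory


def raw_vector_gib(num_vectors, dim, bytes_per_component=BYTES_PER_COMPONENT):
    """Return the size of the raw vector data in GiB."""
    return num_vectors * dim * bytes_per_component / 2**30


vectors_gib = raw_vector_gib(NUM_VECTORS, DIM)
print(f"raw vectors: {vectors_gib:.1f} GiB of {RAM_GIB} GiB RAM")
```

Under these assumptions the raw vectors alone come to roughly 50 GiB, which leaves limited headroom for the graph and the page cache on a 64 GB machine.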
Current Behavior
I downloaded and inserted about 140M vectors from the Yandex billion-scale similarity search benchmarks (https://research.yandex.com/blog/benchmarks-for-billion-scale-similarity-search). In testing, I get about 1.6-1.7 vector lookups per second, i.e. roughly 580-625 ms of search latency per query. We are specifically trying to observe memory-mapped file performance in this case. Can you advise on any configuration changes that would help us optimize index performance and get better results?
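The latency figure above follows from the throughput if queries are issued one at a time. A minimal sketch of how such numbers can be collected is shown below. `search_fn` is a hypothetical placeholder for whatever call sends a single query to the database; it is not part of any real client API.

```python
import statistics
import time


def measure_latencies(search_fn, queries):
    """Run queries sequentially and return per-query latencies in milliseconds.

    search_fn is a hypothetical callable that performs one lookup.
    """
    latencies_ms = []
    for query in queries:
        start = time.perf_counter()
        search_fn(query)
        latencies_ms.append((time.perf_counter() - start) * 1000.0)
    return latencies_ms


def summarize(latencies_ms):
    """Summarize latencies; sequential QPS is the inverse of the mean latency."""
    ordered = sorted(latencies_ms)
    p95_index = min(len(ordered) - 1, int(0.95 * len(ordered)))
    mean_ms = statistics.mean(ordered)
    return {
        "mean_ms": mean_ms,
        "p95_ms": ordered[p95_index],
        "qps_sequential": 1000.0 / mean_ms,
    }


if __name__ == "__main__":
    # Stand-in workload: replace the lambda with a real search call.
    sample = measure_latencies(lambda q: sum(range(1000)), range(100))
    print(summarize(sample))
```

With a single sequential client, a mean latency of 625 ms corresponds to 1.6 lookups per second, which matches the range reported here.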
Steps to Reproduce
See above
Expected Behavior
I expect to see reasonable latencies for vector lookups, well below the roughly 600 ms per query observed here.