qdrant's benchmark is reporting extremely high latencies for an on-disk index with 140M vectors

Open igmor opened this issue 8 months ago • 0 comments

Testing qdrant on an r6i.2xlarge instance with 8 cores and 64GB of memory.

Here is the collection's configuration:

{
  "params": {
    "vectors": {
      "size": 96,
      "distance": "Euclid"
    },
    "shard_number": 1,
    "replication_factor": 1,
    "write_consistency_factor": 1,
    "on_disk_payload": true
  },
  "hnsw_config": {
    "m": 16,
    "ef_construct": 128,
    "full_scan_threshold": 10000,
    "max_indexing_threads": 0,
    "on_disk": true
  },
  "optimizer_config": {
    "deleted_threshold": 0.2,
    "vacuum_min_vector_number": 1000,
    "default_segment_number": 0,
    "max_segment_size": null,
    "memmap_threshold": null,
    "indexing_threshold": 20000,
    "flush_interval_sec": 5,
    "max_optimization_threads": 0
  },
  "wal_config": {
    "wal_capacity_mb": 32,
    "wal_segments_ahead": 0
  },
  "quantization_config": null
}
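For context, here is a rough back-of-the-envelope footprint estimate for this collection. It is only a sketch: it assumes 4-byte float32 vector components (the config above does not state the datatype) and approximates the HNSW graph as `2 * m` 4-byte neighbour ids per point on the bottom layer. Both are assumptions for illustration, not Qdrant's exact storage layout, and the function name is hypothetical.

```python
GIB = 1024 ** 3


def estimate_footprint_gib(num_vectors, dim, m,
                           bytes_per_component=4, bytes_per_link=4):
    """Return (raw_vectors_gib, graph_links_gib) under the stated assumptions.

    bytes_per_component=4 assumes float32 vectors.
    The link estimate assumes 2 * m neighbour ids per point (bottom layer only).
    """
    vectors_bytes = num_vectors * dim * bytes_per_component
    links_bytes = num_vectors * 2 * m * bytes_per_link
    return vectors_bytes / GIB, links_bytes / GIB


# Values taken from the issue: ~140M vectors, size 96, m = 16.
vectors_gib, links_gib = estimate_footprint_gib(140_000_000, 96, 16)
print(f"raw vectors: ~{vectors_gib:.1f} GiB, graph links: ~{links_gib:.1f} GiB")
```

If this estimate is in the right ballpark, vectors plus graph links would come out slightly above the 64GB of RAM on the instance, so a sizeable share of lookups may have to be served from disk rather than from the page cache.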

Current Behavior

I downloaded and inserted about 140M vectors from the Yandex billion-scale benchmark datasets: https://research.yandex.com/blog/benchmarks-for-billion-scale-similarity-search. In testing I get about 1.6-1.7 vector lookups per second, which works out to roughly 580-625ms of search latency per query. We are specifically trying to observe memory-mapped file performance in this case. Could you advise on any configuration changes that would help us optimize index performance and get better results?
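The relationship between the throughput and latency figures can be sketched as follows, assuming a single client issuing queries one at a time (the issue does not say how many parallel clients were used). The `search_fn` callable is a hypothetical stand-in for whatever performs one search request; it is not part of any Qdrant API.

```python
import statistics
import time


def latency_ms_from_qps(qps):
    """Mean per-query latency in ms, assuming one sequential client."""
    return 1000.0 / qps


def measure_latencies_ms(search_fn, queries):
    """Time each call to search_fn (hypothetical) and return latencies in ms."""
    latencies = []
    for query in queries:
        start = time.perf_counter()
        search_fn(query)
        latencies.append((time.perf_counter() - start) * 1000.0)
    return latencies


def summarize(latencies):
    """Return mean, median and an approximate 95th percentile in ms."""
    ordered = sorted(latencies)
    p95_index = max(0, int(round(0.95 * len(ordered))) - 1)
    return {
        "mean": statistics.mean(ordered),
        "p50": statistics.median(ordered),
        "p95": ordered[p95_index],
    }


# Throughput figures from the issue, converted to per-query latency.
for qps in (1.6, 1.7):
    print(f"{qps} lookups/s -> ~{latency_ms_from_qps(qps):.0f} ms per query")
```

Reporting percentiles alongside the mean can help separate cold-cache queries that hit disk from warm ones, which matters when the goal is to evaluate memory-mapped performance.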

Steps to Reproduce

See above

Expected Behavior

Expecting to see decent latencies for vector lookups.

igmor • Jun 10 '24 20:06