expose nmslib HNSW method parameters for recall fine-tuning

Open labroskokkalas opened this issue 3 years ago • 1 comments

I have built an HNSW index (cosine distance) with about 6M points. Using the same 6M points as the test dataset, the 1st-neighbor recall is about 0.979, which is high but not sufficient for my application. Setting the M, post, efConstruction and efSearch parameters (https://github.com/nmslib/nmslib/blob/master/manual/methods.md) in tensorflow_similarity/search/nmslib_search.py:

self._search_index.setQueryTimeParams({'efSearch': 2000})
self._search_index.createIndex({'post': 2, 'efConstruction': '2000', 'M': 64}, print_progress=show)

results in a 1st-neighbor recall of 0.999. These parameters provide a trade-off between recall and indexing time/memory usage, and exposing them would be very useful for recall fine-tuning in production systems.
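As a rough illustration of what exposing these parameters could look like, here is a minimal sketch. `HNSWParams` and `recall_at_1` are hypothetical names, not part of tensorflow_similarity or nmslib. The default values are the ones quoted in this issue, not library defaults. The recall helper assumes that, when the indexed points are reused as queries, the expected first neighbor of point i is i itself (no duplicate points).

```python
from dataclasses import dataclass
from typing import Any, Dict, Sequence


@dataclass(frozen=True)
class HNSWParams:
    """Hypothetical container for the HNSW settings discussed in this issue."""

    M: int = 64                  # index time: graph connectivity
    efConstruction: int = 2000   # index time: candidate list size while building
    post: int = 2                # index time: post-processing level
    efSearch: int = 2000         # query time: candidate list size while searching

    def index_time_params(self) -> Dict[str, Any]:
        # Dict in the shape passed to createIndex(...) in the snippet above.
        return {"M": self.M, "efConstruction": self.efConstruction, "post": self.post}

    def query_time_params(self) -> Dict[str, Any]:
        # Dict in the shape passed to setQueryTimeParams(...) in the snippet above.
        return {"efSearch": self.efSearch}


def recall_at_1(nearest_ids: Sequence[int], expected_ids: Sequence[int]) -> float:
    """Fraction of queries whose first returned neighbor matches the expected id."""
    if len(nearest_ids) != len(expected_ids):
        raise ValueError("nearest_ids and expected_ids must have the same length")
    if not expected_ids:
        return 0.0
    hits = sum(1 for got, want in zip(nearest_ids, expected_ids) if got == want)
    return hits / len(expected_ids)
```

With something like this, the search wrapper could accept an `HNSWParams` instance and forward `params.index_time_params()` to `createIndex` and `params.query_time_params()` to `setQueryTimeParams`, so callers can trade indexing time and memory for recall without editing nmslib_search.py.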

labroskokkalas, Apr 17 '22 18:04

Thanks for sharing the results. We're looking at making the storage and indexing layer more flexible and I'll be sure to include this in the changes.

owenvallis, May 14 '22 00:05