hnswlib

Distributed build of hnsw and merge of hnsw graphs

Open patelprateek opened this issue 3 years ago • 2 comments

@yurymalkov: I have multiple indexers producing many sharded HNSW indexes, though each server can probably handle several such shards. I read in the paper that a distributed implementation is feasible for HNSW graphs. Can you give me any pointers on that? Does this also imply that it would be easy to merge two HNSW graphs?
A related paper: https://arxiv.org/pdf/1906.10602

patelprateek, Jul 22 '21 19:07

@patelprateek I am not sure what the question is here. https://arxiv.org/pdf/1906.10602 seems like a reasonable start. The initial HNSW paper also discusses distributed indices without sharding, but that would be hard to implement.

yurymalkov, Aug 02 '21 03:08

Sorry for being unclear. My question was regarding:

  • Distributed build of indices: indexing takes quite long for 100M docs, and also in cases where streaming elements arrive frequently. I was curious whether there are any open-source implementations where we can build the graph in a distributed way and later combine the pieces into a single HNSW graph, or even keep a distributed graph but route each serving request to the appropriate worker based on the graph structure. A naive sharding approach can cause large fan-outs, since the nearest neighbors can be in any of the shards.

patelprateek, Aug 02 '21 07:08
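
For readers following the thread, the fan-out cost of naive sharding described above can be illustrated with a minimal sketch. This is not hnswlib code and not a method proposed by anyone in this issue: the function names `shard_search` and `fanout_search`, the round-robin shard assignment, and the brute-force per-shard search are all assumptions made for illustration. In a real deployment each shard would presumably answer from its own HNSW index (for example via hnswlib's `knn_query`) rather than by exhaustive scan.

```python
import heapq
import math
import random


def shard_search(shard, query, k):
    """Exact top-k inside one shard (a stand-in for an ANN query on that shard)."""
    scored = ((math.dist(vec, query), label) for label, vec in shard)
    return heapq.nsmallest(k, scored)


def fanout_search(shards, query, k):
    """Query every shard, then merge per-shard candidates into a global top-k."""
    candidates = []
    for shard in shards:  # the fan-out: one request per shard
        candidates.extend(shard_search(shard, query, k))
    return heapq.nsmallest(k, candidates)


random.seed(0)
dim, n, num_shards, k = 8, 1000, 4, 5

# Hypothetical data set: (label, vector) pairs, split round-robin across shards,
# so the point with label i lives in shard i % num_shards.
points = [(i, [random.random() for _ in range(dim)]) for i in range(n)]
shards = [points[s::num_shards] for s in range(num_shards)]
query = [random.random() for _ in range(dim)]

merged = fanout_search(shards, query, k)
exact = shard_search(points, query, k)  # single unsharded search for comparison
owners = {label % num_shards for _, label in merged}

print("merged top-k labels:", [label for _, label in merged])
print("shards holding a true neighbor:", sorted(owners))
```

Because the shard assignment ignores vector geometry, the true neighbors can land in any shard, so every query has to contact all of them. With exact per-shard search the merged result matches an unsharded search; with approximate per-shard indexes the merged recall would depend on the recall of each shard.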