hnswlib
Distributed build of HNSW and merge of HNSW graphs
@yurymalkov: I have multiple indexers producing many sharded HNSW indexes, but the servers can perhaps handle multiple such shards. I read in the paper that a distributed implementation is feasible for HNSW graphs. Can you give me any pointers on that? Does this also imply that it would be easy to merge two HNSW graphs?
A related paper: https://arxiv.org/pdf/1906.10602
@patelprateek I am not sure what the question is here. https://arxiv.org/pdf/1906.10602 seems like a reasonable start. The initial HNSW paper also discusses distributed indices without sharding, but that would be hard to implement.
Sorry for being unclear. My question was about:
- Distributed build of indices: indexing takes quite long for 100M docs, or in cases where streaming elements come in frequently, so I was curious whether there are any open-source implementations where we can build the graph in a distributed way and later combine the pieces into a single HNSW graph, or even keep a distributed graph but route each serving request to the appropriate worker based on the graph structure. A naive sharding approach can cause large fan-outs, since the nearest neighbors can be in any of the shards.
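To make the fan-out concern concrete, here is a minimal sketch of the naive sharding approach, under stated assumptions. It is not hnswlib code: each shard is searched by brute force as a stand-in for a per-shard HNSW index, and the names `build_shards`, `search_shard`, and `fanout_search` are hypothetical. The point is only that the coordinator has to query every shard and merge the partial top-k lists, because nothing tells it which shard holds the true neighbors.

```python
import heapq
import math
import random


def build_shards(vectors, num_shards):
    """Assign vectors to shards round-robin (assumed policy, ignores similarity)."""
    shards = [[] for _ in range(num_shards)]
    for doc_id, vec in enumerate(vectors):
        shards[doc_id % num_shards].append((doc_id, vec))
    return shards


def search_shard(shard, query, k):
    """Local top-k of one shard as (distance, doc_id) pairs.

    Brute force here; in a real deployment this would be the k-NN query
    against that shard's own HNSW index.
    """
    return heapq.nsmallest(
        k, ((math.dist(query, vec), doc_id) for doc_id, vec in shard)
    )


def fanout_search(shards, query, k):
    """Query every shard, then merge the partial results into a global top-k.

    Returns the merged hits and the number of shards contacted, which for
    naive sharding is always all of them.
    """
    partials = [search_shard(shard, query, k) for shard in shards]
    merged = heapq.nsmallest(k, (hit for part in partials for hit in part))
    return merged, len(shards)


if __name__ == "__main__":
    random.seed(0)
    data = [[random.random() for _ in range(8)] for _ in range(1000)]
    shards = build_shards(data, num_shards=4)
    hits, contacted = fanout_search(shards, [0.5] * 8, k=5)
    print("shards contacted:", contacted)
    print("shards that actually contributed a hit:",
          sorted({doc_id % 4 for _, doc_id in hits}))
```

With similarity-agnostic sharding the per-query cost grows with the number of shards, which is why routing by graph structure (or merging shards into one graph) is attractive.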