Leonid Boytsov
@OriKatz I actually ran my own experiments (on very different data) with the same reduction, and it didn't work out well for me either. Bugs are always possible, though.
@yurymalkov what if you have a lot of such points:
@OriKatz I would think that graph-based retrieval is a very stable approach in general. It often works very well on weird datasets. However, for the popular-element problem there might be...
@yurymalkov I don't suggest placing popular elements in top layers :-)
@OriKatz I've had some success building a graph using a slightly different metric than the original one. I used it in my thesis and in a follow-up publication. If the indexing...
PS: and of course there's always the option to index popular items separately in a second index.
@yurymalkov ML fairness people would eat you alive 😃
Hi @jianshu93, it's not clear what you mean by distributed computation. In the most common scenario, which is called sharding, the database is split into K chunks that are queried...
@jianshu93 if the search is **perfect**, then getting top-k results from each of the K shards **provably** retrieves top-k of the complete collection. The reason is simple: imagine some number k1
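To illustrate the sharding claim above, here is a minimal sketch. It assumes exact (brute-force) search inside each shard, 1-D points, and absolute difference as the distance. The names `exact_top_k` and `sharded_top_k` are hypothetical helpers for illustration only; they are not part of nmslib or hnswlib.

```python
import heapq
import random


def exact_top_k(points, query, k):
    """Brute-force (exact) search: the k points closest to the query."""
    return heapq.nsmallest(k, points, key=lambda p: abs(p - query))


def sharded_top_k(shards, query, k):
    """Ask every shard for its own top-k, then merge the candidates."""
    candidates = []
    for shard in shards:
        candidates.extend(exact_top_k(shard, query, k))
    # The global top-k is the top-k of the merged per-shard candidates.
    return exact_top_k(candidates, query, k)


random.seed(0)
data = random.sample(range(10_000), 1_000)     # distinct integer points
num_shards = 4
shards = [data[i::num_shards] for i in range(num_shards)]

# A non-integer query guarantees there are no distance ties among integers.
query, k = 5000.25, 10
merged = sharded_top_k(shards, query, k)
assert merged == exact_top_k(data, query, k)
```

The merge works because any point in the global top-k must also be in the top-k of its own shard. With approximate per-shard search (which is what graph-based indices provide), this guarantee no longer holds exactly, and the merged result is only as good as the per-shard recall.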
@h-shahidi you are asking about nmslib, not hnswlib. First of all, using random embeddings of dim 100 for benchmarking is a very bad idea, because they won't be searched very...