
Applicability of HNSW for maximum inner product search

Open OriKatz opened this issue 4 years ago • 15 comments

When specifying the option space='ip', is the same algorithm used as for the 'l2' distance? Or is there a reduction to a metric space first, after which HNSW is applied?

Thanks in advance, Ori

OriKatz avatar May 18 '20 06:05 OriKatz

No reduction is necessary. In fact, it can be detrimental: Morozov, Stanislav, and Artem Babenko. "Non-metric similarity graphs for maximum inner product search." Advances in Neural Information Processing Systems. 2018.

searchivarius avatar May 18 '20 06:05 searchivarius
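For context, the "reduction" being discussed is typically the classic MIPS-to-L2 transformation: append one coordinate so inner-product search becomes Euclidean nearest-neighbor search. A minimal numpy sketch (the function name and shapes are illustrative, not part of hnswlib):

```python
import numpy as np

def mips_to_l2(data, queries):
    """Classic MIPS -> L2 reduction: augment each data point with one
    extra coordinate so that L2-nearest neighbors of the augmented
    query coincide with maximum-inner-product neighbors of the data."""
    norms = np.linalg.norm(data, axis=1)
    m = norms.max()
    # x -> (x, sqrt(m^2 - ||x||^2)),  q -> (q, 0)
    aug_data = np.hstack([data, np.sqrt(m**2 - norms**2)[:, None]])
    aug_queries = np.hstack([queries, np.zeros((len(queries), 1))])
    return aug_data, aug_queries

# With q -> (q, 0):  ||q' - x'||^2 = ||q||^2 + m^2 - 2<q, x>,
# so minimizing the L2 distance maximizes the inner product.
```

Morozov and Babenko's point is that graph search over the raw inner product typically beats building a metric index over this augmented space.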

Thank you for the quick response! I am aware of this paper; in fact, I asked because I saw that Morozov et al. compared their algorithm against a baseline called "NSW+reduction". In your experience, is this reduction redundant or inadequate, or might it yield better performance? Thanks again.

OriKatz avatar May 18 '20 10:05 OriKatz

@OriKatz I have actually had my own experiments (on very different data) with the same reduction and it didn't work out well for me either. Bugs are always possible though.

searchivarius avatar May 18 '20 13:05 searchivarius

@OriKatz Note that it is possible to improve the speed of inner product search by removing from the dataset the elements which are not the closest to themselves (i.e., elements whose largest inner product with the dataset is attained by a different element).

yurymalkov avatar May 18 '20 17:05 yurymalkov
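The pruning heuristic above can be sketched by brute force with numpy (a hedged illustration of the idea, not code from hnswlib; whether pruned elements can ever be exact answers for some query is a separate question):

```python
import numpy as np

rng = np.random.default_rng(0)
data = rng.standard_normal((500, 16)).astype(np.float32)

# Pairwise inner products between all elements.
gram = data @ data.T

# Keep only elements that are "closest to themselves": no other
# element has a larger inner product with them than their own
# self-inner-product. The rest are dropped per the heuristic above.
self_ip = np.diag(gram)
keep = self_ip >= gram.max(axis=1)
pruned = data[keep]
```

For large datasets the full Gram matrix is infeasible; an approximate index over the data itself could stand in for the exact maximum here.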

@yurymalkov what if you have a lot of such points?

searchivarius avatar May 18 '20 18:05 searchivarius

@searchivarius I think that means a big speedup. Those elements are useless.

yurymalkov avatar May 18 '20 18:05 yurymalkov

Thanks a lot for the comments(:

@yurymalkov Regarding your suggestion - the dataset I'm considering originates from a collaborative filtering model (user and item embeddings), and some of the elements may be highly popular items. Therefore I'm concerned that most of the elements are not most similar to themselves. Have you ever tried applying HNSW with inner-product similarity to this kind of data?

OriKatz avatar May 18 '20 19:05 OriKatz

@OriKatz I would think that graph-based retrieval is a very stable approach in general. It often works very well on weird datasets. However, for the popular-element problem there might be another speed-up solution, though it requires some changes to the code.

searchivarius avatar May 18 '20 19:05 searchivarius

@OriKatz I've applied it to simulated data. It worked reasonably well, although the result depended on the order of the data. @searchivarius I think the best solution is just to remove the useless elements. If there is a good query source, it can be used to additionally remove some elements. I think placing popular elements at the top layers was discussed during the work on the NeurIPS paper, but it probably didn't help that much.

yurymalkov avatar May 18 '20 19:05 yurymalkov

@yurymalkov I don't suggest placing popular elements in top layers :-)

searchivarius avatar May 18 '20 19:05 searchivarius

@yurymalkov What do you mean by "depended on the order"? As far as I understand, popular items may become hubs, which can cause problems when navigating the graph, so I assume that inserting elements in order of inverse popularity is better. On the other hand, non-popular items might exhibit very noisy and unstructured embeddings, so I'm not sure this is a good methodology for handling this kind of problem. Do you agree with these conjectures, or is my intuition/understanding wrong?

@searchivarius What kind of code modification do you think could help when the popularity of elements varies widely?

OriKatz avatar May 18 '20 20:05 OriKatz

@OriKatz I've had some success building a graph using a slightly different metric from the original one. I used it in my thesis and in a follow-up publication. If the indexing metric is cosine similarity, or some linear combination of cosine similarity and inner product, this may mitigate the problem. Whether it helps or not is an empirical question, though. Note that you need to use this combination only at index time; retrieval likely needs to be guided by the regular inner product.

searchivarius avatar May 18 '20 20:05 searchivarius
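The index-time combination can be read as a blended similarity along these lines (the function name and the value of alpha are illustrative assumptions, not from the thread or the cited work):

```python
import numpy as np

def blended_sim(x, y, alpha=0.5):
    # Index-time similarity: a linear combination of cosine
    # similarity and the raw inner product; alpha is a tuning knob.
    # At retrieval time the plain inner product would still be used.
    ip = float(x @ y)
    cos = ip / (np.linalg.norm(x) * np.linalg.norm(y))
    return alpha * cos + (1 - alpha) * ip
```

The cosine term damps the influence of large-norm (popular) elements during graph construction, which is the mitigation being suggested.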

PS: and of course there's always the option to index popular items separately in a second index.

searchivarius avatar May 18 '20 20:05 searchivarius
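The two-index option amounts to searching each index and merging the candidate lists by score. A sketch with brute-force search standing in for the two ANN indexes (names and the popular/rest split are illustrative):

```python
import numpy as np

def topk_ip(part, q, k):
    # Brute-force inner-product top-k within one partition
    # (standing in for a search over one HNSW index).
    scores = part @ q
    order = np.argsort(-scores)[:k]
    return scores[order]

def two_index_search(popular, rest, q, k=5):
    # Search popular and non-popular items separately, then merge
    # the two candidate lists and keep the overall top-k scores.
    merged = np.concatenate([topk_ip(popular, q, k),
                             topk_ip(rest, q, k)])
    return np.sort(merged)[::-1][:k]
```

Since every global top-k element lies in one partition's local top-k, merging the two lists recovers the exact top-k of the combined dataset (for the brute-force stand-ins).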

@OriKatz In HNSW the out-degree on each level is fixed, so the search does not see any hubs. There are inbound hubs, which make non-popular items harder to reach during exploration, but I am not sure this is a problem, as non-popular items might never end up as an answer even in exact search. The worst thing that can happen in general is clusters, which require special parameters in HNSW, but this is orthogonal to the inner product. @searchivarius Ok. Yes, I remember that :)

yurymalkov avatar May 18 '20 23:05 yurymalkov

@yurymalkov ML fairness people would eat you alive 😃

searchivarius avatar May 19 '20 02:05 searchivarius