faiss
Add Search Centroids Utility Method
This addresses a small gap: it is difficult to extract the k nearest centroid labels and their distances from an IVF index that has preceding transforms, such as PCA or normalization.
Currently, you can use search_centroid, but it returns only a single centroid per embedding and does not return distances. You can also search directly against ivf->quantizer, but this skips the transforms, so you need to apply them manually. The apply_chain method of IndexPreTransform is available for that, but it returns a raw pointer to a float array, so the transformed buffer leaks unless the caller frees it.
The new search_centroids method is an expanded version of the ivflib search_centroid method: it accepts a k value and returns distances in addition to centroid labels. The existing C++ search_centroid method now wraps the expanded version.
The search_centroids method is wrapped in the Python layer to make it easier to call. I chose not to wrap search_centroid as well, for backward compatibility, because users may already be calling the SWIG interface directly.
Syntax is: def search_centroids(index, x, k=1, distances=None, labels=None)
If labels and/or distances are omitted, they are allocated with one row per embedding and k columns.
E.g., D, I = faiss.search_centroids(index, x, 15)
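To illustrate the intended semantics (apply the preceding transform first, then return the k nearest centroids with their distances), here is a minimal pure-Python sketch. It is not the code in this PR and does not call faiss: the names search_centroids_sketch and apply_transform are hypothetical, the transform is assumed to be a plain linear map, and squared L2 distance is assumed.

```python
import heapq


def apply_transform(x, matrix):
    """Apply a linear transform; each row of `matrix` yields one output dimension."""
    return [[sum(m * v for m, v in zip(row, vec)) for row in matrix] for vec in x]


def search_centroids_sketch(centroids, x, k=1, transform=None):
    """Hypothetical stand-in: return (D, I), each with one row per embedding.

    D holds squared L2 distances and I the matching centroid labels,
    sorted from nearest to farthest.
    """
    if transform is not None:
        # Searching the quantizer directly would skip this step.
        x = apply_transform(x, transform)
    D, I = [], []
    for vec in x:
        scored = [
            (sum((a - b) ** 2 for a, b in zip(vec, c)), label)
            for label, c in enumerate(centroids)
        ]
        nearest = heapq.nsmallest(k, scored)
        D.append([dist for dist, _ in nearest])
        I.append([label for _, label in nearest])
    return D, I


# Three 2-d centroids; the transform keeps the first two of three input dimensions.
centroids = [[0.0, 0.0], [1.0, 0.0], [0.0, 3.0]]
transform = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]
x = [[0.9, 0.1, 5.0]]

D, I = search_centroids_sketch(centroids, x, k=2, transform=transform)
print(I)  # centroid 1 is nearest, then centroid 0
```

Without the transform step the query would not even have the same dimensionality as the centroids, which is the gap described above.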
I believe these are the steps for running the clang-format lint locally (macOS):
brew install clang-format@11
git ls-files | grep -E '\.(cpp|h|cu|cuh)$' | xargs clang-format-11 -i
@makosten would you mind rebasing this PR if it is possible? Thanks!
Thanks for the contribution but I think it is a trivial addition, so I don't think it brings sufficient value to the library given the number of LOCs of the PR.