
Faiss

Open MaartenGr opened this issue 3 years ago • 2 comments

Faiss allows you to efficiently search and cluster dense vectors. This could be beneficial when computing the cosine similarities between vectors in the TF-IDF and Embeddings models.

However, it is a conda-only install rather than pip-based, and there is currently no Windows version (aside from the nightly version), so it is best to postpone this until the Windows version is stable. It would also require the user to install Faiss before installing PolyFuzz, which does not help the user experience.
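For reference, here is a rough sketch of what the cosine-similarity lookup could look like with Faiss kept as an optional dependency. It is not PolyFuzz code: `top_k_cosine` is a hypothetical helper, and the NumPy fallback is only there so the snippet still runs when Faiss is not installed. It assumes the vectors are dense; a sparse TF-IDF matrix would first have to be densified or reduced in some way.

```python
import numpy as np

try:
    import faiss  # optional dependency, assumed to be installed separately
except ImportError:
    faiss = None


def top_k_cosine(queries, corpus, k=1):
    """Return (similarities, indices) of the k most similar corpus rows per query.

    Hypothetical helper for illustration only.
    """
    # Copy into contiguous float32 arrays, which is what Faiss expects.
    q = np.array(queries, dtype=np.float32, order="C")
    c = np.array(corpus, dtype=np.float32, order="C")

    # L2-normalise the rows so that the inner product equals cosine similarity.
    q /= np.maximum(np.linalg.norm(q, axis=1, keepdims=True), 1e-12)
    c /= np.maximum(np.linalg.norm(c, axis=1, keepdims=True), 1e-12)

    if faiss is not None:
        index = faiss.IndexFlatIP(c.shape[1])  # exact inner-product index
        index.add(c)
        return index.search(q, k)

    # Fallback when Faiss is missing: brute-force search in NumPy.
    sims = q @ c.T
    idx = np.argsort(-sims, axis=1)[:, :k]
    return np.take_along_axis(sims, idx, axis=1), idx
```

Calling `top_k_cosine(query_vectors, corpus_vectors, k=5)` would give, for each query row, the five best cosine scores and the positions of the matching corpus rows. `IndexFlatIP` is an exact search; whether an approximate index would be worth it depends on the data size.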

MaartenGr, Nov 27 '20 14:11

Thread bump? I'm currently running into some scaling issues with PolyFuzz TF-IDF, and Faiss is solving the problem for me. When I get some spare cycles, I'd be happy to push up a PR to incorporate it, if that's interesting to you. Seems Faiss installs way easier these days...

DGaffney, Aug 30 '23 16:08

If I am not mistaken, it still requires conda to install, right? That would mean it cannot be installed through pip, which is the method currently used in this repo.

Having said that, it could be used as an additional, optional method for quickly finding the right vectors. So, I am all for it as long as we can keep it a relatively minimal approach 😄

MaartenGr, Sep 03 '23 07:09