PolyFuzz
Faiss
Faiss allows you to efficiently search and cluster dense vectors. This could be beneficial when comparing the cosine similarities between vectors in the TF-IDF and Embeddings models.
However, Faiss is currently a conda-only install rather than pip-based, and there is no Windows version (aside from the nightly build), so it is best to postpone this until the Windows version is stable. It would also require users to install Faiss before installing PolyFuzz, which does not help the user experience.
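For reference, here is a rough sketch of what a Faiss-backed cosine similarity lookup could look like if Faiss were treated as an optional dependency. This is not PolyFuzz's API: `top_k_cosine` and `_normalize` are hypothetical helpers made up for illustration. It assumes an exact inner-product index (`faiss.IndexFlatIP`) over L2-normalized vectors, which gives cosine similarity, and it falls back to a plain-Python brute-force search when Faiss is not installed.

```python
import math

try:
    # Optional dependency: only used when it happens to be installed.
    import faiss
    import numpy as np
except ImportError:
    faiss = None


def _normalize(vec):
    """Scale a vector to unit length (hypothetical helper)."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec] if norm else list(vec)


def top_k_cosine(queries, corpus, k=1):
    """For each query, return up to k (corpus_index, cosine_similarity) pairs.

    Hypothetical helper, not part of PolyFuzz.
    """
    k = min(k, len(corpus))
    if faiss is not None:
        # Faiss expects contiguous float32 matrices.
        xb = np.ascontiguousarray(corpus, dtype="float32")
        xq = np.ascontiguousarray(queries, dtype="float32")
        # After in-place L2 normalization, inner product == cosine similarity.
        faiss.normalize_L2(xb)
        faiss.normalize_L2(xq)
        index = faiss.IndexFlatIP(xb.shape[1])
        index.add(xb)
        scores, ids = index.search(xq, k)
        return [list(zip(i.tolist(), s.tolist())) for i, s in zip(ids, scores)]

    # Fallback: brute-force cosine similarity without any dependencies.
    corpus_n = [_normalize(v) for v in corpus]
    results = []
    for q in queries:
        qn = _normalize(q)
        sims = [
            (idx, sum(a * b for a, b in zip(qn, v)))
            for idx, v in enumerate(corpus_n)
        ]
        sims.sort(key=lambda pair: pair[1], reverse=True)
        results.append(sims[:k])
    return results


corpus = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [1.0, 1.0, 0.0]]
queries = [[2.0, 0.1, 0.0], [0.0, 3.0, 0.0]]
matches = top_k_cosine(queries, corpus, k=2)
```

Keeping the import inside a `try`/`except` means Faiss would not have to be installed before PolyFuzz, at the cost of maintaining two code paths.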
Thread bump? I'm currently running into some scaling issues with PolyFuzz TF-IDF, and Faiss is solving the problem for me. When I get some spare cycles, I'd be happy to push up a PR to incorporate it, if that's interesting to you. Faiss seems to install much more easily these days.
If I am not mistaken, it still requires conda to install, right? That would prevent it from being installed through pip, which is the method currently used in this repo.
Having said that, it could be used as an additional method of quickly finding the right vectors. So, I am all for it as long as we can keep it a relatively minimal approach 😄