langchain
Add more index methods to faiss.
Feature request
At the moment faiss is hard-wired to IndexFlatL2.
See here:
https://github.com/hwchase17/langchain/blob/423f497168e3a8982a4cdc4155b15fbfaa089b38/langchain/vectorstores/faiss.py#L347
I would like to set other index methods, for example IndexFlatIP. This should be configurable.
Also see more index methods here: https://github.com/facebookresearch/faiss/wiki/Faiss-indexes
Motivation
If my embeddings use the dot product as their similarity measure, I have to change this...
Your contribution
I can provide a PR if wanted.
Can you use:
import faiss
from langchain import FAISS
# dimension is a placeholder for the size of your embedding vectors
index = faiss.IndexFlatIP(dimension)
store = FAISS(embedding_function, index, docstore, index_to_docstore_id)
Then use the add_texts and add_embeddings methods.
Yep, it is a pity that the FAISS LangChain utility for creating vectorstores is hardcoded to use L2 indexes... Especially considering how popular FAISS is as an open-source vectorstore and how relevant inner product / cosine similarity is for text similarity (used by Azure OpenAI: https://learn.microsoft.com/en-us/azure/cognitive-services/openai/concepts/understand-embeddings).
At least cosine similarity (i.e. IndexFlatIP with the already in-place normalize_L2 flag set to True) would be a great addition to the .from_texts() or .from_documents() wrappers, imho...
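To illustrate why IndexFlatIP plus L2 normalization amounts to cosine similarity, here is a small pure-Python sketch. It does not call faiss or LangChain; the helper names and the sample vectors are made up for the illustration. It checks two things: the inner product of unit-length vectors equals the cosine similarity of the originals, and the squared L2 distance between unit vectors is 2 - 2 * cos.

```python
import math


def inner_product(a, b):
    """Plain dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def l2_normalize(vec):
    """Scale a vector to unit length (the idea behind an L2-normalize step)."""
    norm = math.sqrt(inner_product(vec, vec))
    if norm == 0.0:
        raise ValueError("cannot normalize a zero vector")
    return [x / norm for x in vec]


def cosine_similarity(a, b):
    """Cosine similarity computed directly from the raw vectors."""
    return inner_product(a, b) / (
        math.sqrt(inner_product(a, a)) * math.sqrt(inner_product(b, b))
    )


def squared_l2(a, b):
    """Squared Euclidean distance between two vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


# Hypothetical embeddings, chosen only for the demonstration.
query = [1.0, 2.0, 3.0]
doc = [2.0, 0.5, 1.0]

q_unit = l2_normalize(query)
d_unit = l2_normalize(doc)

ip_on_unit = inner_product(q_unit, d_unit)   # what an inner-product index would score
cos = cosine_similarity(query, doc)          # cosine similarity of the raw vectors
l2_sq_on_unit = squared_l2(q_unit, d_unit)   # squared L2 distance on unit vectors
```

So with normalized vectors an L2 index and an inner-product index rank neighbours in the same order; what differs is the score itself (a distance where smaller is better versus a similarity where larger is better), which matters if you apply score thresholds.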
In addition to vanilla cosine similarity I would also propose sliding window maximum cosine similarity as outlined in Section 3.2.1 of Sentence Similarity Techniques for Short vs Variable Length Text using Word Embeddings--I've found it to be empirically useful for retrieval when the prompt is very short but the relevant document is much longer. Not sure if this can be fairly easily implemented within the existing langchain framework, or if it can only be done in faiss.
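For anyone wondering what that could look like, here is a rough pure-Python sketch of the general idea. It is one possible reading, not the paper's exact formulation (which is not reproduced in this thread): it assumes per-token embeddings are available, averages them within each window, uses a window as long as the query, and keeps the best cosine score. All function names and sample vectors are hypothetical.

```python
import math


def mean_vector(vectors):
    """Element-wise mean of a non-empty list of equal-length vectors."""
    dim = len(vectors[0])
    return [sum(v[i] for v in vectors) / len(vectors) for i in range(dim)]


def cosine(a, b):
    """Cosine similarity; returns 0.0 if either vector has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def sliding_window_max_cosine(query_vecs, doc_vecs):
    """Best cosine score between the averaged query and any averaged
    document window of the query's length (assumed window size)."""
    if not query_vecs or not doc_vecs:
        raise ValueError("both inputs must contain at least one vector")
    window = min(len(query_vecs), len(doc_vecs))
    query_mean = mean_vector(query_vecs)
    best = -1.0  # cosine similarity is never below -1
    for start in range(len(doc_vecs) - window + 1):
        window_mean = mean_vector(doc_vecs[start:start + window])
        best = max(best, cosine(query_mean, window_mean))
    return best


# Toy 2-d token embeddings, made up for the demonstration.
query_vecs = [[1.0, 0.0], [0.0, 1.0]]
doc_vecs = [[1.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 0.0]]

windowed = sliding_window_max_cosine(query_vecs, doc_vecs)
whole_doc = cosine(mean_vector(query_vecs), mean_vector(doc_vecs))
```

In this toy case the windowed score is higher than the whole-document score, because one window of the document matches the short query exactly while the full-document average is diluted by the other tokens. Note that this operates on token-level vectors, so it would probably have to live outside a single faiss index lookup, e.g. as a re-ranking step.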
Hi, @PhilipMay! I'm Dosu, and I'm helping the LangChain team manage their backlog. I wanted to let you know that we are marking this issue as stale.
From what I understand, you requested to add more index methods to faiss, specifically the ability to set other index methods such as IndexFlatIP. There have been some suggestions in the comments, such as using the FAISS utility from LangChain to achieve this. Additionally, there was a suggestion from afdezt to add cosine similarity and include IndexFlatIP with the normalize_L2 flag set to True. AlexHuang2 also proposed adding sliding window maximum cosine similarity.
Before we proceed, we would like to confirm if this issue is still relevant to the latest version of the LangChain repository. If it is, please let us know by commenting on this issue. Otherwise, feel free to close the issue yourself, or the issue will be automatically closed in 7 days.
Thank you for your understanding and contribution to the LangChain project!