langchain
langchain copied to clipboard
Refactor vector storage to correctly handle relevancy scores
Description: This pull request aims to support generating the correct generic relevancy scores for different vector stores by refactoring the relevance score functions and their selection in the base class and subclasses of VectorStore. This is especially relevant with VectorStores that require a distance metric upon initialization. Note many of the current implenetations of _similarity_search_with_relevance_scores are not technically correct, as they just return self.similarity_search_with_score(query, k, **kwargs) without applying the relevant score function
Also includes changes associated with: https://github.com/hwchase17/langchain/pull/6564 and https://github.com/hwchase17/langchain/pull/6494
See more indepth discussion in thread in #6494
Issue: https://github.com/hwchase17/langchain/issues/6526 https://github.com/hwchase17/langchain/issues/6481 https://github.com/hwchase17/langchain/issues/6346
Dependencies: None
The changes include:
- Properly handling score thresholding in FAISS
similarity_search_with_score_by_vectorfor the corresponding distance metric. - Refactoring the
_similarity_search_with_relevance_scoresmethod in the base class and removing it from the subclasses for incorrectly implemented subclasses. - Adding a
_select_relevance_score_fnmethod in the base class and implementing it in the subclasses to select the appropriate relevance score function based on the distance strategy. - Updating the
__init__methods of the subclasses to set therelevance_score_fnattribute. - Removing the
_default_relevance_score_fnfunction from the FAISS class and using the base class's_euclidean_relevance_score_fninstead. - Adding the
DistanceStrategyenum to theutils.pyfile and updating the imports in the vector store classes. - Updating the tests to import the
DistanceStrategyenum from theutils.pyfile.
The latest updates on your projects. Learn more about Vercel for Git ↗︎
1 Ignored Deployment
| Name | Status | Preview | Comments | Updated (UTC) |
|---|---|---|---|---|
| langchain | ⬜️ Ignored (Inspect) | Jul 10, 2023 7:58pm |
@dev2049
Any plan to rebase this patch? I have tested it and it was working fine.
Nice work @raymond-yuan. Can you merge master to resolve conflicts and have a look at any Lint errors (I just kicked off tests again)?
I have merged master at https://github.com/flepied/langchain/tree/relevance_refactor and it seems to work well with my tests...
I have merged master at https://github.com/flepied/langchain/tree/relevance_refactor and it seems to work well with my tests...
Great. When @raymond-yuan is back online, I'll let him resolve the merge conflicts.
I had a quick look, but there were a few to work though, so I wanted to make sure they are resolved correctly.
I'll have a look today and aim to get this in.
@raymond-yuan I see tests for chroma, faiss, pgvector, singlestore. but a larger set of DBs (pinecone, etc) are modified; tests for some like pinecone are tricky b/c they are not in memory, but do we have confirmation (e.g., from user testing) that this is a no-op / backwards compatible across the broader set of DBs? i see the the existing logic is moved to _select_relevance_score_fn, so it should be, but would be nice to verify it we could (e.g., a subtle change in scoring may confuse folks).
@raymond-yuan I see tests for chroma, faiss, pgvector, singlestore. but a larger set of DBs (pinecone, etc) are modified; tests for some like pinecone are tricky b/c they are not in memory, but do we have confirmation (e.g., from user testing) that this is a no-op / backwards compatible across the broader set of DBs? i see the the existing logic is moved to
_select_relevance_score_fn, so it should be, but would be nice to verify it we could (e.g., a subtle change in scoring may confuse folks).
@rlancemartin added some additional tests (using personal api keys), for additional dbs. To be clear, for vectorstores that already had the relevancy score function already implemented, these should be no-ops and completely backwards compatible. However for the majority of classes that did not have these implemented, the _similarity_search_with_relevance_scores will produce different results, whch is the desired outcome since these were incorrectly implemented before (since the old behavior was just doing a similarity search and returning the corresponding distance).
relevancy score function already implemented, these should be no-ops and completely backwards compatible
Ya, agreed. Cool, thanks for running some additional tests.
Going to merge this now, esp given the demand to get it in.
We can take up follow-up work on this in newer PRs.
Good work!
@raymond-yuan Looking into this for the supabase retriever. Awesome work! Any reason why you left supabase out of this? Was it implemented correctly?
@ShantanuNair ah sorry, no I missed this because it already had a custom implementation of similarity_search_with_relevance_score. Upon closer inspection, I believe this is correctly implemented for the default match_document being used documentation. This code already converts the default distance fn (cosine distance) into a similarity score (1 - cos(x))
Looks like relevant issue with Pinecone Cosine Similarity, which is converted back to distance - filtering out relevant documents when using score_threshold -- #8207
I found the score_threshold also filters out the most relevant doc for the FAISS vectorstore. I think there are still bugs with this refactoring changes. `
First, the FAISS.similarity_search_with_score /similarity_search_with_score_by_vector actually returns the score which already represents the similarity score instead of distance... (I validated with the embed_function with the same sentences and the dot product is 1 which matches the score from the function) but somehow the docstring says the lower score means more similarity?
Second, the _max_inner_product_relevance_score_fn should just return the score itself for FAISS (at least) if the higher score from FAISS.similarity_search_with_score /similarity_search_with_score_by_vector represents the higher similarity... let me know if I miss anything.
I guess a fix is to implement the _max_inner_product_relevance_score_fn in the FAISS vectorstore file, like lambda x: x.