langchain icon indicating copy to clipboard operation
langchain copied to clipboard

Refactor vector storage to correctly handle relevancy scores

Open raymond-yuan opened this issue 2 years ago • 8 comments

Description: This pull request aims to support generating the correct generic relevancy scores for different vector stores by refactoring the relevance score functions and their selection in the base class and subclasses of VectorStore. This is especially relevant with VectorStores that require a distance metric upon initialization. Note many of the current implenetations of _similarity_search_with_relevance_scores are not technically correct, as they just return self.similarity_search_with_score(query, k, **kwargs) without applying the relevant score function

Also includes changes associated with: https://github.com/hwchase17/langchain/pull/6564 and https://github.com/hwchase17/langchain/pull/6494

See more indepth discussion in thread in #6494

Issue: https://github.com/hwchase17/langchain/issues/6526 https://github.com/hwchase17/langchain/issues/6481 https://github.com/hwchase17/langchain/issues/6346

Dependencies: None

The changes include:

  • Properly handling score thresholding in FAISS similarity_search_with_score_by_vector for the corresponding distance metric.
  • Refactoring the _similarity_search_with_relevance_scores method in the base class and removing it from the subclasses for incorrectly implemented subclasses.
  • Adding a _select_relevance_score_fn method in the base class and implementing it in the subclasses to select the appropriate relevance score function based on the distance strategy.
  • Updating the __init__ methods of the subclasses to set the relevance_score_fn attribute.
  • Removing the _default_relevance_score_fn function from the FAISS class and using the base class's _euclidean_relevance_score_fn instead.
  • Adding the DistanceStrategy enum to the utils.py file and updating the imports in the vector store classes.
  • Updating the tests to import the DistanceStrategy enum from the utils.py file.

raymond-yuan avatar Jun 22 '23 00:06 raymond-yuan

The latest updates on your projects. Learn more about Vercel for Git ↗︎

1 Ignored Deployment
Name Status Preview Comments Updated (UTC)
langchain ⬜️ Ignored (Inspect) Jul 10, 2023 7:58pm

vercel[bot] avatar Jun 22 '23 00:06 vercel[bot]

@dev2049

raymond-yuan avatar Jun 22 '23 00:06 raymond-yuan

Any plan to rebase this patch? I have tested it and it was working fine.

flepied avatar Jul 02 '23 13:07 flepied

Nice work @raymond-yuan. Can you merge master to resolve conflicts and have a look at any Lint errors (I just kicked off tests again)?

rlancemartin avatar Jul 02 '23 14:07 rlancemartin

I have merged master at https://github.com/flepied/langchain/tree/relevance_refactor and it seems to work well with my tests...

flepied avatar Jul 02 '23 17:07 flepied

I have merged master at https://github.com/flepied/langchain/tree/relevance_refactor and it seems to work well with my tests...

Great. When @raymond-yuan is back online, I'll let him resolve the merge conflicts.

I had a quick look, but there were a few to work though, so I wanted to make sure they are resolved correctly.

rlancemartin avatar Jul 02 '23 20:07 rlancemartin

I'll have a look today and aim to get this in.

rlancemartin avatar Jul 10 '23 13:07 rlancemartin

@raymond-yuan I see tests for chroma, faiss, pgvector, singlestore. but a larger set of DBs (pinecone, etc) are modified; tests for some like pinecone are tricky b/c they are not in memory, but do we have confirmation (e.g., from user testing) that this is a no-op / backwards compatible across the broader set of DBs? i see the the existing logic is moved to _select_relevance_score_fn, so it should be, but would be nice to verify it we could (e.g., a subtle change in scoring may confuse folks).

rlancemartin avatar Jul 10 '23 14:07 rlancemartin

@raymond-yuan I see tests for chroma, faiss, pgvector, singlestore. but a larger set of DBs (pinecone, etc) are modified; tests for some like pinecone are tricky b/c they are not in memory, but do we have confirmation (e.g., from user testing) that this is a no-op / backwards compatible across the broader set of DBs? i see the the existing logic is moved to _select_relevance_score_fn, so it should be, but would be nice to verify it we could (e.g., a subtle change in scoring may confuse folks).

@rlancemartin added some additional tests (using personal api keys), for additional dbs. To be clear, for vectorstores that already had the relevancy score function already implemented, these should be no-ops and completely backwards compatible. However for the majority of classes that did not have these implemented, the _similarity_search_with_relevance_scores will produce different results, whch is the desired outcome since these were incorrectly implemented before (since the old behavior was just doing a similarity search and returning the corresponding distance).

raymond-yuan avatar Jul 10 '23 20:07 raymond-yuan

relevancy score function already implemented, these should be no-ops and completely backwards compatible

Ya, agreed. Cool, thanks for running some additional tests.

Going to merge this now, esp given the demand to get it in.

We can take up follow-up work on this in newer PRs.

Good work!

rlancemartin avatar Jul 11 '23 03:07 rlancemartin

@raymond-yuan Looking into this for the supabase retriever. Awesome work! Any reason why you left supabase out of this? Was it implemented correctly?

ShantanuNair avatar Jul 11 '23 13:07 ShantanuNair

@ShantanuNair ah sorry, no I missed this because it already had a custom implementation of similarity_search_with_relevance_score. Upon closer inspection, I believe this is correctly implemented for the default match_document being used documentation. This code already converts the default distance fn (cosine distance) into a similarity score (1 - cos(x))

raymond-yuan avatar Jul 11 '23 19:07 raymond-yuan

Looks like relevant issue with Pinecone Cosine Similarity, which is converted back to distance - filtering out relevant documents when using score_threshold -- #8207

olegshirokikh avatar Jul 24 '23 23:07 olegshirokikh

I found the score_threshold also filters out the most relevant doc for the FAISS vectorstore. I think there are still bugs with this refactoring changes. `

First, the FAISS.similarity_search_with_score /similarity_search_with_score_by_vector actually returns the score which already represents the similarity score instead of distance... (I validated with the embed_function with the same sentences and the dot product is 1 which matches the score from the function) but somehow the docstring says the lower score means more similarity?

Second, the _max_inner_product_relevance_score_fn should just return the score itself for FAISS (at least) if the higher score from FAISS.similarity_search_with_score /similarity_search_with_score_by_vector represents the higher similarity... let me know if I miss anything.

I guess a fix is to implement the _max_inner_product_relevance_score_fn in the FAISS vectorstore file, like lambda x: x.

jpzhangvincent avatar Jul 25 '23 00:07 jpzhangvincent