langchain
Fixed a similarity score calculation bug in Chroma module
The _results_to_docs_and_scores() function should have returned the similarity score (higher values indicate more similarity), but it erroneously returned the distance (higher values indicate less similarity).
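To illustrate the difference, here is a minimal sketch of what a similarity-returning conversion could look like. The helper name results_to_docs_and_similarities is hypothetical, and the sketch assumes a Chroma-style query result in which "documents" and "distances" are lists of lists (one inner list per query) and the collection uses cosine distance, so that similarity = 1 - distance. It is not the code from this PR or from Chroma.

```python
from typing import Any, Dict, List, Tuple


def results_to_docs_and_similarities(
    results: Dict[str, Any],
) -> List[Tuple[str, float]]:
    """Pair each returned document with a similarity score.

    Hypothetical helper: assumes `documents` and `distances` are lists of
    lists (one inner list per query) and that the collection uses cosine
    distance, so similarity = 1 - distance.
    """
    docs = results["documents"][0]
    distances = results["distances"][0]
    # Convert distance (lower = closer) into similarity (higher = closer).
    return [(doc, 1.0 - dist) for doc, dist in zip(docs, distances)]


# Made-up query result for illustration only.
fake_results = {
    "documents": [["close match", "weak match"]],
    "distances": [[0.1, 0.7]],
}
print(results_to_docs_and_similarities(fake_results))
```

After the conversion, the closest document carries the largest score, which is the ordering callers expect from a "similarity" value.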
@rlancemartin, @eyurtsev
The latest updates on your projects.
Name | Status | Updated (UTC) |
---|---|---|
langchain | ✅ Ready | Jun 30, 2023 9:35am |
I think most vector stores return distances as their "scores" by default, though this sometimes depends on how the underlying index is configured. This is why the relevance scores method was created: it is an API that enforces a consistent interpretation of what the score means.
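As a rough illustration of what "enforcing an interpretation" can mean, the sketch below maps two common distance metrics onto a relevance score in [0, 1], where 1 means identical. The function names are hypothetical, and the formulas assume (a) cosine distance defined as 1 - cosine similarity and (b) squared L2 distance between unit-length embeddings. These are not necessarily the normalizations LangChain or Chroma use.

```python
def cosine_distance_to_relevance(distance: float) -> float:
    """Map a cosine distance in [0, 2] to a relevance score in [0, 1].

    Assumes cosine distance = 1 - cosine similarity, so 0 means the same
    direction and 2 means the opposite direction.
    """
    return 1.0 - distance / 2.0


def squared_l2_to_relevance(distance: float) -> float:
    """Map squared L2 distance between unit-length vectors to [0, 1].

    For unit vectors, squared L2 = 2 - 2 * cos, which lies in [0, 4].
    """
    return 1.0 - distance / 4.0


# Two unit vectors with cosine similarity 0.6 (made-up value).
cos = 0.6
print(cosine_distance_to_relevance(1.0 - cos))
print(squared_l2_to_relevance(2.0 - 2.0 * cos))
```

Under these assumptions, both mappings reduce to (1 + cos) / 2, so the same pair of vectors gets the same relevance score regardless of which metric the index was configured with. That metric-independence is the point of a relevance API.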
I agree that the relevance scores method was created to enforce an interpretation of what the score means.
At the same time, since VectorStore.similarity_search_with_relevance_scores() uniformly treats the score as a similarity, I think the underlying implementation (Chroma in this case) should also return a similarity instead of a distance:
https://github.com/hwchase17/langchain/blob/64039b9f112653d686bbcdd980fa90ad2f3eb9fc/langchain/vectorstores/base.py#L168-L174
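A simplified sketch of why the semantics matter: if the caller filters results with a minimum-score threshold, that filter is only correct when higher scores mean more similar. The helper filter_by_threshold and the sample values are hypothetical, and this is a stand-in for the idea, not the code at the linked lines.

```python
from typing import List, Tuple


def filter_by_threshold(
    docs_and_scores: List[Tuple[str, float]], score_threshold: float
) -> List[Tuple[str, float]]:
    """Keep results whose score is at least `score_threshold`.

    This only makes sense if higher scores mean more similar.
    """
    return [(doc, s) for doc, s in docs_and_scores if s >= score_threshold]


# Made-up distances; similarity assumed to be 1 - distance.
distances = [("close match", 0.1), ("weak match", 0.7)]
similarities = [(doc, 1.0 - d) for doc, d in distances]

# With similarities, the threshold keeps the close match.
print(filter_by_threshold(similarities, 0.5))
# With raw distances, the same threshold keeps the weak match instead.
print(filter_by_threshold(distances, 0.5))
```

Passing raw distances through a similarity-based filter silently inverts it, which is the failure mode this PR describes.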
Yes, https://github.com/hwchase17/langchain/pull/6570 has a fix for this.
@boxcounter please confirm that https://github.com/hwchase17/langchain/pull/6570 implements what you need. Will work with @raymond-yuan to get this in.
Great, that's what I need. And it's a much more complete implementation. Nice work!
Fixed in v0.0.230. Closing this PR.