
ChromaDB score goes the wrong way

Open NicoWeio opened this issue 2 years ago • 1 comments

System Info

langchain==0.0.227 langchainplus-sdk==0.0.20 chromadb==0.3.26

Who can help?

No response

Information

  • [ ] The official example notebooks/scripts
  • [X] My own modified scripts

Related Components

  • [ ] LLMs/Chat Models
  • [ ] Embedding Models
  • [ ] Prompts / Prompt Templates / Prompt Selectors
  • [ ] Output Parsers
  • [ ] Document Loaders
  • [X] Vector Stores / Retrievers
  • [ ] Memory
  • [ ] Agents / Agent Executors
  • [ ] Tools / Toolkits
  • [ ] Chains
  • [ ] Callbacks/Tracing
  • [ ] Async

Reproduction

I'm sorry, but I don't have the time to carve out an MRE right now. My take is that it's still better to report it than not to.

Expected behavior

similarity_search_with_score returns the distance, as expected. However, similarity_search_with_relevance_scores returns the same values, so the closest vectors get the smallest scores. According to its docstring, the latter function is supposed to return higher values for vectors that are closer:

similarity_search_with_relevance_scores: "Return docs and relevance scores in the range [0, 1]. 0 is dissimilar, 1 is most similar."
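In lieu of a full reproduction, here is a minimal sketch of how the symptom could be checked on any nearest-first result list. The helper name `scores_look_like_distances` and the sample numbers are hypothetical and only illustrate the shape of the problem; they are not actual output from Chroma or LangChain.

```python
from typing import Any, List, Tuple


def scores_look_like_distances(results: List[Tuple[Any, float]]) -> bool:
    """Return True if scores grow down a nearest-first result list.

    Vector stores return the best match first. Relevance scores should
    therefore start high and fall; distances start low and rise.
    """
    scores = [score for _, score in results]
    return len(scores) > 1 and scores[0] < scores[-1]


# Hypothetical (document, score) pairs, best match first.
# "reported" mimics the behavior described in this issue,
# "expected" mimics what the docstring promises.
reported = [("closest doc", 0.21), ("middle doc", 0.35), ("farthest doc", 0.48)]
expected = [("closest doc", 0.79), ("middle doc", 0.65), ("farthest doc", 0.52)]

print(scores_look_like_distances(reported))
print(scores_look_like_distances(expected))
```

Passing the list returned by similarity_search_with_relevance_scores to such a helper would show whether the installed version is affected.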

NicoWeio avatar Jul 07 '23 23:07 NicoWeio

I just noticed the same while trying to figure out https://github.com/hwchase17/langchain/issues/7427. The code should return 1 - score.
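A minimal sketch of that conversion, assuming the distance is already normalized to [0, 1]. The function name `distance_to_relevance` is hypothetical, and the clamping is an added assumption to keep the result in the documented range; other distance metrics would likely need a different normalization first.

```python
def distance_to_relevance(distance: float) -> float:
    """Map a distance (0 = identical) to a relevance score (1 = most similar).

    Assumes the distance lies in [0, 1]. Values outside that range are
    clamped so the result stays within [0, 1].
    """
    relevance = 1.0 - distance
    return min(1.0, max(0.0, relevance))


# Smaller distances should yield larger relevance scores.
for d in (0.0, 0.25, 1.0):
    print(d, distance_to_relevance(d))
```

With this mapping the closest vector gets the highest score, which matches the documented contract of similarity_search_with_relevance_scores.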

l0rinc avatar Jul 09 '23 11:07 l0rinc

It seems to be the same with Redis, though I'm not sure whether it is because of this?

DavidArenburg avatar Jul 18 '23 13:07 DavidArenburg

Hi, @NicoWeio. I'm Dosu, and I'm helping the LangChain team manage their backlog. I wanted to let you know that we are marking this issue as stale.

Based on my understanding, the issue you reported is related to the similarity_search_with_relevance_scores function in ChromaDB. It seems that the function is returning incorrect values, with smaller values being returned for closer vectors instead of higher values. User paplorinc has suggested a potential fix by using 1 - score in the code. Additionally, another user named DavidArenburg has mentioned a similar issue with the Redis code and wonders if it is related to another line of code.

Before we proceed, we would like to confirm if this issue is still relevant to the latest version of the LangChain repository. If it is, please let us know by commenting on this issue. Otherwise, feel free to close the issue yourself, or the issue will be automatically closed in 7 days.

Thank you for your contribution to the LangChain repository, and we appreciate your understanding as we work to manage and prioritize the open issues. If you have any further questions or concerns, please let us know.

dosubot[bot] avatar Oct 17 '23 16:10 dosubot[bot]

Can't check right now, but maybe this has been resolved by the merge of #6570?

NicoWeio avatar Oct 17 '23 19:10 NicoWeio

@baskaryan Could you please help @NicoWeio with this issue? They have indicated that it may still be relevant and mentioned the possibility of it being resolved by the merge of #6570. Thank you!

dosubot[bot] avatar Oct 17 '23 19:10 dosubot[bot]

Hi, @NicoWeio

I'm helping the LangChain team manage their backlog and am marking this issue as stale. From what I understand, you reported an issue with the similarity_search_with_relevance_scores function in ChromaDB returning incorrect values, and there were discussions about potential fixes and related issues with Redis code. I requested confirmation of the issue's relevance to the latest repository version, and there was mention of a potential resolution through the merge of #6570, with dosubot requesting assistance from baskaryan to address the issue.

Could you please confirm if this issue is still relevant to the latest version of the LangChain repository? If it is, please let the LangChain team know by commenting on the issue. Otherwise, feel free to close the issue yourself, or it will be automatically closed in 7 days. Thank you!

dosubot[bot] avatar Feb 02 '24 16:02 dosubot[bot]