langchain
ChromaDB score goes the wrong way
System Info
langchain==0.0.227 langchainplus-sdk==0.0.20 chromadb==0.3.26
Who can help?
No response
Information
- [ ] The official example notebooks/scripts
- [X] My own modified scripts
Related Components
- [ ] LLMs/Chat Models
- [ ] Embedding Models
- [ ] Prompts / Prompt Templates / Prompt Selectors
- [ ] Output Parsers
- [ ] Document Loaders
- [X] Vector Stores / Retrievers
- [ ] Memory
- [ ] Agents / Agent Executors
- [ ] Tools / Toolkits
- [ ] Chains
- [ ] Callbacks/Tracing
- [ ] Async
Reproduction
I'm sorry, but I don't have the time to carve out an MRE right now. My take is that it's still better to report it than not to.
Expected behavior
similarity_search_with_score returns the distance, as expected. However, similarity_search_with_relevance_scores returns the same values, so the closest vectors get the smallest scores, even though the output of the latter function is supposed to be higher for vectors that are closer:
similarity_search_with_relevance_scores: Return docs and relevance scores in the range [0, 1]. 0 is dissimilar, 1 is most similar.
I just noticed the same while trying to figure out https://github.com/hwchase17/langchain/issues/7427. The code should do a 1 - score.
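For illustration, here is a minimal sketch of the suggested `1 - score` inversion. It is not the LangChain implementation: the helper name `distance_to_relevance` and the sample pairs are hypothetical, and it assumes the distance is already normalized to [0, 1], which this thread does not confirm for Chroma.

```python
def distance_to_relevance(distance: float) -> float:
    """Map a distance (smaller = closer) to a relevance score (larger = closer).

    Assumption: the distance is already normalized to [0, 1].
    Values outside that range are clamped so the result stays in [0, 1].
    """
    return min(1.0, max(0.0, 1.0 - distance))


# Hypothetical (document, distance) pairs, shaped like the output of
# similarity_search_with_score, sorted closest first.
results = [("doc_a", 0.12), ("doc_b", 0.47), ("doc_c", 0.90)]

# After the inversion, the closest document has the highest score.
scored = [(doc, distance_to_relevance(dist)) for doc, dist in results]
```

If the underlying distance is unbounded (for example, a squared L2 distance), a plain `1 - score` can fall outside [0, 1]. The clamp above only hides that case, so a metric-specific normalization would probably be needed instead.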
Hi, @NicoWeio. I'm Dosu, and I'm helping the LangChain team manage their backlog. I wanted to let you know that we are marking this issue as stale.
Based on my understanding, the issue you reported is related to the similarity_search_with_relevance_scores function in ChromaDB. It seems that the function is returning incorrect values, with smaller values being returned for closer vectors instead of higher values. User paplorinc has suggested a potential fix by using 1 - score in the code. Additionally, another user named DavidArenburg has mentioned a similar issue with the Redis code and wonders if it is related to another line of code.
Before we proceed, we would like to confirm if this issue is still relevant to the latest version of the LangChain repository. If it is, please let us know by commenting on this issue. Otherwise, feel free to close the issue yourself, or the issue will be automatically closed in 7 days.
Thank you for your contribution to the LangChain repository, and we appreciate your understanding as we work to manage and prioritize the open issues. If you have any further questions or concerns, please let us know.
Can't check right now, but maybe this has been resolved by the merge of #6570?
@baskaryan Could you please help @NicoWeio with this issue? They have indicated that it may still be relevant and mentioned the possibility of it being resolved by the merge of #6570. Thank you!
Hi, @NicoWeio
I'm helping the LangChain team manage their backlog and am marking this issue as stale. From what I understand, you reported an issue with the similarity_search_with_relevance_scores function in ChromaDB returning incorrect values, and there were discussions about potential fixes and related issues with Redis code. I requested confirmation of the issue's relevance to the latest repository version, and there was mention of a potential resolution through the merge of #6570, with dosubot requesting assistance from baskaryan to address the issue.
Could you please confirm if this issue is still relevant to the latest version of the LangChain repository? If it is, please let the LangChain team know by commenting on the issue. Otherwise, feel free to close the issue yourself, or it will be automatically closed in 7 days. Thank you!