langchain [Vectorscore Feature] Add a new search type to allow search based on a relevance score threshold

Adding a new search type to allow search based on a relevance score threshold

Changes:

Add a new search_type, similarity_score_threshold, so the vectorstore retriever can call the similarity_search_with_relevance_scores function; the input validation check function is updated accordingly.
Use warnings instead of "raise Exception" in similarity_search_with_relevance_scores to make the pipeline more robust when used with ConversationalQAChain.
Add a score_threshold argument to similarity_search_with_relevance_scores to complete the filtering logic, and emit a warning if no relevant docs are returned after threshold filtering.
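
To illustrate the filtering logic described in the changes above, here is a minimal standalone sketch. It is not the code from this PR: the helper name filter_by_score_threshold, the plain (doc, score) tuples, and the warning messages are assumptions made for illustration, and relevance scores are assumed to be normalized to the range 0 to 1.

```python
import warnings
from typing import Any, List, Optional, Tuple


def filter_by_score_threshold(
    docs_and_scores: List[Tuple[Any, float]],
    score_threshold: Optional[float] = None,
) -> List[Tuple[Any, float]]:
    """Hypothetical helper: keep (doc, score) pairs at or above the threshold."""
    # Assumption: relevance scores are normalized to [0, 1].
    # Warn instead of raising so a downstream chain keeps running.
    if any(score < 0.0 or score > 1.0 for _, score in docs_and_scores):
        warnings.warn("Relevance scores are expected to be between 0 and 1")

    # Without a threshold, behave like a plain scored search.
    if score_threshold is None:
        return docs_and_scores

    kept = [
        (doc, score)
        for doc, score in docs_and_scores
        if score >= score_threshold
    ]
    # An empty result is reported with a warning, not an exception.
    if not kept:
        warnings.warn(
            f"No relevant docs were retrieved using the "
            f"relevance score threshold {score_threshold}"
        )
    return kept


# Made-up scored results standing in for a vector store response.
results = [("doc about pricing", 0.82), ("doc about cats", 0.31)]
print(filter_by_score_threshold(results, score_threshold=0.5))
# → [('doc about pricing', 0.82)]
```

With a threshold that nothing passes (for example 0.9 here), the helper returns an empty list and emits a warning, so the caller can decide how to respond to an off-topic question.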

Motivation:

From my use cases and experimentation, I found that selecting relevant documents to include in the prompt is very important. In particular, when a user asks an off-topic question, we don't want to pass all the retrieved docs downstream and embed them into the question prompt.
Using the similarity_search_with_relevance_scores function usually provides better visibility and debugging capability.
The existing ConversationalQAChain uses similarity_search by default, and it is hard to choose a different search method without changing the underlying code.

Looking for review and feedback

Apr 27 '23 21:04 jpzhangvincent

I am facing a similar situation with off-topic conversations, as described in the issue.

When I printed the similarity scores from similarity_search, they always ranged between 3.1 and 4.1 for both relevant and irrelevant responses. Configuring the retriever with (search_type="similarity", search_kwargs={"k":2}) also doesn't help the situation much. Please suggest if there is any way to use the search relevance scores to influence the conversation on non-relevant topics.

May 02 '23 10:05 My3VM

Hey @My3VM, this PR aims to address this problem by calling similarity_search_with_relevance_scores and filtering the retrieved docs by a specified threshold. If you want to try it out, you can switch to my branch and reinstall, then use the example code snippet below to initialize your chain (basically, add "score_threshold": 0.5 to the search_kwargs argument):

from langchain.chains.conversation.memory import ConversationBufferWindowMemory
from langchain.chat_models import ChatOpenAI
from langchain.chains import ConversationalRetrievalChain

# set up your own vector db
vectordb = ...

memory = ConversationBufferWindowMemory(
    memory_key="chat_history",  # important to align with agent prompt (below)
    k=3,
    return_messages=True
)

# OPENAI_API_KEY is assumed to be defined elsewhere (e.g. read from your environment)
llm = ChatOpenAI(
    openai_api_key=OPENAI_API_KEY,
    temperature=0,
    model_name='gpt-3.5-turbo'
)

conv_retriever = ConversationalRetrievalChain.from_llm(
    llm=llm,
    chain_type="stuff",
    retriever=vectordb.as_retriever(search_type='similarity_score_threshold', search_kwargs={"k": 2, "score_threshold": 0.5}),
    memory=memory
)

May 02 '23 22:05 jpzhangvincent

Is this merged with Langchain 0.0.170 yet? I am still getting errors for search_type='similarity_score_threshold'

May 17 '23 04:05 My3VM

Did you use the async version? There's another PR for that

May 19 '23 15:05 jpzhangvincent

I created an extra PR to fix some missing async calls, please check your repo @jpzhangvincent

May 22 '23 20:05 Morriz

bah, this was closed...will make the PR against master

May 22 '23 20:05 Morriz
