Difference between "Question Answering with Sources" and "Question Answering"
I notice they use different APIs, but what's the difference between these two?
- Question Answering: `docs = docsearch.get_relevant_documents(query)`
- Question Answering with Sources: `docs = docsearch.similarity_search(query)`
(I am new to LangChain so please forgive any mistakes, just trying to help and learn at the same time. 🙏)
Is this question about the subtle difference in "Prepare Data" section of these two notebooks?
- https://github.com/hwchase17/langchain/blob/master/docs/modules/chains/index_examples/question_answering.ipynb
- https://github.com/hwchase17/langchain/blob/master/docs/modules/chains/index_examples/qa_with_sources.ipynb
In the "question_answering" notebook, docsearch is a VectorStoreRetriever:
docsearch = Chroma.from_texts(...).as_retriever()
In the "qa_with_sources" notebook, docsearch is a VectorStore
docsearch = Chroma.from_texts(...)
In these specific examples there is no difference, as the Chroma `VectorStoreRetriever#get_relevant_documents()` method simply proxies to `self.vectorstore.similarity_search()` unless the default `search_type` is overridden.
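For example (a minimal sketch, assuming an OpenAI API key and a toy corpus), the two calls below return the same documents under the default `search_type`:

```python
from langchain.embeddings import OpenAIEmbeddings
from langchain.vectorstores import Chroma

texts = ["LangChain is a framework for building LLM applications."]
vectorstore = Chroma.from_texts(texts, OpenAIEmbeddings())
retriever = vectorstore.as_retriever()  # default search_type="similarity"

query = "What is LangChain?"
# The default retriever simply proxies to the vector store, so these
# two calls return the same documents:
docs_from_retriever = retriever.get_relevant_documents(query)
docs_from_store = vectorstore.similarity_search(query)
```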
In general, it appears the intent may be for `VectorStore`s that support multiple retrieval methods (in addition to vector similarity) to override `as_retriever()` in order to provide an implementation of `get_relevant_documents()` with all supported `search_type`s.
Redis, for example: https://github.com/hwchase17/langchain/blob/e3cf00b/langchain/vectorstores/redis.py#L424-L457
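The shape of that pattern is roughly the following (an illustrative sketch only, not LangChain's actual Redis retriever; see the linked source for the real thing):

```python
from typing import List

from langchain.schema import Document
from langchain.vectorstores.base import VectorStore

class MultiSearchRetriever:
    """Hypothetical retriever dispatching on search_type."""

    def __init__(self, vectorstore: VectorStore,
                 search_type: str = "similarity", k: int = 4):
        self.vectorstore = vectorstore
        self.search_type = search_type
        self.k = k

    def get_relevant_documents(self, query: str) -> List[Document]:
        # Dispatch on search_type, so one retriever interface can expose
        # several retrieval methods of the underlying store.
        if self.search_type == "similarity":
            return self.vectorstore.similarity_search(query, k=self.k)
        if self.search_type == "mmr":
            # Max-marginal-relevance search, where the store supports it.
            return self.vectorstore.max_marginal_relevance_search(query, k=self.k)
        raise ValueError(f"Unsupported search_type: {self.search_type!r}")
```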
Confirmed: https://github.com/hwchase17/langchain/blob/master/docs/modules/indexes/retrievers/examples/vectorstore-retriever.ipynb
I noticed that "Question Answering with Sources" and "Retrieval Question Answering with Sources" use different chains: `load_qa_with_sources_chain` and `RetrievalQAWithSourcesChain`. So, what's the difference between them?
- https://python.langchain.com/en/latest/modules/chains/index_examples/vector_db_qa_with_sources.html
- https://python.langchain.com/en/latest/modules/chains/index_examples/qa_with_sources.html
- To use `chain = load_qa_with_sources_chain(...)`, you first need an index/docsearch; for a `query`, you get `docs = docsearch.similarity_search(query)` and then call `chain({"input_documents": docs, "question": query})`.
- `RetrievalQAWithSourcesChain` is a more compact version that does the `docsearch.similarity_search` etc. under the hood, and has extra parameters like `reduce_k_below_max_tokens` and `max_tokens_limit` to better control token usage when doing retrieval. (See the sketch below.)
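A minimal sketch of the two styles side by side (assuming an existing `docsearch` vector store and an OpenAI API key; the query is illustrative):

```python
from langchain.llms import OpenAI
from langchain.chains import RetrievalQAWithSourcesChain
from langchain.chains.qa_with_sources import load_qa_with_sources_chain

llm = OpenAI(temperature=0)
query = "What did the author say about embeddings?"

# Manual style: run the retrieval step yourself, then feed the docs in.
docs = docsearch.similarity_search(query)
chain = load_qa_with_sources_chain(llm, chain_type="stuff")
result = chain({"input_documents": docs, "question": query})

# Compact style: the chain retrieves under the hood and exposes
# token-budget controls.
retrieval_chain = RetrievalQAWithSourcesChain.from_chain_type(
    llm,
    chain_type="stuff",
    retriever=docsearch.as_retriever(),
    reduce_k_below_max_tokens=True,  # drop docs to stay under the limit
    max_tokens_limit=3000,
)
result = retrieval_chain({"question": query})
```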
Is there any reason why the default prompt for QA with sources is so huge? https://github.com/hwchase17/langchain/blob/master/langchain/chains/qa_with_sources/stuff_prompt.py
Or are we supposed to change it for our use case anyway?
I am equally confused by this. The default prompt for `load_qa_with_sources_chain` seems quite large. I tried overriding it with my own prompt template and, at a basic level, it works, but I want to use a StructuredOutputParser with `load_qa_with_sources_chain` and the output is unreliable: most of the time it just responds with the default `{'output_text': 'This is a response'}`, and only occasionally does it follow the `ResponseSchema`. I just can't make the two work together.
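For reference, a basic prompt override looks roughly like this (a sketch; the template text is illustrative, but the stuff chain does expect `summaries` and `question` as input variables):

```python
from langchain.llms import OpenAI
from langchain.prompts import PromptTemplate
from langchain.chains.qa_with_sources import load_qa_with_sources_chain

template = """Answer the question using only the sources below, and
cite each source you use as SOURCES: <source>.

{summaries}

Question: {question}
Answer:"""

prompt = PromptTemplate(
    template=template,
    input_variables=["summaries", "question"],
)

# `prompt` is forwarded to the underlying stuff chain, replacing the
# long default in stuff_prompt.py.
chain = load_qa_with_sources_chain(
    OpenAI(temperature=0),
    chain_type="stuff",
    prompt=prompt,
)
```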
Actually, I'm also confused by this. The default prompt of `load_qa_with_sources_chain` is very different from that of `load_qa_chain`. On my task, I found `load_qa_chain` performed better than `load_qa_with_sources_chain`. I guess the default prompt of `load_qa_with_sources_chain` makes the model consider more than one document, but I am not sure.
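For comparison, the `load_qa_chain` mentioned above is loaded the same way but ships a much shorter default prompt with no source-citation instructions (a sketch, reusing the `docs` and `query` from the earlier example):

```python
from langchain.llms import OpenAI
from langchain.chains.question_answering import load_qa_chain

chain = load_qa_chain(OpenAI(temperature=0), chain_type="stuff")
result = chain({"input_documents": docs, "question": query})
# result["output_text"] contains just the answer, with no SOURCES line.
```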
Hi, @19245222! I'm Dosu, and I'm helping the LangChain team manage their backlog. I wanted to let you know that we are marking this issue as stale.
From what I understand, the issue was about the difference between the "Question Answering" and "Question Answering with Sources" APIs in the LangChain repository. The difference lies in the methods used for retrieving relevant documents: the former uses `get_relevant_documents` while the latter uses `similarity_search`. There have been comments providing further explanation and examples of how these methods are implemented in the code. Some users also expressed confusion about the default prompt and performance of the "Question Answering with Sources" API.
If this issue is still relevant to the latest version of the LangChain repository, please let the LangChain team know by commenting on the issue. Otherwise, feel free to close the issue yourself or it will be automatically closed in 7 days.
Let us know if you have any further questions or concerns. Thanks!