Difference between "Question Answering with Sources" and "Question Answering"

Open 19245222 opened this issue 2 years ago • 6 comments

I noticed they use different APIs. What is the difference between the two?

Question Answering: docs = docsearch.get_relevant_documents(query)

Question Answering with Sources: docs = docsearch.similarity_search(query)

19245222 avatar Apr 18 '23 08:04 19245222

(I am new to LangChain so please forgive any mistakes, just trying to help and learn at the same time. 😄)

Is this question about the subtle difference in the "Prepare Data" section of these two notebooks?

  • https://github.com/hwchase17/langchain/blob/master/docs/modules/chains/index_examples/question_answering.ipynb
  • https://github.com/hwchase17/langchain/blob/master/docs/modules/chains/index_examples/qa_with_sources.ipynb

In the "question_answering" notebook, docsearch is a VectorStoreRetriever:

  • docsearch = Chroma.from_texts(...).as_retriever()

In the "qa_with_sources" notebook, docsearch is a VectorStore:

  • docsearch = Chroma.from_texts(...)

In these specific examples there is no difference, as the Chroma VectorStoreRetriever#get_relevant_documents() method simply proxies to self.vectorstore.similarity_search() unless the default search_type is overridden.

In general, it appears the intent may be for VectorStores which support multiple retrieval methods (in addition to vector similarity) to override as_retriever() in order to provide an implementation of get_relevant_documents() with all supported search_types.

Redis, for example: https://github.com/hwchase17/langchain/blob/e3cf00b/langchain/vectorstores/redis.py#L424-L457
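
The delegation described above can be sketched with toy stand-ins. This is not LangChain's code: `ToyVectorStore`, `ToyRetriever`, and the word-overlap scoring are hypothetical names invented for illustration. Only the method names `similarity_search`, `as_retriever`, and `get_relevant_documents`, plus the idea of a `search_type` switch, come from the thread.

```python
from dataclasses import dataclass


class ToyVectorStore:
    """Hypothetical stand-in for a vector store (not LangChain's class)."""

    def __init__(self, texts):
        self.texts = list(texts)

    def similarity_search(self, query, k=2):
        # Toy scoring: count shared lowercase words instead of comparing embeddings.
        query_words = set(query.lower().split())
        ranked = sorted(
            self.texts,
            key=lambda text: len(query_words & set(text.lower().split())),
            reverse=True,
        )
        return ranked[:k]

    def as_retriever(self, search_type="similarity"):
        # Wrap the store in a retriever that exposes a uniform interface.
        return ToyRetriever(vectorstore=self, search_type=search_type)


@dataclass
class ToyRetriever:
    """Hypothetical stand-in for a retriever built on top of a store."""

    vectorstore: ToyVectorStore
    search_type: str = "similarity"

    def get_relevant_documents(self, query):
        # With the default search_type, simply proxy to the store.
        if self.search_type == "similarity":
            return self.vectorstore.similarity_search(query)
        # A store with more retrieval methods would add branches here.
        raise ValueError(f"unsupported search_type: {self.search_type}")


store = ToyVectorStore(["the cat sat", "dogs bark loudly", "the cat purred"])
query = "what did the cat do"
via_retriever = store.as_retriever().get_relevant_documents(query)
via_store = store.similarity_search(query)
print(via_retriever == via_store)
```

With the default `search_type`, both calls return the same documents, which is why the two notebooks behave identically in this respect.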

tsclausing avatar Apr 18 '23 14:04 tsclausing

Confirmed: https://github.com/hwchase17/langchain/blob/master/docs/modules/indexes/retrievers/examples/vectorstore-retriever.ipynb

tsclausing avatar Apr 18 '23 22:04 tsclausing

I noticed that "Question Answering with Sources" and "Retrieval Question Answering with Sources" use different chains: "load_qa_with_sources_chain" and "RetrievalQAWithSourcesChain". So, what's the difference between them?

  • https://python.langchain.com/en/latest/modules/chains/index_examples/vector_db_qa_with_sources.html
  • https://python.langchain.com/en/latest/modules/chains/index_examples/qa_with_sources.html

Lukangkang123 avatar Apr 21 '23 10:04 Lukangkang123

  • To use chain = load_qa_with_sources_chain(...), you first need an index/docsearch. For each query, fetch docs = docsearch.similarity_search(query) and then call chain({"input_documents": docs, "question": query}).
  • RetrievalQAWithSourcesChain is a more compact version that does the docsearch.similarity_search etc. under the hood and has extra parameters like reduce_k_below_max_tokens and max_tokens_limit to better control token usage during retrieval.
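
The two call patterns can be contrasted with a small, library-free sketch. Everything here is a hypothetical stand-in: `fake_search`, `combine_docs`, `RetrievalWrapper`, and the word-count token estimate are invented for illustration. The trimming rule (drop the lowest-ranked documents until the total fits under `max_tokens_limit`) is an assumption about what `reduce_k_below_max_tokens` is for, not a copy of LangChain's implementation.

```python
def fake_search(query):
    # Hypothetical retrieval step; returns documents ranked best-first.
    return [
        {"text": "Chroma stores embeddings.", "source": "a.txt"},
        {"text": "Retrievers wrap vector stores and return documents.", "source": "b.txt"},
        {"text": "Prompts can be overridden per chain.", "source": "c.txt"},
    ]


def combine_docs(inputs):
    # Stand-in for the chain returned by load_qa_with_sources_chain(...):
    # the caller supplies the documents explicitly.
    docs = inputs["input_documents"]
    sources = ", ".join(doc["source"] for doc in docs)
    return {"output_text": f"(answer to {inputs['question']!r}) SOURCES: {sources}"}


class RetrievalWrapper:
    """Stand-in for the compact pattern: retrieval happens inside the call."""

    def __init__(self, search, combine, reduce_k_below_max_tokens=False,
                 max_tokens_limit=100):  # 100 is an arbitrary value for this sketch
        self.search = search
        self.combine = combine
        self.reduce_k_below_max_tokens = reduce_k_below_max_tokens
        self.max_tokens_limit = max_tokens_limit

    def _trim(self, docs):
        if not self.reduce_k_below_max_tokens:
            return docs
        counts = [len(doc["text"].split()) for doc in docs]  # crude token estimate
        keep, total = len(docs), sum(counts)
        # Drop the lowest-ranked documents until the estimate fits the limit.
        while keep > 0 and total > self.max_tokens_limit:
            keep -= 1
            total -= counts[keep]
        return docs[:keep]

    def __call__(self, inputs):
        docs = self._trim(self.search(inputs["question"]))
        return self.combine({"input_documents": docs, "question": inputs["question"]})


query = "How are documents retrieved?"

# Pattern 1: retrieve first, then pass the documents in yourself.
manual = combine_docs({"input_documents": fake_search(query), "question": query})

# Pattern 2: the wrapper retrieves (and optionally trims) internally.
wrapped = RetrievalWrapper(fake_search, combine_docs)({"question": query})
trimmed = RetrievalWrapper(
    fake_search, combine_docs, reduce_k_below_max_tokens=True, max_tokens_limit=10
)({"question": query})

print(manual["output_text"])
print(trimmed["output_text"])
```

Without trimming, both patterns produce the same result; the wrapper only saves the explicit retrieval step. With trimming enabled and a small limit, the lowest-ranked document is dropped before the documents reach the combine step.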

ehsanmok avatar Apr 22 '23 04:04 ehsanmok

Is there any reason why the default prompt for QA with sources is soooo huuuuge? https://github.com/hwchase17/langchain/blob/master/langchain/chains/qa_with_sources/stuff_prompt.py

Or are we expected to change it for our own use case anyway?

Qualzz avatar Apr 24 '23 23:04 Qualzz

I am equally confused by this. The default prompt seems quite large for load_qa_with_sources_chain. I tried overriding it with my own prompt template, and at a basic level it works. However, I want to use the Structured Output Parser with load_qa_with_sources_chain, and the output is unreliable: most of the time it just responds with the default {'output_text': 'This is a response'}, and only occasionally does it follow the ResponseSchema. I just can't make these two work together.

annjawn avatar May 05 '23 04:05 annjawn

Actually, I am also confused by this. The default prompt of load_qa_with_sources_chain is very different from that of load_qa_chain. In my task, I found that load_qa_chain performs better than load_qa_with_sources_chain. My guess is that the default prompt of load_qa_with_sources_chain makes the model consider more than one document. I am not sure 😂

LoveFishoO avatar Jul 05 '23 08:07 LoveFishoO

Hi, @19245222! I'm Dosu, and I'm helping the LangChain team manage their backlog. I wanted to let you know that we are marking this issue as stale.

From what I understand, the issue was about the difference between the "Question Answering" and "Question Answering with Sources" APIs in the LangChain repository. The difference lies in the methods used for retrieving relevant documents: the former uses get_relevant_documents while the latter uses similarity_search. There have been comments providing further explanation and examples of how these methods are implemented in the code. Some users also expressed confusion about the default prompt and performance of the "Question Answering with Sources" API.

If this issue is still relevant to the latest version of the LangChain repository, please let the LangChain team know by commenting on the issue. Otherwise, feel free to close the issue yourself or it will be automatically closed in 7 days.

Let us know if you have any further questions or concerns. Thanks!

dosubot[bot] avatar Oct 05 '23 16:10 dosubot[bot]