
Add new chain qa_with_references, inspired by qa_with_sources.

Open · pprados opened this issue 1 year ago • 1 comment

Description:

This chain returns the documents actually used to produce the response. It then becomes possible to read all the metadata of these documents, for example to generate a Markdown link.

Managing links to sources is not easy. LangChain provides qa_with_sources, but this chain returns only a list of URLs. It is not possible to extract other metadata, such as the filename.

I propose a solution to this problem: return the Document objects themselves and let the developer do whatever they want with them.

This pull request proposes a new implementation inspired by the original qa_with_sources chain. The new qa_with_references implementation returns the list of documents used for the response, so all the metadata of these documents is available, for example to generate a Markdown link (with filename/title and URL).

To do that, I inject a UUID into each document's metadata and ask the LLM to return the list of UUIDs it used in the response. It is then easy to find the corresponding documents.
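A rough sketch of this mechanism (a minimal illustration of the idea, not the PR's actual code; the metadata key name is hypothetical):

import uuid
from typing import Dict, List
from langchain.schema import Document

def tag_documents(docs: List[Document]) -> Dict[str, Document]:
    """Inject a UUID into each document's metadata and index the docs by that UUID."""
    registry = {}
    for doc in docs:
        doc_id = str(uuid.uuid4())
        doc.metadata["uuid"] = doc_id  # hypothetical metadata key
        registry[doc_id] = doc
    return registry

def resolve_references(cited_ids: List[str], registry: Dict[str, Document]) -> List[Document]:
    """Map the UUIDs cited by the LLM back to the documents actually used."""
    return [registry[i] for i in cited_ids if i in registry]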

In another branch, I have reimplemented the original qa_with_sources so that it extends qa_with_references.

Tag maintainer:

  • @baskaryan

Dependencies:

  • no

Twitter handle:

  • @pprados

pprados avatar Jul 06 '23 14:07 pprados


This returns citations:

from langchain.chains import RetrievalQAWithSourcesChain

# `llm`, `vectorstore`, and `question` are assumed to be defined elsewhere.
qa_chain = RetrievalQAWithSourcesChain.from_chain_type(llm, retriever=vectorstore.as_retriever())
result = qa_chain({"question": question})

This returns source documents:

from langchain.chains import RetrievalQA

# Same assumptions; return_source_documents=True attaches the retrieved docs to the result.
qa_chain = RetrievalQA.from_chain_type(llm, retriever=vectorstore.as_retriever(),
                                       return_source_documents=True)
result = qa_chain({"query": question})
print(len(result['source_documents']))
result['source_documents'][0]

Do those cover the use-case you need here?

No. RetrievalQA and RetrievalQAWithSourcesChain return only the URLs of all the documents retrieved. But:

  • some documents have no URL
  • not all retrieved documents are actually used
  • other metadata can be important when manipulating the response.

My proposal (next version coming soon) returns only the documents used to answer and, when possible, the verbatim text used to answer. It then becomes possible to work with all the information about these documents (Google file ID, record ID from a database, etc.), as in the sketch below.
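For example, once the Document objects are returned, building Markdown links becomes a few lines of post-processing. A sketch, assuming the result exposes a source_documents list and that the loader populated title/filename and source metadata keys:

def to_markdown_links(result: dict) -> list:
    """Build one Markdown link per document actually used for the answer."""
    links = []
    for doc in result.get("source_documents", []):
        title = doc.metadata.get("title") or doc.metadata.get("filename", "document")
        url = doc.metadata.get("source")
        # Documents without any URL simply keep their title as plain text.
        links.append(f"[{title}]({url})" if url else title)
    return links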

pprados avatar Aug 02 '23 12:08 pprados

Dear @rlancemartin,

I have made several changes to the code for this new release.

Before explaining the changes, let me provide some context. With RetrievalQAWithSourcesChain or RetrievalQA, it was not possible to identify only the documents actually used: the parameter return_source_documents=True returns all retrieved documents, even those not used for the response. Additionally, RetrievalQAWithSourcesChain returns only metadata["source"], making it difficult to map a URL back to the corresponding document, especially when multiple documents share the same URL. And it works ONLY if the metadata contains a source key.

That's why I'm proposing this new version (which fixes a bug in RetrievalQAWithSourcesChain when the recursive process is necessary).

New release:

  • To optimize token usage, I have stopped using UUIDs (UUIDs consume tokens).
  • Instead, I now use an index from the list of documents, which is shorter.
  • The code is now compatible with asynchronous operations.
  • I have also updated the documentation in docs/extras/use_cases/question_answering/index.mdx.
  • When using chain_type='stuff', all documents are merged into the same prompt, and only the documents used are returned.
  • For chain_type='map_reduce', the verbatim used to answer is extracted from each document and added to its metadata. This makes it possible to search for the portion of text used for the response. If the list of documents is large, the recursive approach is used and the verbatim may not be extracted.
  • chain_type='refine' cannot extract the verbatim, but it still returns only the documents used.
  • Lastly, chain_type='map_rerank' returns only the best document.
  • I fixed a bug in llm.py related to empty document lists.
  • Now it's possible to create a link combining the source and a text fragment to highlight the relevant portion of the page (a URL with #:~:text=...); see the sketch after this list.
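For instance, such a fragment link could be assembled from a document's URL and one of its extracted verbatims. A sketch, assuming source and verbatims metadata keys:

from urllib.parse import quote

def fragment_link(doc) -> str:
    """Build a link that highlights the verbatim on the source page (#:~:text=...)."""
    url = doc.metadata.get("source", "")
    verbatims = doc.metadata.get("verbatims", [])
    if url and verbatims:
        return f"{url}#:~:text={quote(verbatims[0])}"
    return url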

I hope you find these changes beneficial.

Later, if you accept my code, it will be possible to revise RetrievalQAWithSourcesChain to inherit from RetrievalQAWithReferencesChain.

pprados avatar Aug 02 '23 13:08 pprados

Note that another difference with qa_with_sources is token consumption. When you add a URL for each document, or aggregate a list of URLs, especially with many documents, it can consume a significant number of tokens. In my proposal, only an integer is used to identify the corresponding document, which is much more economical with a large list of documents.
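A rough way to see the difference, as a sketch using tiktoken with the cl100k_base encoding (the URL below is made up):

import tiktoken

enc = tiktoken.get_encoding("cl100k_base")

urls = ["https://docs.google.com/document/d/1aBcDeFgHiJkLmNoP/edit"] * 20
by_url = "SOURCES: " + ", ".join(urls)                     # one long URL per document
by_index = "IDS: " + ", ".join(str(i) for i in range(20))  # one small integer per document

print(len(enc.encode(by_url)), "tokens vs", len(enc.encode(by_index)), "tokens")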

pprados avatar Aug 03 '23 14:08 pprados

Hello @rlancemartin, I have a "1 change requested" status, but I cannot change anything. Is the process blocked?

pprados avatar Aug 04 '23 13:08 pprados

Hello @rlancemartin, I have a "1 change requested" status, but I cannot change anything. Is the process blocked?

@pprados thanks for the detailed explanation.

Also adding @baskaryan and @eyurtsev who may have a point of view.

AFAICT, the problem you want to solve:

  • return the exact source documents used for answer generation in RAG.

Limitations today:

  • return_source_documents=True returns all source docs retrieved
  • RetrievalQAWithSourcesChain returns citations (IIUC also for all source docs retrieved?)

Question 1:

  • What is the precise problem with RetrievalQAWithSourcesChain?
  • Can we put up a PR that addresses it narrowly?
  • This should resolve your use case, IIUC.

Approach:

  • You use map_reduce or refine chains w/ intermediate_steps=True (a minimal sketch follows this list)
  • "verbatim" of the docs is retrieved from the answers["intermediate_steps"]
  • Thus, the question answer for each document is used
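A minimal sketch of that pattern, assuming load_qa_chain forwards return_intermediate_steps to the underlying map_reduce chain, and that llm, docs, and question are defined elsewhere:

from langchain.chains.question_answering import load_qa_chain

chain = load_qa_chain(llm, chain_type="map_reduce", return_intermediate_steps=True)
result = chain({"input_documents": docs, "question": question}, return_only_outputs=True)

# One intermediate answer per retrieved document; the "verbatim" would be read from here.
for doc, step in zip(docs, result["intermediate_steps"]):
    print(doc.metadata.get("source"), "->", step)
print(result["output_text"])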

Question 2:

  • This seems to be wrapping MR or Refine chains intermediate_steps, which is the question answer per doc.
  • That is logged to verbatim.
  • But additional work is needed to compare each to the final answer to see which docs are most related.

In short, the premise is good: higher quality citations for RAG generations. The problem w/ RetrievalQAWithSourcesChain is not fully obvious to me, but I also have not worked with it extensively. A crisp overview of that problem and a PR to address it narrowly seems reasonable. It seems you found a bug there? And IIUC, it will return citations for all docs?

Use of MR or Refine intermediate_steps is also not entirely obvious to me as a solution to the problem: as you show in the notebook, you will get an answer per document for the question (logged to 'verbatim') but it's still not fully obvious how each answer maps to the final answer without some additional work (e.g., semantic similarity, etc) that will add further overhead.
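To illustrate the additional work mentioned above, comparing each intermediate answer to the final answer could look roughly like this (a sketch, not part of the PR; it assumes an OpenAI embeddings model is available):

import numpy as np
from langchain.embeddings import OpenAIEmbeddings

def rank_by_similarity(final_answer: str, intermediate_answers: list) -> list:
    """Order the per-document answers by cosine similarity to the final answer."""
    emb = OpenAIEmbeddings()
    target = np.array(emb.embed_query(final_answer))
    scores = []
    for answer in intermediate_answers:
        vec = np.array(emb.embed_query(answer))
        scores.append(float(target @ vec / (np.linalg.norm(target) * np.linalg.norm(vec))))
    # The highest-scoring answers point at the documents most related to the final answer.
    return sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)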

rlancemartin avatar Aug 10 '23 18:08 rlancemartin

Question 1:

What is the precise problem with RetrievalQAWithSourcesChain?

  • The challenge lies in returning the precise source documents employed for generating answers in RAG.
  • RetrievalQAWithSourcesChain solely provides URLs (requiring documents to possess URLs) without additional information.
  • Its structure is defined by the implementation, manipulating a list of URLs rather than a list of referenced documents.
  • My implementation establishes a direct correlation between references and associated documents.

Can we put up a PR that addresses it narrowly?

  • Yes, although the PR can only be submitted after acceptance.
  • A potential solution involves reconfiguring RetrievalQAWithSourcesChain as a derivative of my code. This simplified approach involves using RetrievalQAWithReference to acquire the list of original documents and extracting solely the URLs. This implementation proves advantageous as it reduces token consumption (replacing each URL in the list with a numerical value).

Approach:

You use map_reduce or refine chains w/ intermediate_steps=True

  • Indeed, this choice is driven by the capability of map-reduce mode to extract the relevant document portions. However, accessing this information requires setting intermediate_steps=True. My implementation repurposes the code from RetrievalQAWithSourcesChain. The mapping process employs a Language Model (LLM) to extract a focused section from each document, along with the verbatim answer. This verbatim answer, at times "nothing" and at others more substantive, is then employed in the reduction phase to formulate the final answer. However, it is unnecessary to incorporate the verbatim into the response body, as its retrieval is possible with intermediate_steps=True.

"verbatim" of the docs is retrieved from the answers["intermediate_steps"]

  • Yes, this corresponds to the same approach employed in RetrievalQAWithSourcesChain.

Question 2:

This seems to be wrapping MR or Refine chains intermediate_steps, which is the question answer per doc. That is logged to verbatim. But additional work is needed to compare each to the final answer to see which docs are most related.

  • Not exactly. The reduction phase selects the corresponding document (the exact document from which the verbatim was extracted - usually, a simple text search can locate this verbatim in the original document). It is the role of the Language Model to answer the question of which document corresponds to the question-answer pair per document. This is analogous to RetrievalQAWithSourcesChain, albeit using an ID instead of a URL.

It seems you found a bug there?

  • Yes, the current implementation of RetrievalQAWithSourcesChain struggles to handle a list of URLs for each document. When using map-reduce, an initial reduction step can be applied: it consumes multiple documents and yields a partial response containing a list of documents. This response then becomes a new document for another reduce phase. Consequently, with a substantial list of documents, a partial reduction can lose part of the URL list (the template accepts only one URL per document). However, the primary issue here is the number of tokens consumed in this scenario (a URL such as one from Google Docs is not LLM-friendly).

I believe hallucinations pose a significant issue. Enabling straightforward referencing of what led to the answer's derivation is imperative. Relying solely on a list of documents, some utilized and some not, is an inadequate approach.

pprados avatar Aug 11 '23 08:08 pprados


Thanks for the detailed explanation. Let's merge this into langchain_experimental, so please move the files there. There are some interesting ideas here, and it would be great to let folks experiment further with them.

rlancemartin avatar Aug 11 '23 16:08 rlancemartin

@rlancemartin It's done. The code is in langchain_experimental for the moment. I hope it can migrate to the standard langchain package later. Then it will be possible to optimize and fix qa_with_sources.

pprados avatar Aug 16 '23 11:08 pprados

I added a unit test to validate the code in the experimental package.

pprados avatar Aug 17 '23 14:08 pprados

Hello @rlancemartin, @hwchase17,

I have just published an update to better identify the verbatims of the original documents, and I added some documentation. The code has been migrated to experimental, with unit tests.

Can you take the time to validate this pull request?

I hope it will take less time than the pull request on Google Drive ;-)

pprados avatar Aug 22 '23 06:08 pprados

Hello @rlancemartin,

I updated the code to extract all verbatim from the original document. You can see a clear sample here.

I hope it will be merged into langchain_experimental soon.

pprados avatar Aug 29 '23 07:08 pprados

@rlancemartin, the lint fails because of other experimental components; see the declared bug 10088. They are incompatible with the latest version of langchain.

pprados avatar Sep 05 '23 08:09 pprados

@baskaryan, @rlancemartin, (I changed the description for this new version)

Question answering with references and verbatim over documents.

We believe that hallucinations pose a major problem in the adoption of LLMs (Large Language Models). It is imperative to provide a simple and quick way for the user to verify the consistency of the answers to the questions they ask.

This chain extracts the information from the documents that was used to answer the question. The output source_documents contains only the documents that were used and, for each one, only the text fragments that were used to answer. If possible, the list of text fragments that justify the answer is added to metadata['verbatims'] for each document. Then it is possible to find the page of a PDF or the chapter of a Markdown document.

A sample result of usage with Wikipedia might be:

For the question "what can you say about ukraine?",
to answer "Ukraine has an illegal and unprovoked invasion inside its territory,
and its citizens show courage and are fighting against it. It is believed that
the capital city of Kyiv, which is home to 2.8 million people, is a target.",
the LLM uses:
Source https://www.defense.gov/News/Transcripts/...
-  "when it comes to conducting their illegal and unprovoked invasion inside
Ukraine."
Source https://www.whitehouse.gov/briefing-room/...
- "to the fearless and skilled Ukrainian fighters who are standing in the
breach"
- "You got to admit, you have -- must be amazed at the courage of this country"
Source https://www.whitehouse.gov/briefing-room/...
-  "believe that they will target Ukraine's capital, Kyiv, a city of 2.8 million
innocent people."
Source https://www.whitehouse.gov/briefing-room/...
-  "believe that they will target Ukraine's capital, Kyiv, a city of 2.8 million
innocent people."
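Consuming such a result could look like this (a sketch; source_documents and metadata['verbatims'] follow the description above, and the answer output key is an assumption mirroring qa_with_sources):

result = chain({"question": "what can you say about ukraine?"})  # `chain` is the qa_with_references chain

print(result["answer"])
for doc in result["source_documents"]:                   # only the documents actually used
    print("Source", doc.metadata.get("source"))
    for verbatim in doc.metadata.get("verbatims", []):   # fragments that justify the answer
        print(" -", verbatim)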

pprados avatar Sep 05 '23 08:09 pprados

@pprados Hi, could you, please, resolve the merging issues? After that, ping me and I'll push this PR for review. Thanks!

leo-gan avatar Sep 18 '23 23:09 leo-gan

Hello @leo-gan

Thank you for considering my proposal.

For the moment, it is in langchain_experimental, but I am hopeful of a deeper integration into langchain. Then, it will be possible to add two methods to VectorStoreIndexWrapper: query_with_references() and query_with_references_and_verbatims().

Some remarks to understand the code.

  • It is a fork of qa_with_sources. I resynchronized my code with the latest version.
  • It is not very clear whether it is possible/advisable to use parsers from templates. The current implementations of ReduceDocumentsChain and MapReduceDocumentsChain do not use prompt parsers, and that's a shame.
  • MapRerankDocumentsChain uses apply_and_parse(), but it's deprecated. I will adjust the code in sync with qa_with_sources.
  • An evolution of these three classes would be welcome and would simplify my code. I can do it, but I want to be sure it's what you want: use the associated parser for each prompt.

For qa_with_reference, I propose two implementations: one based on Pydantic objects, and one without, in order to save tokens. The latter is the version that is activated.

For qa_with_reference_and_verbatims, I use Pydantic objects and the corresponding parser.
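As an illustration of that second case, the structured output could be shaped roughly like this (a hypothetical model, not the PR's actual one):

from typing import List
from pydantic import BaseModel, Field
from langchain.output_parsers import PydanticOutputParser

class Reference(BaseModel):
    doc_id: int = Field(description="Index of the document used for the answer")
    verbatims: List[str] = Field(default_factory=list, description="Extracts that justify the answer")

class AnswerWithReferences(BaseModel):
    answer: str
    references: List[Reference] = Field(default_factory=list)

parser = PydanticOutputParser(pydantic_object=AnswerWithReferences)
# parser.get_format_instructions() is injected into the prompt;
# parser.parse(llm_output) returns an AnswerWithReferences instance.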

Integration tests require openai and wikipedia (pip install openai tiktoken wikipedia). You can see the different results with qa_with_sources, qa_with_references and qa_with_sources_and_verbatims.

Sometimes, qa_with_sources in map_rerank mode crashes, because the regex pattern is '(.?)\nScore: (\d)' but sometimes the LLM returns the score without the `\n`. It could be fixed, but I don't want to create confusion.
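A more tolerant pattern could accept the score with or without the newline, as in this sketch of a possible fix (an assumption, not the fix shipped in langchain):

import re

# Accept "answer ... Score: NN" whether or not the LLM emits a newline before "Score:".
pattern = re.compile(r"(.*?)\s*Score:\s*(\d+)", re.DOTALL)

match = pattern.search("Paris is the capital of France. Score: 95")
if match:
    answer, score = match.group(1).strip(), int(match.group(2))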

After validation, it will be possible to rework qa_with_sources into a compatible implementation that is more economical in tokens and free of the bug with large lists of documents (a bug because it's impossible to manage a long list of URLs, and the recursive map-reduce is not prepared to handle multiple URLs for the same document). In addition, the new version will return only the list of documents used for the response, not the list of documents returned by the vector store; it is not the same thing, since some retrieved documents are not really used.

pprados avatar Sep 19 '23 09:09 pprados

@baskaryan PR is ready for review

leo-gan avatar Sep 20 '23 17:09 leo-gan

@rlancemartin Please, review this PR again. TNX!

leo-gan avatar Sep 21 '23 15:09 leo-gan

@rlancemartin Please, review this PR again. TNX!

I'm on paternity leave currently (new baby). Thanks for looking into this. I can have a look when I'm back in October unless others have a chance. This has been moved to experimental IIRC, so we can probably get it merged without major concern.

rlancemartin avatar Sep 21 '23 15:09 rlancemartin

@baskaryan According to @rlancemartin, this PR can be merged. :gift:

leo-gan avatar Sep 21 '23 17:09 leo-gan

I just resynchronized the code. @baskaryan, can you merge this PR?

pprados avatar Sep 25 '23 08:09 pprados

@baskaryan Could you, please, review it?

leo-gan avatar Sep 25 '23 15:09 leo-gan

@baskaryan, @leo-gan, can someone review this code, please? I hope I don't have the same problem as with the Google Drive integration ;-)

pprados avatar Sep 26 '23 14:09 pprados

@hwchase17 Could you, please, review it?

leo-gan avatar Sep 26 '23 16:09 leo-gan

@baskaryan, maybe you can review the code?

pprados avatar Oct 04 '23 08:10 pprados

any update on this PR?

timxieICN avatar Oct 10 '23 18:10 timxieICN

Hello @rlancemartin, @hwchase17, @timxieICN, yes. I rebased the code. All checks have passed. I am waiting for the review.

pprados avatar Oct 12 '23 09:10 pprados

Thanks @pprados - I took a quick look at your code - looks great.

~~However, I did notice your implementation on QAWithReferencesAndVerbatimsChain deviates quite a bit from the rest of LangChain's qa_chain or ConversationalRetrievalChain. For example, there's no argument/concept of retriever anymore.~~

Never mind, I see you also have RetrievalQAWithReferencesAndVerbatimsChain implemented.

timxieICN avatar Oct 16 '23 16:10 timxieICN

@pprados - also, what's the technical reason why verbatim only works for chain_type='map_reduce'?

timxieICN avatar Oct 16 '23 16:10 timxieICN

@pprados - I tried to run your code by refactoring it into our existing workflow, as follows:

query_executor = RetrievalQAWithReferencesAndVerbatimsChain.from_llm(
    llm=AzureChatOpenAI(temperature=0),
    retriever=opensearch_vector_store.as_retriever(),  # an OpenSearch vector store exposed as a retriever
    memory=...,  # memory configuration elided
    chain_type="map_reduce",
)

I noticed:

  1. The code is quite slow. For QA on an 8-page PDF, the query can easily take up to 30 seconds.
  2. The response is not always optimal; it is not as accurate as what ConversationalRetrievalChain provides.

Is this what you've seen as well?

timxieICN avatar Oct 16 '23 19:10 timxieICN