RetrievalQAWithSourcesChain provides unreliable sources
System Info
- langchain.__version__ is 0.0.184
- Python 3.11
- macOS Ventura 13.3.1 (a)
Who can help?
@hwchase17
Summary
The sources component of the output of RetrievalQAWithSourcesChain does not provide transparency into which documents the retriever returned; instead, it is output that the LLM contrives.
Motivation
From my perspective, the primary advantage of having visibility into sources is transparency into the documents that were retrieved to help the language model generate its answer. Only after being confused for quite a while and inspecting the code did I realize that the sources were simply being conjured up by the LLM.
Advice
I think it is important to ensure that people know about this. Perhaps this isn't a bug and is more of a documentation issue, but in that case the docstring should be updated as well.
Notes
Document retrieval works very well. It's worth noting that in this toy example, the combination of the FAISS vector store and the OpenAIEmbeddings embeddings model performs very reasonably and is deterministic.
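As a sanity check (my own sketch, reusing the retriever built in the Reproduction section below), the retriever can be queried directly, bypassing the LLM entirely:

# Sanity-check sketch (mine, not part of the chain): query the retriever
# directly. Assumes the `retriever` object built in the Reproduction below.
docs = retriever.get_relevant_documents(
    'what is the first lower-case letter of the alphabet?')
print([d.metadata['source'] for d in docs])
# Prints something like ['source_b', 'source_a', 'source_c', 'source_d', 'source_1']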
Recommendation
Add caveats everywhere. Frankly, I would never trust using this chain. I literally had an example the other day where it wrongly made up a source and a Wikipedia URL that had absolutely nothing to do with the documents retrieved. I could supply that example, as it is a far better illustration of how this chain will hallucinate sources (because they are generated by the LLM), but it is a bit more involved than the smaller example below.
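If source transparency is the goal, the only provenance I would rely on is the metadata of the documents the retriever actually returned. A minimal sketch of that workaround (not an existing API), assuming the qa_sources chain built in the Reproduction section below:

# Workaround sketch: rely on the metadata of the documents the retriever
# actually returned, not on the LLM-written string.
# Assumes the `qa_sources` chain built in the Reproduction section below.
result = qa_sources('what is the first lower-case letter of the alphabet?')

# Trustworthy provenance: metadata attached to the retrieved documents.
actual_sources = [doc.metadata['source'] for doc in result['source_documents']]

# Untrustworthy: free-form text generated by the LLM.
reported_sources = result['sources']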
Information
- [ ] The official example notebooks/scripts
- [ ] My own modified scripts
Related Components
- [X] LLMs/Chat Models
- [ ] Embedding Models
- [ ] Prompts / Prompt Templates / Prompt Selectors
- [ ] Output Parsers
- [ ] Document Loaders
- [ ] Vector Stores / Retrievers
- [ ] Memory
- [ ] Agents / Agent Executors
- [ ] Tools / Toolkits
- [X] Chains
- [ ] Callbacks/Tracing
- [ ] Async
Reproduction
Demonstrative Example
Here's the simplest example I could come up with:
1. Instantiate a vector store with the 7 documents displayed below.
>>> from langchain.vectorstores import FAISS
>>> from langchain.embeddings import OpenAIEmbeddings
>>> from langchain.llms import OpenAI
>>> from langchain.chains import RetrievalQAWithSourcesChain
>>> chars = ['a', 'b', 'c', 'd', '1', '2', '3']
>>> texts = [4*c for c in chars]
>>> metadatas = [{'title': c, 'source': f'source_{c}'} for c in chars]
>>> vs = FAISS.from_texts(texts, embedding=OpenAIEmbeddings(), metadatas=metadatas)
>>> retriever = vs.as_retriever(search_kwargs=dict(k=5))
>>> vs.docstore._dict
{'0ec43ce4-6753-4dac-b72a-6cf9decb290e': Document(page_content='aaaa', metadata={'title': 'a', 'source': 'source_a'}),
'54baed0b-690a-4ffc-bb1e-707eed7da5a1': Document(page_content='bbbb', metadata={'title': 'b', 'source': 'source_b'}),
'85b834fa-14e1-4b20-9912-fa63fb7f0e50': Document(page_content='cccc', metadata={'title': 'c', 'source': 'source_c'}),
'06c0cfd0-21a2-4e0c-9c2e-dd624b5164fe': Document(page_content='dddd', metadata={'title': 'd', 'source': 'source_d'}),
'94d6444f-96cd-4d88-8973-c3c0b9bf0c78': Document(page_content='1111', metadata={'title': '1', 'source': 'source_1'}),
'ec04b042-a4eb-4570-9ee9-a2a0bd66a82e': Document(page_content='2222', metadata={'title': '2', 'source': 'source_2'}),
'0031d3fc-f291-481e-a12a-9cc6ed9761e0': Document(page_content='3333', metadata={'title': '3', 'source': 'source_3'})}
2. Instantiate a RetrievalQAWithSourcesChain
The return_source_documents argument is set to True so that we can inspect the documents that were actually retrieved.
>>> qa_sources = RetrievalQAWithSourcesChain.from_chain_type(
...     OpenAI(),
...     retriever=retriever,
...     return_source_documents=True
... )
3. Example Question
Things look sort of fine: 5 documents are retrieved by the retriever, but the model lists only a single source.
>>> qa_sources('what is the first lower-case letter of the alphabet?')
{'question': 'what is the first lower-case letter of the alphabet?',
'answer': ' The first lower-case letter of the alphabet is "a".\n',
'sources': 'source_a',
'source_documents': [Document(page_content='bbbb', metadata={'title': 'b', 'source': 'source_b'}),
Document(page_content='aaaa', metadata={'title': 'a', 'source': 'source_a'}),
Document(page_content='cccc', metadata={'title': 'c', 'source': 'source_c'}),
Document(page_content='dddd', metadata={'title': 'd', 'source': 'source_d'}),
Document(page_content='1111', metadata={'title': '1', 'source': 'source_1'})]}
4. Second Example Question Containing the First Question
This is not what I would expect: the question contains the previous question, and the vector store did supply the document with {'source': 'source_a'}, yet for some reason (i.e., due to whatever the OpenAI() model happened to generate) the chain lists zero sources in this response.
>>> qa_sources('what is the one and only first lower-case letter and number of the alphabet and whole number system?')
{'question': 'what is the one and only first lower-case letter and number of the alphabet and whole number system?',
'answer': ' The one and only first lower-case letter and number of the alphabet and whole number system is "a1".\n',
'sources': 'N/A',
'source_documents': [Document(page_content='1111', metadata={'title': '1', 'source': 'source_1'}),
Document(page_content='bbbb', metadata={'title': 'b', 'source': 'source_b'}),
Document(page_content='aaaa', metadata={'title': 'a', 'source': 'source_a'}),
Document(page_content='2222', metadata={'title': '2', 'source': 'source_2'}),
Document(page_content='cccc', metadata={'title': 'c', 'source': 'source_c'})]}
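A simple programmatic guard (again just a sketch of mine, not something the chain provides) makes the mismatch explicit: nothing in the LLM-written sources string corresponds to what the retriever returned, even though source_a and source_1 were both retrieved.

# Sketch of a guard: flag reported sources that were never actually retrieved.
# `qa_sources` is the chain defined in step 2 above.
result = qa_sources('what is the one and only first lower-case letter and '
                    'number of the alphabet and whole number system?')

retrieved = {doc.metadata['source'] for doc in result['source_documents']}
reported = {s.strip() for s in result['sources'].split(',') if s.strip()}

# Anything reported but not retrieved was contrived by the LLM
# ('N/A' in this run; in my other example it was a fabricated Wikipedia URL).
print(reported - retrieved)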
Expected behavior
I am not sure. Perhaps we need a warning every time this chain is used, or some strongly worded documentation for developers.
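To make the first suggestion concrete, here is a purely illustrative sketch of the kind of warning I have in mind; the wrapper below is not part of langchain:

import warnings

# Illustrative only: a thin wrapper that warns on every call. `chain` would
# be something like the `qa_sources` chain from the Reproduction above.
def call_with_sources_warning(chain, question):
    warnings.warn(
        "RetrievalQAWithSourcesChain's 'sources' field is generated by the "
        "LLM and may not reflect the documents actually retrieved; inspect "
        "'source_documents' instead.",
        UserWarning,
    )
    return chain(question)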