Facing issue when using arun with VectorDBQAWithSourcesChain chain

Open dheerajiiitv opened this issue 1 year ago • 17 comments

Using VectorDBQAWithSourcesChain with arun, I'm facing the issue below: ValueError: run not supported when there is not exactly one output key. Got ['answer', 'sources'].

dheerajiiitv avatar Mar 21 '23 05:03 dheerajiiitv

same

digi604 avatar Mar 21 '23 22:03 digi604

+1

I'm getting just KeyError: 'source'

benthecoder avatar Mar 23 '23 17:03 benthecoder

Same: run not supported when there is not exactly one output key. Got ['answer', 'sources']. Although I'm using the chain as a tool, so maybe I didn't set something up correctly.

ibryane avatar Mar 30 '23 17:03 ibryane

I'm facing the same technical issue. My code works fine with VectorDBQA, but not with the source one.

mystvearn avatar Mar 31 '23 14:03 mystvearn

+1

commissarster avatar Apr 03 '23 10:04 commissarster

from langchain.chains.qa_with_sources import load_qa_with_sources_chain
from langchain.vectorstores import Milvus

# AzureEmbedding, AzureLLM, docs and qa_req come from the rest of the poster's code.
vector_db = Milvus.from_texts(
    docs,
    AzureEmbedding
)
text_field = vector_db.text_field
collection_name = vector_db.collection_name

docs = vector_db.similarity_search(qa_req.query)

chain = load_qa_with_sources_chain(AzureLLM, chain_type="stuff")
result = chain({"input_documents": docs, "question": qa_req.query}, return_only_outputs=True)

2023-04-04 21:32:14,229 - ERROR - chat_completion error: 'source'
Traceback (most recent call last):
  File "D:\go_path\src\kgpt\kgpt_engine\routers\chat_routers.py", line 53, in chat_qa
    qa = chatService.qa(qa_req)
  File "D:\go_path\src\kgpt\kgpt_engine\service\chat_service.py", line 84, in qa
    result = chain({"input_documents": docs, "question": qa_req.query}, return_only_outputs=True)
  File "D:\python\py3.9.6\lib\site-packages\langchain\chains\base.py", line 116, in __call__
    raise e
  File "D:\python\py3.9.6\lib\site-packages\langchain\chains\base.py", line 113, in __call__
    outputs = self._call(inputs)
  File "D:\python\py3.9.6\lib\site-packages\langchain\chains\combine_documents\base.py", line 56, in _call
    output, extra_return_dict = self.combine_docs(docs, **other_keys)
  File "D:\python\py3.9.6\lib\site-packages\langchain\chains\combine_documents\stuff.py", line 87, in combine_docs
    inputs = self._get_inputs(docs, **kwargs)
  File "D:\python\py3.9.6\lib\site-packages\langchain\chains\combine_documents\stuff.py", line 64, in _get_inputs
    document_info = {
  File "D:\python\py3.9.6\lib\site-packages\langchain\chains\combine_documents\stuff.py", line 65, in <dictcomp>
    k: base_info[k] for k in self.document_prompt.input_variables
KeyError: 'source'

same problem

SpecialMatthew avatar Apr 04 '23 13:04 SpecialMatthew

+1

Same Issue with RetrievalQAWithSourcesChain

rbmdotdev avatar Apr 05 '23 16:04 rbmdotdev

This error happens because RetrievalQAWithSourcesChain has more than one output key ('answer' and 'sources'), so it doesn't support run the way some other chains do. It would be great if it did, because that would make it easier for people to use it as a tool for an agent.
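
As a workaround you can call the chain object directly instead of run: calling it with a dict returns all of the output keys. A minimal sketch, assuming docsearch is an existing vector store and llm an existing LLM:

from langchain.chains import RetrievalQAWithSourcesChain

# Build the chain as usual; docsearch and llm are assumed to exist already.
chain = RetrievalQAWithSourcesChain.from_chain_type(
    llm=llm,
    chain_type="stuff",
    retriever=docsearch.as_retriever(),
)

# Calling the chain returns a dict with every output key, so both values are available.
result = chain({"question": "What does the handbook say about refunds?"})
print(result["answer"])
print(result["sources"])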

ibryane avatar Apr 05 '23 16:04 ibryane

Still having this issue!!!!!

rl3250 avatar Apr 07 '23 01:04 rl3250

File "/Users/alfredwahlforss/Documents/projects/merlin/ask-me-anything/.venv/lib/python3.10/site-packages/langchain/chains/combine_documents/stuff.py", line 64, in _get_inputs document_info = { File "/Users/alfredwahlforss/Documents/projects/merlin/ask-me-anything/.venv/lib/python3.10/site-packages/langchain/chains/combine_documents/stuff.py", line 65, in k: base_info[k] for k in self.document_prompt.input_variables KeyError: 'source'

wahlforss avatar Apr 07 '23 05:04 wahlforss

One solution is to pass the RetrievalQAWithSourcesChain into the func parameter of Tool without calling .run()

Example

from langchain.agents import AgentType, Tool, initialize_agent
from langchain.chains import RetrievalQAWithSourcesChain
from langchain.memory import ConversationBufferMemory

exampleBook = RetrievalQAWithSourcesChain.from_chain_type(llm=llm, chain_type="stuff", retriever=docsearch.as_retriever())

tools = [
    Tool(
        name = "ExampleBook",
        func=exampleBook,  # exampleBook instead of exampleBook.run()
        description="useful for when you need to answer questions about the exampleBook. Input should be a fully formed question."
    ),
]



memory = ConversationBufferMemory(memory_key="chat_history", return_messages=True)

agent = initialize_agent(tools, llm, agent=AgentType.CHAT_CONVERSATIONAL_REACT_DESCRIPTION, verbose=True, memory=memory)

thorgexyz avatar Apr 10 '23 11:04 thorgexyz

Was having this same issue, fixed it with @thorgexyz's solution.

jakesteelman avatar Apr 13 '23 21:04 jakesteelman

VectorDBQAWithSourcesChain

Could you provide more information about your program, especially regarding AzureLLM and the PromptTemplate? I think the KeyError: 'source' is because your prompt template is missing the key.
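
For context, the 'source' key comes from the chain's document prompt: the default document prompt for the stuff sources chain formats each document as "Content: {page_content}" followed by "Source: {source}", so every retrieved document must have a matching source entry in its metadata. If your metadata stores the reference under a different key, one option is to pass a custom document_prompt. A minimal sketch, assuming an existing llm and that your metadata key is url rather than source (both the key name and the override are illustrative):

from langchain.chains.qa_with_sources import load_qa_with_sources_chain
from langchain.prompts import PromptTemplate

# Hypothetical: the vector store metadata stores the reference under "url".
document_prompt = PromptTemplate(
    template="Content: {page_content}\nSource: {url}",
    input_variables=["page_content", "url"],
)

chain = load_qa_with_sources_chain(
    llm,
    chain_type="stuff",
    document_prompt=document_prompt,
)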

rchanggogogo avatar Apr 15 '23 05:04 rchanggogogo

If you are getting KeyError: 'source', it's because the documents in your vectorstore don't have a source field in their metadata. Add that field to the payloads/metadata of your vectorstore data and it should resolve the issue. You can see in the documentation example that they add a source field:

https://python.langchain.com/en/latest/modules/chains/index_examples/vector_db_qa_with_sources.html
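
A minimal sketch of what that looks like when building the store, loosely following that docs page (Chroma, OpenAIEmbeddings and the file name are just placeholders for whatever vector store, embeddings and data you use):

from langchain.embeddings.openai import OpenAIEmbeddings
from langchain.text_splitter import CharacterTextSplitter
from langchain.vectorstores import Chroma

with open("state_of_the_union.txt") as f:
    state_of_the_union = f.read()
texts = CharacterTextSplitter(chunk_size=1000, chunk_overlap=0).split_text(state_of_the_union)

# Each chunk gets a "source" entry in its metadata; this is the key the
# sources chain looks up, and omitting it is what triggers KeyError: 'source'.
docsearch = Chroma.from_texts(
    texts,
    OpenAIEmbeddings(),
    metadatas=[{"source": f"{i}-pl"} for i in range(len(texts))],
)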

enkoder avatar Apr 19 '23 01:04 enkoder

Was having this issue, with the error 'run' not supported when there is not exactly one output key. Got ['output_text', 'source']. Using the following code:

qna = RetrievalQAWithSourcesChain.from_chain_type(
    llm=llm, chain_type="stuff", retriever=docsearch.as_retriever())
resp = qna({"question": message}, return_only_outputs=True)

I tried @thorgexyz's solution, but then I was getting an error: Saving not supported for this chain type. (Which is weird, because all the chain types but map_rerank implement the needed property 🤷.)

I was able to get around the limitation by querying for the documents manually rather than passing the retriever into the RetrievalQAWithSourcesChain.from_chain_type function:

docs = docsearch.similarity_search(message, k=5)

chain = load_qa_with_sources_chain(llm, chain_type="stuff", metadata_keys=['source'])

resp = chain({"input_documents": docs, "question": message}, return_only_outputs=True)

crwgregory avatar Apr 20 '23 01:04 crwgregory

With the following code

index_name = 'test'
docsearch = Pinecone.from_existing_index(index_name, embeddings, text_key='content')
llm = OpenAI(model_name="gpt-3.5-turbo", temperature=0)
param_similarity = dict(k=3, search_type="similarity")
retriever = docsearch.as_retriever(search_kwargs=param_similarity)

qa_chain = load_qa_with_sources_chain(
    llm=llm,
    chain_type="stuff",
    prompt=QAPROMPT,
    return_intermediate_steps=True,
    verbose=True
)
chain = RetrievalQAWithSourcesChain(
    combine_documents_chain=qa_chain,
    retriever=retriever,
    return_source_documents=True)

having the error

ValueError: Document prompt requires documents to have metadata variables: ['source']. Received document with missing metadata: ['source'].

Thus I inserted metadata_keys=['id'] in the load_qa_with_sources_chain call, because I didn't want to change the metadata of my vectors.

But now I'm having the following error:

ValidationError: 1 validation error for StuffDocumentsChain metadata_keys extra fields not permitted (type=value_error.extra)

Any idea @crwgregory ? Many thanks in advance

eloijoub avatar Apr 28 '23 11:04 eloijoub

@eloijoub Hard to say, I'm no expert. What I had to do was save the data in my vector store with a source metadata key. I'd suggest you re-insert your documents with a source tag set to your id value, as sketched below; LangChain is expecting the source key. I'm also passing a lot of other metadata, but I think source might be required. How you configure that depends on which loader you are using.
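
A minimal sketch of that re-insertion, assuming the documents are LangChain Document objects that already carry an id in their metadata (existing_docs, the Pinecone store and the index name are illustrative placeholders):

from langchain.docstore.document import Document
from langchain.embeddings.openai import OpenAIEmbeddings
from langchain.vectorstores import Pinecone

# Copy the existing id into the "source" key the sources chain expects,
# keeping the rest of the metadata untouched.
fixed_docs = [
    Document(
        page_content=doc.page_content,
        metadata={**doc.metadata, "source": doc.metadata["id"]},
    )
    for doc in existing_docs  # existing_docs: hypothetical list of your current Documents
]

docsearch = Pinecone.from_documents(fixed_docs, OpenAIEmbeddings(), index_name="test")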

Seems like you might be experiencing this issue: https://github.com/hwchase17/langchain/issues/1844#issuecomment-1514015504

crwgregory avatar Apr 29 '23 10:04 crwgregory

@crwgregory

I don't fully understand your solution. Are you using an agent and running resp = chain({"input_documents": docs, "question": message}, return_only_outputs=True) as the func of the agent?

Also, this is confusing me: do you have to create the agent again for every user input? So I need to run docs = docsearch.similarity_search(user_input, k=5) first and then create the agent with those docs after that?

fabmeyer avatar Jul 13 '23 11:07 fabmeyer

Hi, @dheerajiiitv! I'm Dosu, and I'm here to help the LangChain team manage their backlog. I wanted to let you know that we are marking this issue as stale.

From what I understand, you are facing an issue when using the "VectorDBQAWithSourcesChain" chain with "arun". You mentioned that you are getting a "ValueError" stating that the "run" command is not supported when there is not exactly one output key, but you are getting multiple output keys. Other users have reported similar issues and have provided potential solutions, such as passing the chain into the "func" parameter of Tool without calling .run() or adding a "source" field to the metadata of the vectorstore data.

If this issue is still relevant to the latest version of the LangChain repository, please let us know by commenting on this issue. Otherwise, feel free to close the issue yourself or it will be automatically closed in 7 days.

Thank you for your understanding and contribution to the LangChain community!

dosubot[bot] avatar Oct 12 '23 16:10 dosubot[bot]