
`RetrievalQA` chain with chain_type `map_reduce` fails for custom prompts

Open amirgamil opened this issue 2 years ago • 2 comments

System Info

langchain 0.0.173, Python 3.9.16

Who can help?

@hwchase17 @agola11 @vowelparrot

Information

  • [ ] The official example notebooks/scripts
  • [ ] My own modified scripts

Related Components

  • [ ] LLMs/Chat Models
  • [ ] Embedding Models
  • [X] Prompts / Prompt Templates / Prompt Selectors
  • [ ] Output Parsers
  • [ ] Document Loaders
  • [X] Vector Stores / Retrievers
  • [ ] Memory
  • [ ] Agents / Agent Executors
  • [ ] Tools / Toolkits
  • [X] Chains
  • [ ] Callbacks/Tracing
  • [ ] Async

Reproduction


custom_prompt_template = """Use the context to generate an appropriate reply to the query

Context: {context}
Query: {question}
Response:"""
CUSTOM_PROMPT = PromptTemplate(
    template=custom_prompt_template, input_variables=["context", "question"]
)

def generate_response(text: str, query: str):
    retriever = create_document_vectorstore(page_text=text)
    chain_type_kwargs = {"prompt": CUSTOM_PROMPT}
    qa = RetrievalQA.from_chain_type(
        llm=OpenAI(openai_api_key=openai_api_key),
        chain_type="map_reduce",
        retriever=retriever,
        chain_type_kwargs=chain_type_kwargs,
    )
    return qa.run(query)

Expected behavior

tl;dr: using the RetrievalQA chain with chain_type="map_reduce" (and "refine") errors out when a custom prompt is supplied, but works with chain_type="stuff".

Note this errors out with

ValidationError: 1 validation error for MapReduceDocumentsChain
prompt
  extra fields not permitted (type=value_error.extra)

However, if chain_type is changed to "stuff", the code generates a completion without a problem.
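For reference, a sketch of what "map_reduce" expects instead of the single `prompt` key. The keyword names are taken from later comments in this thread and from reading the LangChain 0.0.x loaders, so verify them against your installed version; the templates here are plain strings standing in for `PromptTemplate` objects, and the chain call is left commented because it needs an API key and a retriever:

```python
# Sketch: "map_reduce" takes two prompts, one applied to each retrieved
# document and one that combines the per-document results. The single
# "prompt" key is only valid for chain_type="stuff", which is why the
# pydantic error above complains about an extra field.
question_template = (
    "Use this part of the context to answer the query.\n"
    "Context: {context}\nQuery: {question}\nRelevant text:"
)
combine_template = (
    "Combine the partial answers below into one reply.\n"
    "Summaries: {summaries}\nQuery: {question}\nResponse:"
)

# the kwargs map_reduce accepts (vs. {"prompt": ...} for "stuff"):
chain_type_kwargs = {
    "question_prompt": question_template,  # a PromptTemplate in practice
    "combine_prompt": combine_template,    # ditto
}

# qa = RetrievalQA.from_chain_type(
#     llm=OpenAI(openai_api_key=openai_api_key),
#     chain_type="map_reduce",
#     retriever=retriever,
#     chain_type_kwargs=chain_type_kwargs,
# )
```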

amirgamil avatar May 22 '23 14:05 amirgamil

qa_chain = RetrievalQA.from_chain_type(
    llm=llm_model,
    chain_type="stuff",
    retriever=retriever,
    #return_source_docummets=True,
)

I think the misspelled `return_source_docummets` is the problem; it should be `return_source_documents`.

seohyunjun avatar May 28 '23 09:05 seohyunjun

I ran into the same problem using ConversationalRetrievalChain. You may try this:

chain_type_kwargs={"combine_prompt": CHAT_COMBINE_PROMPT, "question_prompt": QUESTION_PROMPT}

Willy-J avatar May 29 '23 01:05 Willy-J

> i came to the same problem using ConversationalRetrievalChain. you may try this
>
> chain_type_kwargs={"combine_prompt": CHAT_COMBINE_PROMPT, "question_prompt": QUESTION_PROMPT}

Did that work for you, @Willy-J?

Harish-Tricon avatar Jul 14 '23 09:07 Harish-Tricon

> i came to the same problem using ConversationalRetrievalChain. you may try this
>
> chain_type_kwargs={"combine_prompt": CHAT_COMBINE_PROMPT, "question_prompt": QUESTION_PROMPT}
>
> Did that work for you, @Willy-J?

Yes, it works on version 0.0.179.

Willy-J avatar Jul 19 '23 09:07 Willy-J

Okay thanks 👍. Let me try with the same version.

Harish-Tricon avatar Jul 24 '23 02:07 Harish-Tricon

I was able to fix this. For each chain type (stuff, refine, map_reduce and map_rerank) you need to look up the correct input variables for each prompt.

Check the attached file, where I describe the issue in detail and how I figured it out by reading the LangChain source code for the original/default prompt templates for each chain type.

bing_chain_types.md
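The per-chain-type prompt keyword names mentioned above can be summarized in a small lookup. This is a sketch distilled from reading the 0.0.x `load_qa_chain` loaders, not an official table, so check the names against your installed version; the helper function is purely illustrative:

```python
# Sketch: which prompt kwargs each chain_type's loader accepts in
# chain_type_kwargs, based on reading the LangChain 0.0.x load_qa_chain
# source. Verify against your installed version.
PROMPT_KWARGS_BY_CHAIN_TYPE = {
    "stuff": {"prompt"},
    "map_reduce": {"question_prompt", "combine_prompt", "collapse_prompt"},
    "refine": {"question_prompt", "refine_prompt"},
    "map_rerank": {"prompt"},
}

def check_chain_type_kwargs(chain_type: str, chain_type_kwargs: dict) -> None:
    """Raise early with a readable message instead of a pydantic ValidationError."""
    allowed = PROMPT_KWARGS_BY_CHAIN_TYPE[chain_type]
    extra = set(chain_type_kwargs) - allowed
    if extra:
        raise ValueError(
            f"chain_type={chain_type!r} does not accept {sorted(extra)}; "
            f"allowed prompt kwargs are {sorted(allowed)}"
        )

check_chain_type_kwargs("stuff", {"prompt": "..."})        # ok
# check_chain_type_kwargs("map_reduce", {"prompt": "..."}) # raises ValueError
```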

ghost avatar Aug 04 '23 10:08 ghost

Thank you bro. It's very detailed

llmadd avatar Aug 08 '23 10:08 llmadd

> I was able to fix this. you need to look, for each chain type (stuff, refine, map_reduce & map_rerank) for the correct input vars for each prompt.
>
> Check the attached file, there I described the issue in detail. And how figured out the issue looking at the Langchain source code for the original/default prompt templates for each Chain type.
>
> bing_chain_types.md

Thank you for your resource. However I found that the approach to map_reduce stated in your file is incorrect. My code is as follows:

map_template = """Make a summary for these documents. Keep the summary as accurate and concise as possible.\n\n{text}"""
MAP_CHAIN_PROMPT = PromptTemplate(template=map_template, input_variables=["text"])

combine_template = """Use the following summary to answer the question at the end. If you don't know the answer, just say that you don't know, don't try to make up an answer. However try to guess the closest answer as you can. Keep the answer as concise as possible. 

{summaries}

Question: {question}

Return the answer in this format. Use only the capital letters for the answer.
{{ "Answer": "A" , "Explanation": "Explanation about the option"}}

"""
COMBINE_CHAIN_PROMPT = PromptTemplate(template=combine_template, input_variables=["summaries", "question"])

map_reduce_qa_chain = RetrievalQA.from_chain_type(
    llm,
    retriever=vectordb.as_retriever(),
    chain_type="map_reduce",
    chain_type_kwargs={"map_prompt": MAP_CHAIN_PROMPT, "combine_prompt": COMBINE_CHAIN_PROMPT, "document_variable_name": "summaries"},
)

The error is as follows: TypeError: langchain.chains.combine_documents.map_reduce.MapReduceDocumentsChain() got multiple values for keyword argument 'document_variable_name'

I'm using langchain==0.0.264. Do you have any idea what might cause the issue?

mystvearn avatar Aug 19 '23 09:08 mystvearn

> Thank you for your resource. However I found that the approach to map_reduce as stated in your file is incorrect. [...] The error is as follow: TypeError: langchain.chains.combine_documents.map_reduce.MapReduceDocumentsChain() got multiple values for keyword argument 'document_variable_name'
>
> I'm using langchain=0.0.264. Do you have any idea what might cause the issue?

In the map_reduce chain you are trying to create, try it without passing document_variable_name: you are using the default input variables, so you do not need to pass it again. Hope that fixes your issue.
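A minimal stand-in can illustrate why the duplicate key raises that exact TypeError. The function names below are hypothetical, not LangChain's actual internals; the point is that the loader already sets `document_variable_name` itself before splatting the user-supplied `chain_type_kwargs` on top, and Python rejects a keyword argument given twice:

```python
def _build_chain(llm, document_variable_name="summaries", **chain_kwargs):
    # stand-in for constructing MapReduceDocumentsChain
    return {"llm": llm, "document_variable_name": document_variable_name,
            **chain_kwargs}

def load_map_reduce_sketch(llm, **chain_type_kwargs):
    # the loader pins document_variable_name explicitly, then forwards the
    # user's chain_type_kwargs; if they also contain that key, the call
    # itself raises "got multiple values for keyword argument ..."
    return _build_chain(llm, document_variable_name="summaries",
                        **chain_type_kwargs)

try:
    load_map_reduce_sketch("fake-llm", document_variable_name="summaries")
except TypeError as exc:
    print(type(exc).__name__, exc)

# dropping the duplicated key works fine:
chain = load_map_reduce_sketch("fake-llm", question_prompt="...")
```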

braun-viathan avatar Aug 22 '23 09:08 braun-viathan

It works on my side with langchain==0.0.279:


    combine_template = "Write a summary of the following text:\n\n{summaries}"
    combine_prompt_template = PromptTemplate.from_template(template=combine_template)

    question_template = """Use the following pieces of context to answer the question at the end. If you don't know the answer, just say that you don't know, don't try to make up an answer. Use three sentences maximum. Keep the answer as concise as possible. Always say "thanks for asking!" at the end of the answer. 
    {context}
    Question: {question}
    Helpful Answer:"""
    question_prompt_template = PromptTemplate.from_template(template=question_template)

    qa_chain = RetrievalQA.from_chain_type(
        llm=chat_models_openAI, 
        chain_type='map_reduce', 
        retriever=db.as_retriever(search_kwargs={'fetch_k': 4, 'k':2}, search_type='mmr'), 
        return_source_documents=True,
        chain_type_kwargs={"question_prompt": question_prompt_template, "combine_prompt": combine_prompt_template}
    )

BobLuo avatar Sep 06 '23 08:09 BobLuo

Just to put everything together with regard to map_reduce:

from langchain.embeddings.sentence_transformer import SentenceTransformerEmbeddings
from langchain.vectorstores import Chroma
from langchain.prompts import PromptTemplate
from langchain.chains import RetrievalQA
from langchain.chat_models import ChatOpenAI
from decouple import config


TEXT = ["Python is a versatile and widely used programming language known for its clean and readable syntax, which relies on indentation for code structure",
        "It is a general-purpose language suitable for web development, data analysis, AI, machine learning, and automation. Python offers an extensive standard library with modules covering a broad range of tasks, making it efficient for developers.",
        "It is cross-platform, running on Windows, macOS, Linux, and more, allowing for broad application compatibility. "
        # Note: no comma between the adjacent string literals above and below,
        # so Python concatenates them into one element; this keeps TEXT at four
        # entries, matching the four meta_data dicts below.
        "Python has a large and active community that develops libraries, provides documentation, and offers support to newcomers.",
        "It has particularly gained popularity in data science and machine learning due to its ease of use and the availability of powerful libraries and frameworks."]

meta_data = [{"source": "document 1", "page": 1},
             {"source": "document 2", "page": 2},
             {"source": "document 3", "page": 3},
             {"source": "document 4", "page": 4}]

embedding_function = SentenceTransformerEmbeddings(
    model_name="all-MiniLM-L6-v2"
)

vector_db = Chroma.from_texts(
    texts=TEXT,
    embedding=embedding_function,
    metadatas=meta_data
)

combine_template = "Write a summary of the following text:\n\n{summaries}"
combine_prompt_template = PromptTemplate.from_template(
    template=combine_template)

question_template = """Use the following pieces of context to answer the question at the end. If you don't know the answer, just say that you don't know, don't try to make up an answer. Use three sentences maximum. Keep the answer as concise as possible. Always say "thanks for asking!" at the end of the answer. 
{context}
Question: {question}
Helpful Answer:"""
question_prompt_template = PromptTemplate.from_template(
    template=question_template)

# create chat model
llm = ChatOpenAI(openai_api_key=config("OPENAI_API_KEY"), temperature=0)

# create retriever chain
qa_chain = RetrievalQA.from_chain_type(
    llm=llm,
    # mmr > for diversity in documents
    # Set fetch_k value to get the fetch_k most similar search. This is basically semantic search
    retriever=vector_db.as_retriever(
        search_kwargs={'fetch_k': 4, 'k': 3}, search_type='mmr'),
    return_source_documents=True,
    chain_type="map_reduce",
    chain_type_kwargs={"question_prompt": question_prompt_template,
                       "combine_prompt": combine_prompt_template}
)

# question
question = "What areas is Python mostly used in?"

# call QA chain
response = qa_chain({"query": question})

print(response)

print("============================================")
print("====================Result==================")
print("============================================")
print(response["result"])


print("============================================")
print("===============Source Documents============")
print("============================================")

print(response["source_documents"][0])

Princekrampah avatar Oct 10 '23 06:10 Princekrampah

Hi, @amirgamil,

I'm helping the LangChain team manage their backlog and am marking this issue as stale. From what I understand, the issue involves a problem with using the RetrievalQA chain with the chain_type of map_reduce for custom prompts. It seems that the error occurs when using a custom prompt, but the code works successfully with a different chain_type. There have been several comments from users providing potential solutions and workarounds, including code examples and suggestions for different versions of the library. One user also shared a detailed file describing the issue and how they resolved it.

Could you please confirm if this issue is still relevant to the latest version of the LangChain repository? If it is, please let the LangChain team know by commenting on the issue. Otherwise, feel free to close the issue yourself, or it will be automatically closed in 7 days. Thank you!

dosubot[bot] avatar Feb 07 '24 16:02 dosubot[bot]