                        How can `Document` metadata be passed into prompts?
Here is an example:
- I have created vector stores from several podcasts
- metadata = {"guest": guest_name}
- question = "which guests have talked about <topic>?"
Using VectorDBQA, this could be possible if {context} contained text + metadata
Another format for retrieving text with metadata could be:
TEXT: <what the guest said>
GUEST: <guest_name>
Or maybe even:
TEXT: <what the guest said>
METADATA: {"guest": guest_name}
This way when asking questions, I can ask things like "what did <guest_name> say about <topic>?"
I have a number of other use cases where this would also be helpful. I considered just adding the metadata directly to the text before embedding, but that's not ideal.
Not 100% sure whether applicable to your case, but if you are using the stuff chain, you can do this by adjusting the document_prompt:
document_prompt = PromptTemplate(input_variables=["page_content", "id"], template="{page_content}, id: {id}")
qa = RetrievalQA.from_chain_type(llm=OpenAI(temperature=0), chain_type="stuff", retriever=vector_store.as_retriever(), chain_type_kwargs={"document_prompt": document_prompt})
If there is an `id` field in your document metadata, it will be injected correctly.
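To tie it back to the podcast example at the top of the thread, a fuller sketch (untested; assumes the same legacy LangChain imports, an existing vector_store, and that every document carries a "guest" metadata key) could look like:

from langchain.chains import RetrievalQA
from langchain.llms import OpenAI
from langchain.prompts import PromptTemplate

# Render each retrieved document as its text plus the "guest" metadata field.
document_prompt = PromptTemplate(
    input_variables=["page_content", "guest"],
    template="TEXT: {page_content}\nGUEST: {guest}",
)

qa = RetrievalQA.from_chain_type(
    llm=OpenAI(temperature=0),
    chain_type="stuff",
    retriever=vector_store.as_retriever(),  # vector_store: your existing store
    chain_type_kwargs={"document_prompt": document_prompt},
)

print(qa.run("Which guests have talked about <topic>?"))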
Wow that's cool, didn't know about that kwarg! Thanks, will try this 😃
This won't change the docs grabbed by the retriever, right? For example, if I have a guest (Greg) stored in the metadata and I ask "what did Greg say", the retriever won't take the guest into account when selecting sources, e.g. by matching on it during the similarity search.
No, that only affects how the retrieved context documents are presented to the LLM; the retrieval step itself is unchanged.
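If you need the retriever itself to respect metadata, one option (hedged: this depends on your vector store actually supporting metadata filters, e.g. Chroma or newer FAISS versions) is to pass a filter through search_kwargs; the names below are just illustrative:

retriever = vector_store.as_retriever(
    search_kwargs={"k": 5, "filter": {"guest": "Greg"}}
)
# Only documents whose metadata matches the filter are considered for similarity.
docs = retriever.get_relevant_documents("what did Greg say about <topic>?")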
Is there a way I could do the same with a ConversationalRetrievalChain? I keep running into the error: ValueError: Missing some input keys. This is my function:

def get_conversation_chain(vectorstore: FAISS):
    llm = ChatOpenAI(model="gpt-4-0613", temperature=0.5, streaming=False)
    templates = [
        SystemMessagePromptTemplate.from_template(
            prompts.system_prompt_v1,
            input_variables=["context", "source", "page_number"],
        ),
        HumanMessagePromptTemplate.from_template(
            prompts.user_prompt,
            input_variables=["context", "source", "page_number"],
        ),
    ]
    qa_template = ChatPromptTemplate.from_messages(templates)
    memory = ConversationSummaryBufferMemory(
        llm=llm, max_token_limit=5000, memory_key="chat_history", return_messages=True
    )
    memory.input_key = "question"
    memory.output_key = "answer"
    conversation_chain = ConversationalRetrievalChain.from_llm(
        llm=llm,
        retriever=vectorstore.as_retriever(
            k=5, search_type="mmr", fetch_k=20, lambda_mult=0.5
        ),
        memory=memory,
        return_source_documents=True,
        chain_type="stuff",
        combine_docs_chain_kwargs={"prompt": qa_template},
    )
    return conversation_chain
@joe-barhouch Did you solve this? I want to use metadata as an input_variable but it only seems to allow 'context', which is page_content.
@Robs-Git-Hub had to step back from Conversational Agents. The layer of abstraction helps with prototypes but hurts full-fledged apps.
I ended up implementing my own version with an LLMChain plus memory. All of the document retrieval is handled by calling similarity_search or similar methods directly on your vector store.
Then I can get the metadata I have created and pass it into the prompt.
At the end of the day the RAG application just copy-pastes the retrieved results into the prompt, so I handled it on my own without the abstraction layer of Conversational Agents.
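Roughly, the shape of that approach looks like this (illustrative and untested; the prompt text, variable names and the "guest" metadata key are my own, and vectorstore is assumed to be your existing FAISS store):

from langchain.chains import LLMChain
from langchain.chat_models import ChatOpenAI
from langchain.memory import ConversationBufferMemory
from langchain.prompts import PromptTemplate

llm = ChatOpenAI(model="gpt-4-0613", temperature=0.5)

# Memory only tracks the question/answer exchange; "context" is filled in manually.
memory = ConversationBufferMemory(memory_key="chat_history", input_key="question")

prompt = PromptTemplate(
    input_variables=["context", "chat_history", "question"],
    template=(
        "Use the excerpts below to answer the question.\n\n"
        "{context}\n\n"
        "Chat history:\n{chat_history}\n\n"
        "Question: {question}\nAnswer:"
    ),
)
chain = LLMChain(llm=llm, prompt=prompt, memory=memory)

def answer(question: str) -> str:
    # Retrieve directly from the vector store, then format text + metadata ourselves.
    docs = vectorstore.similarity_search(question, k=5)
    context = "\n\n".join(
        f"TEXT: {d.page_content}\nGUEST: {d.metadata.get('guest', 'unknown')}"
        for d in docs
    )
    return chain.run(context=context, question=question)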
Thanks for the quick reply. Very helpful, and I was reaching a similar conclusion.
For ConversationalRetrievalChain:

document_combine_prompt = PromptTemplate(
    input_variables=["source", "year", "page", "page_content"],
    template="""source: {source}
year: {year}
page: {page}
page content: {page_content}""",
)
qa = ConversationalRetrievalChain.from_llm(
    ...,
    combine_docs_chain_kwargs={
        "prompt": retrieval_qa_chain_prompt,
        "document_prompt": document_combine_prompt,
    },
)
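(retrieval_qa_chain_prompt above is just your own main QA prompt for the combine-docs step; a minimal illustrative version, with wording of my choosing, would be something like:)

retrieval_qa_chain_prompt = PromptTemplate(
    input_variables=["context", "question"],
    template="""Answer the question using only the sources below.

{context}

Question: {question}
Answer:""",
)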
@theekshanamadumal Unless the retrieved documents actually contain that metadata, this will throw an error about missing input variables for the prompt template.
What is the difference between "prompt" and "document_prompt"?
@theekshanamadumal Unless the retrieved documents actually contain that metadata, this will throw an error about missing input variables for the prompt template.
Yes, you need to know what the metadata fields in your documents are before creating the document prompt.
What is the difference between "prompt" and "document_prompt"?
The document_prompt is the prompt template used to format each retrieved document; the formatted documents end up in the main prompt as the 'context'. The prompt is that main QA prompt itself.
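Concretely, with the document_combine_prompt from above, each retrieved Document is rendered individually and the rendered strings are joined to form the {context} value of the main prompt; the values here are made up for illustration:

from langchain.schema import Document

doc = Document(
    page_content="We mostly discussed retrieval quality.",
    metadata={"source": "episode_12.txt", "year": 2023, "page": 4},
)
# Roughly what this document contributes to {context} in the main prompt:
print(document_combine_prompt.format(page_content=doc.page_content, **doc.metadata))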
Hi, @batmanscode! I'm helping the LangChain team manage their backlog and am marking this issue as stale.
It looks like you opened this issue to discuss passing Document metadata into prompts when using VectorDBQA. There have been contributions from other users sharing similar use cases and suggesting potential solutions. However, the issue remains unresolved.
Could you please confirm if this issue is still relevant to the latest version of the LangChain repository? If it is, please let the LangChain team know by commenting on the issue. Otherwise, feel free to close the issue yourself, or it will be automatically closed in 7 days.
Thank you for your understanding and contribution to LangChain!
@Robs-Git-Hub had to step back from Conversational Agents. The layer of abstraction helps with prototypes but hurts full-fledged apps.
I ended up implementing my own version with an LLMChain plus memory. All of the document retrieval is handled by calling
similarity_search or similar methods directly on your vector store. Then I can get the metadata I have created and pass it into the prompt. At the end of the day the RAG application just copy-pastes the retrieved results into the prompt, so I handled it on my own without the abstraction layer of Conversational Agents.
Hello, I am looking at a similar use case. I am extracting some metadata using similarity_search, and now I want to pass it to another QA chain. Can you show me the code snippet you used?