
How can `Document` metadata be passed into prompts?

Open batmanscode opened this issue 2 years ago • 2 comments

Here is an example:

  • I have created vector stores from several podcasts
  • metadata = {"guest": guest_name}
  • question = "which guests have talked about <topic>?"

Using VectorDBQA, this could be possible if {context} contained both the text and the metadata.

batmanscode avatar Feb 18 '23 11:02 batmanscode

Another format for retrieving text with metadata could be:

TEXT: <what the guest said>
GUEST: <guest_name>

Or maybe even:

TEXT: <what the guest said>
METADATA: {"guest": guest_name}

This way when asking questions, I can ask things like "what did <guest_name> say about <topic>?"
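A minimal sketch (plain Python, not a LangChain API) of what rendering a retrieved chunk in this format could look like; the render_chunk helper and the sample data are hypothetical:

```python
# Illustrative sketch: render one retrieved chunk plus its metadata in the
# TEXT/GUEST format proposed above, so the LLM can see who said what.

def render_chunk(text: str, metadata: dict) -> str:
    """Format a chunk and its metadata as TEXT/GUEST lines for the prompt."""
    return f"TEXT: {text}\nGUEST: {metadata.get('guest', 'unknown')}"

chunk = "I think attention is all you need."
meta = {"guest": "Jane Doe"}

print(render_chunk(chunk, meta))
# TEXT: I think attention is all you need.
# GUEST: Jane Doe
```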

batmanscode avatar Feb 21 '23 10:02 batmanscode

I have a number of different use cases where this would also be helpful. I considered just adding the metadata directly to the text before embedding, but that's not ideal.

sbc-max avatar Mar 06 '23 19:03 sbc-max

Not 100% sure whether applicable to your case, but if you are using the stuff chain, you can do this by adjusting the document_prompt:

    document_prompt = PromptTemplate(
        input_variables=["page_content", "id"],
        template="{page_content}, id: {id}",
    )
    qa = RetrievalQA.from_chain_type(
        llm=OpenAI(temperature=0),
        chain_type="stuff",
        retriever=vector_store.as_retriever(),
        chain_type_kwargs={"document_prompt": document_prompt},
    )

If there is an id field in your document metadata, it will be injected correctly.
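To illustrate, here is roughly what that document_prompt produces for each retrieved document, emulated with plain str.format so it runs without LangChain; the sample documents are made up:

```python
# Emulating the per-document substitution that the document_prompt performs:
# {page_content} comes from the document text, and any other variable
# (here "id") is filled from the document's metadata.

template = "{page_content}, id: {id}"

docs = [
    {"page_content": "We discussed scaling laws.", "metadata": {"id": "ep-12"}},
    {"page_content": "We discussed fine-tuning.", "metadata": {"id": "ep-34"}},
]

# Join the formatted documents; this joined string becomes the {context}
# that the main QA prompt receives.
context = "\n\n".join(
    template.format(page_content=d["page_content"], **d["metadata"]) for d in docs
)
print(context)
```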

flash1293 avatar Jul 03 '23 13:07 flash1293

> Not 100% sure whether applicable to your case, but if you are using the stuff chain, you can do this by adjusting the document_prompt: […]

Wow that's cool, didn't know about that kwarg! Thanks, will try this 😃

batmanscode avatar Jul 03 '23 13:07 batmanscode

This won't change the docs grabbed by the retriever, right? For example, if I have a guest (Greg) stored in the metadata and I ask "what did Greg say", the retriever won't take the guest into account when selecting sources by similarity.

connorjoleary avatar Jul 15 '23 20:07 connorjoleary

Right, it won't affect retrieval; that prompt only controls how the retrieved context documents are presented to the LLM.

flash1293 avatar Jul 17 '23 08:07 flash1293

Is there a way I could do the same with a ConversationalRetrievalChain? I keep running into the error: ValueError: Missing some input keys. This is my function:

    def get_conversation_chain(vectorstore: FAISS):
        llm = ChatOpenAI(model="gpt-4-0613", temperature=0.5, streaming=False)

        templates = [
            SystemMessagePromptTemplate.from_template(
                prompts.system_prompt_v1,
                input_variables=["context", "source", "page_number"],
            ),
            HumanMessagePromptTemplate.from_template(
                prompts.user_prompt,
                input_variables=["context", "source", "page_number"],
            ),
        ]
        qa_template = ChatPromptTemplate.from_messages(templates)

        memory = ConversationSummaryBufferMemory(
            llm=llm, max_token_limit=5000, memory_key="chat_history", return_messages=True
        )
        memory.input_key = "question"
        memory.output_key = "answer"

        conversation_chain = ConversationalRetrievalChain.from_llm(
            llm=llm,
            retriever=vectorstore.as_retriever(
                k=5, search_type="mmr", fetch_k=20, lambda_mult=0.5
            ),
            memory=memory,
            return_source_documents=True,
            chain_type="stuff",
            combine_docs_chain_kwargs={"prompt": qa_template},
        )

        return conversation_chain

joe-barhouch avatar Aug 10 '23 16:08 joe-barhouch

> Is there a way I could do the same with a ConversationalRetrievalChain? I keep running into the error: ValueError: Missing some input keys […]

@joe-barhouch Did you solve this? I want to use metadata as an input_variable but it only seems to allow 'context', which is page_content.

Robs-Git-Hub avatar Sep 11 '23 09:09 Robs-Git-Hub

@Robs-Git-Hub had to step back from Conversational Agents. The layer of abstraction helps with prototypes but hurts full-fledged apps.

I ended up implementing my own version with LLMChain plus a memory. All of the document retrieval is handled by calling similarity_search (or similar) directly on the vectorstore. Then I can take the metadata I created and pass it into the prompt.

At the end of the day, a RAG application just copies the retrieval results into the prompt, so I handled it on my own without the abstraction layer of Conversational Agents.
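A minimal sketch of that hand-rolled approach, with the vectorstore and the LLM call stubbed out so it is self-contained; fake_similarity_search and build_prompt are hypothetical stand-ins, not LangChain APIs:

```python
# Sketch of the hand-rolled RAG loop described above. In a real app,
# fake_similarity_search would be vectorstore.similarity_search(query, k=2)
# and the resulting prompt would be sent to a chat model.

def fake_similarity_search(query: str):
    """Stand-in for vectorstore.similarity_search: returns docs with metadata."""
    return [
        {"page_content": "Greg talked about embeddings.",
         "metadata": {"guest": "Greg", "episode": 7}},
        {"page_content": "Ada talked about retrieval.",
         "metadata": {"guest": "Ada", "episode": 9}},
    ]

def build_prompt(question: str) -> str:
    """Retrieve documents, then format text AND metadata into the prompt."""
    docs = fake_similarity_search(question)
    context = "\n\n".join(
        f"TEXT: {d['page_content']}\n"
        f"GUEST: {d['metadata']['guest']} (episode {d['metadata']['episode']})"
        for d in docs
    )
    return f"Use the context to answer.\n\n{context}\n\nQuestion: {question}"

prompt = build_prompt("What did Greg say about embeddings?")
print(prompt)  # the metadata is now visible to the LLM
```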

joe-barhouch avatar Sep 11 '23 09:09 joe-barhouch

> @Robs-Git-Hub had to step back from Conversational Agents. The layer of abstraction helps with prototypes but hurts full-fledged apps.

Thanks for the quick reply. Very helpful, and I was reaching a similar conclusion.

Robs-Git-Hub avatar Sep 11 '23 10:09 Robs-Git-Hub

For ConversationalRetrievalChain:

    document_combine_prompt = PromptTemplate(
        input_variables=["source", "year", "page", "page_content"],
        template="""source: {source}
    year: {year}
    page: {page}
    page content: {page_content}""",
    )
    qa = ConversationalRetrievalChain.from_llm(
        ...,
        combine_docs_chain_kwargs={
            "prompt": retrieval_qa_chain_prompt,
            "document_prompt": document_combine_prompt,
        },
    )

theekshanamadumal avatar Sep 26 '23 12:09 theekshanamadumal

@theekshanamadumal Unless every retrieved document actually contains those metadata fields, this will give a missing input variables error from the prompt template.

joe-barhouch avatar Sep 26 '23 12:09 joe-barhouch

What is the difference between "prompt" and "document_prompt"?

AI-General avatar Oct 12 '23 10:10 AI-General

> @theekshanamadumal Unless every retrieved document actually contains those metadata fields, this will give a missing input variables error from the prompt template.

Yes, you should know what metadata fields the documents contain before creating the document prompt.
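One way to guard against missing fields, sketched in plain Python: backfill defaults for every metadata key the document prompt expects before handing documents to the chain. REQUIRED_FIELDS and with_defaults are hypothetical helpers, not part of LangChain:

```python
# Sketch: backfill default values for the metadata fields a document_prompt
# expects, so documents missing a field don't raise "Missing some input keys".

REQUIRED_FIELDS = {"source": "unknown", "year": "n/a", "page": "n/a"}

def with_defaults(metadata: dict) -> dict:
    """Return metadata with every required field present; real values win."""
    return {**REQUIRED_FIELDS, **metadata}

doc_meta = {"source": "podcast.mp3"}  # no "year" or "page" field
safe = with_defaults(doc_meta)
print(safe)
# {'source': 'podcast.mp3', 'year': 'n/a', 'page': 'n/a'}
```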

theekshanamadumal avatar Oct 12 '23 10:10 theekshanamadumal

> What is the difference between "prompt" and "document_prompt"?

The document_prompt is the prompt template used to format each retrieved document; the formatted documents end up in the main prompt as the {context}. The prompt is the main template that wraps that context together with the question.
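A small sketch of how the two templates compose, emulated with plain str.format rather than LangChain's PromptTemplate; the templates and documents here are made up:

```python
# How the two templates compose: document_prompt formats each retrieved
# document, the results are joined, and the joined string is substituted
# as {context} in the main prompt.

document_prompt = "source: {source}, page content: {page_content}"
main_prompt = "Answer using this context:\n{context}\n\nQuestion: {question}"

docs = [
    {"page_content": "First passage.", "metadata": {"source": "a.pdf"}},
    {"page_content": "Second passage.", "metadata": {"source": "b.pdf"}},
]

context = "\n".join(
    document_prompt.format(page_content=d["page_content"], **d["metadata"])
    for d in docs
)
final = main_prompt.format(context=context, question="What do the sources say?")
print(final)
```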

theekshanamadumal avatar Oct 12 '23 11:10 theekshanamadumal

Hi, @batmanscode! I'm helping the LangChain team manage their backlog and am marking this issue as stale.

It looks like you opened this issue to discuss passing Document metadata into prompts when using VectorDBQA. There have been contributions from other users sharing similar use cases and suggesting potential solutions. However, the issue remains unresolved.

Could you please confirm if this issue is still relevant to the latest version of the LangChain repository? If it is, please let the LangChain team know by commenting on the issue. Otherwise, feel free to close the issue yourself, or it will be automatically closed in 7 days.

Thank you for your understanding and contribution to LangChain!

dosubot[bot] avatar Feb 08 '24 16:02 dosubot[bot]

> @Robs-Git-Hub had to step back from Conversational Agents. The layer of abstraction helps with prototypes but hurts full-fledged apps. […]

Hello, I am looking at a similar use case. I am extracting some metadata using similarity_search, and now I want to use it in another QA chain. Can you show me the code snippet you used?

sgautam666 avatar May 10 '24 23:05 sgautam666