
How can `Document` metadata be passed into prompts?

Open batmanscode opened this issue 2 years ago • 2 comments

Here is an example:

  • I have created vector stores from several podcasts
  • metadata = {"guest": guest_name}
  • question = "which guests have talked about <topic>?"

Using VectorDBQA, this could be possible if {context} contained both the text and the metadata.

batmanscode avatar Feb 18 '23 11:02 batmanscode

Another format for retrieving text with metadata could be:

TEXT: <what the guest said>
GUEST: <guest_name>

Or maybe even:

TEXT: <what the guest said>
METADATA: {"guest": guest_name}

This way when asking questions, I can ask things like "what did <guest_name> say about <topic>?"
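A minimal sketch (plain Python, not a LangChain API) of what rendering a retrieved chunk in this format could look like; the render_chunk helper and the sample data are hypothetical:

```python
# Illustrative sketch: render one retrieved chunk plus its metadata in the
# TEXT/GUEST format proposed above, so the LLM can see who said what.

def render_chunk(text: str, metadata: dict) -> str:
    """Format a chunk and its metadata as TEXT/GUEST lines for the prompt."""
    return f"TEXT: {text}\nGUEST: {metadata.get('guest', 'unknown')}"

chunk = "I think attention is all you need."
meta = {"guest": "Jane Doe"}

print(render_chunk(chunk, meta))
# TEXT: I think attention is all you need.
# GUEST: Jane Doe
```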

batmanscode avatar Feb 21 '23 10:02 batmanscode

I have a number of different use cases where this would also be helpful. I considered just adding the metadata directly to the text before embedding, but that's not ideal.

sbc-max avatar Mar 06 '23 19:03 sbc-max

Not 100% sure whether applicable to your case, but if you are using the stuff chain, you can do this by adjusting the document_prompt:

    document_prompt = PromptTemplate(
        input_variables=["page_content", "id"],
        template="{page_content}, id: {id}",
    )
    qa = RetrievalQA.from_chain_type(
        llm=OpenAI(temperature=0),
        chain_type="stuff",
        retriever=vector_store.as_retriever(),
        chain_type_kwargs={"document_prompt": document_prompt},
    )

If there is an id field in your document metadata, it will be injected correctly.
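To illustrate, here is roughly what that document_prompt produces for each retrieved document, emulated with plain str.format so it runs without LangChain; the sample documents are made up:

```python
# Emulating the per-document substitution that the document_prompt performs:
# {page_content} comes from the document text, and any other variable
# (here "id") is filled from the document's metadata.

template = "{page_content}, id: {id}"

docs = [
    {"page_content": "We discussed scaling laws.", "metadata": {"id": "ep-12"}},
    {"page_content": "We discussed fine-tuning.", "metadata": {"id": "ep-34"}},
]

# Join the formatted documents; this joined string becomes the {context}
# that the main QA prompt receives.
context = "\n\n".join(
    template.format(page_content=d["page_content"], **d["metadata"]) for d in docs
)
print(context)
```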

flash1293 avatar Jul 03 '23 13:07 flash1293

> Not 100% sure whether applicable to your case, but if you are using the stuff chain, you can do this by adjusting the document_prompt: […]

Wow that's cool, didn't know about that kwarg! Thanks, will try this 😃

batmanscode avatar Jul 03 '23 13:07 batmanscode

This won't change the docs grabbed by the retriever, right? For example, if I have a guest (Greg) stored in the metadata and I ask "what did Greg say", the retriever won't take the guest into account when selecting sources by similarity.

connorjoleary avatar Jul 15 '23 20:07 connorjoleary

Right, it won't affect retrieval; that prompt only controls how the retrieved context documents are presented to the LLM.

flash1293 avatar Jul 17 '23 08:07 flash1293

Is there a way I could do the same with a ConversationalRetrievalChain? I keep running into the error: ValueError: Missing some input keys. This is my function:

    def get_conversation_chain(vectorstore: FAISS):
        llm = ChatOpenAI(model="gpt-4-0613", temperature=0.5, streaming=False)

        templates = [
            SystemMessagePromptTemplate.from_template(
                prompts.system_prompt_v1,
                input_variables=["context", "source", "page_number"],
            ),
            HumanMessagePromptTemplate.from_template(
                prompts.user_prompt,
                input_variables=["context", "source", "page_number"],
            ),
        ]
        qa_template = ChatPromptTemplate.from_messages(templates)

        memory = ConversationSummaryBufferMemory(
            llm=llm, max_token_limit=5000, memory_key="chat_history", return_messages=True
        )
        memory.input_key = "question"
        memory.output_key = "answer"

        conversation_chain = ConversationalRetrievalChain.from_llm(
            llm=llm,
            retriever=vectorstore.as_retriever(
                k=5, search_type="mmr", fetch_k=20, lambda_mult=0.5
            ),
            memory=memory,
            return_source_documents=True,
            chain_type="stuff",
            combine_docs_chain_kwargs={"prompt": qa_template},
        )

        return conversation_chain

joe-barhouch avatar Aug 10 '23 16:08 joe-barhouch

> Is there a way I could do the same with a ConversationalRetrievalChain? I keep running into the error: ValueError: Missing some input keys […]

@joe-barhouch Did you solve this? I want to use metadata as an input_variable but it only seems to allow 'context', which is page_content.

Robs-Git-Hub avatar Sep 11 '23 09:09 Robs-Git-Hub

@Robs-Git-Hub had to step back from Conversational Agents. The layer of abstraction helps with prototypes but hurts full-fledged apps.

I ended up implementing my own version with LLMChain plus a memory. All of the document retrieval is handled by calling similarity_search (or similar) directly on the vectorstore. Then I can take the metadata I created and pass it into the prompt.

At the end of the day, a RAG application just copies the retrieval results into the prompt, so I handled it on my own without the abstraction layer of Conversational Agents.
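A minimal sketch of that hand-rolled approach, with the vectorstore and the LLM call stubbed out so it is self-contained; fake_similarity_search and build_prompt are hypothetical stand-ins, not LangChain APIs:

```python
# Sketch of the hand-rolled RAG loop described above. In a real app,
# fake_similarity_search would be vectorstore.similarity_search(query, k=2)
# and the resulting prompt would be sent to a chat model.

def fake_similarity_search(query: str):
    """Stand-in for vectorstore.similarity_search: returns docs with metadata."""
    return [
        {"page_content": "Greg talked about embeddings.",
         "metadata": {"guest": "Greg", "episode": 7}},
        {"page_content": "Ada talked about retrieval.",
         "metadata": {"guest": "Ada", "episode": 9}},
    ]

def build_prompt(question: str) -> str:
    """Retrieve documents, then format text AND metadata into the prompt."""
    docs = fake_similarity_search(question)
    context = "\n\n".join(
        f"TEXT: {d['page_content']}\n"
        f"GUEST: {d['metadata']['guest']} (episode {d['metadata']['episode']})"
        for d in docs
    )
    return f"Use the context to answer.\n\n{context}\n\nQuestion: {question}"

prompt = build_prompt("What did Greg say about embeddings?")
print(prompt)  # the metadata is now visible to the LLM
```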

joe-barhouch avatar Sep 11 '23 09:09 joe-barhouch

> @Robs-Git-Hub had to step back from Conversational Agents. The layer of abstraction helps with prototypes but hurts full-fledged apps.

Thanks for the quick reply. Very helpful, and I was reaching a similar conclusion.

Robs-Git-Hub avatar Sep 11 '23 10:09 Robs-Git-Hub

For ConversationalRetrievalChain:

    document_combine_prompt = PromptTemplate(
        input_variables=["source", "year", "page", "page_content"],
        template="""source: {source}
    year: {year}
    page: {page}
    page content: {page_content}""",
    )
    qa = ConversationalRetrievalChain.from_llm(
        ...,
        combine_docs_chain_kwargs={
            "prompt": retrieval_qa_chain_prompt,
            "document_prompt": document_combine_prompt,
        },
    )

theekshanamadumal avatar Sep 26 '23 12:09 theekshanamadumal

@theekshanamadumal Unless every retrieved document actually contains those metadata fields, this will give a missing input variables error from the prompt template.

joe-barhouch avatar Sep 26 '23 12:09 joe-barhouch

What is the difference between "prompt" and "document_prompt"?

AI-General avatar Oct 12 '23 10:10 AI-General

> @theekshanamadumal Unless every retrieved document actually contains those metadata fields, this will give a missing input variables error from the prompt template.

Yes, you should know what metadata fields the documents contain before creating the document prompt.
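One way to guard against missing fields, sketched in plain Python: backfill defaults for every metadata key the document prompt expects before handing documents to the chain. REQUIRED_FIELDS and with_defaults are hypothetical helpers, not part of LangChain:

```python
# Sketch: backfill default values for the metadata fields a document_prompt
# expects, so documents missing a field don't raise "Missing some input keys".

REQUIRED_FIELDS = {"source": "unknown", "year": "n/a", "page": "n/a"}

def with_defaults(metadata: dict) -> dict:
    """Return metadata with every required field present; real values win."""
    return {**REQUIRED_FIELDS, **metadata}

doc_meta = {"source": "podcast.mp3"}  # no "year" or "page" field
safe = with_defaults(doc_meta)
print(safe)
# {'source': 'podcast.mp3', 'year': 'n/a', 'page': 'n/a'}
```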

theekshanamadumal avatar Oct 12 '23 10:10 theekshanamadumal

> What is the difference between "prompt" and "document_prompt"?

The document_prompt is the prompt template used to format each retrieved document; the formatted documents end up in the main prompt as the {context}. The prompt is the main template that wraps that context together with the question.
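A small sketch of how the two templates compose, emulated with plain str.format rather than LangChain's PromptTemplate; the templates and documents here are made up:

```python
# How the two templates compose: document_prompt formats each retrieved
# document, the results are joined, and the joined string is substituted
# as {context} in the main prompt.

document_prompt = "source: {source}, page content: {page_content}"
main_prompt = "Answer using this context:\n{context}\n\nQuestion: {question}"

docs = [
    {"page_content": "First passage.", "metadata": {"source": "a.pdf"}},
    {"page_content": "Second passage.", "metadata": {"source": "b.pdf"}},
]

context = "\n".join(
    document_prompt.format(page_content=d["page_content"], **d["metadata"])
    for d in docs
)
final = main_prompt.format(context=context, question="What do the sources say?")
print(final)
```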

theekshanamadumal avatar Oct 12 '23 11:10 theekshanamadumal

Hi, @batmanscode! I'm helping the LangChain team manage their backlog and am marking this issue as stale.

It looks like you opened this issue to discuss passing Document metadata into prompts when using VectorDBQA. There have been contributions from other users sharing similar use cases and suggesting potential solutions. However, the issue remains unresolved.

Could you please confirm if this issue is still relevant to the latest version of the LangChain repository? If it is, please let the LangChain team know by commenting on the issue. Otherwise, feel free to close the issue yourself, or it will be automatically closed in 7 days.

Thank you for your understanding and contribution to LangChain!

dosubot[bot] avatar Feb 08 '24 16:02 dosubot[bot]

> @Robs-Git-Hub had to step back from Conversational Agents. The layer of abstraction helps with prototypes but hurts full-fledged apps. […]

Hello, I am looking at a similar use case. I am extracting some metadata using similarity_search, and now I want to use it in another QA chain. Can you show me the code snippet you used?

sgautam666 avatar May 10 '24 23:05 sgautam666