langchain
Is it possible to digest every piece of a document before inference?
Issue you'd like to raise.
In the ingest process, a long document is split into multiple chunks, which are embedded and stored in a vector DB. In the inference process, the top-K chunks are returned as context and fed to the LLM.
In most cases this mechanism works well, but what if I want to make an overall summary of the document? Relying on the top-K similarity results won't be sufficient. Or what if the query is relevant to every chunk of the document?
How can I make LangChain digest every piece of the document before inference?
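For clarity, the ingest/retrieve flow described above can be sketched without any library. This is a toy illustration, not LangChain code: `embed()` here is a hypothetical stand-in for a real embedding model, and the similarity ranking is plain cosine similarity.

```python
def embed(text):
    # Toy "embedding": character-frequency vector over lowercase letters.
    # A real pipeline would call an embedding model here.
    vec = [0.0] * 26
    for ch in text.lower():
        if "a" <= ch <= "z":
            vec[ord(ch) - ord("a")] += 1.0
    return vec

def cosine(a, b):
    # Cosine similarity between two vectors
    dot = sum(x * y for x, y in zip(a, b))
    na = sum(x * x for x in a) ** 0.5
    nb = sum(x * x for x in b) ** 0.5
    return dot / (na * nb) if na and nb else 0.0

def top_k(query, chunks, k=2):
    # Rank every stored chunk by similarity to the query, keep the best k.
    # This is exactly why whole-document queries fail: the other chunks
    # are discarded before the LLM ever sees them.
    q = embed(query)
    ranked = sorted(chunks, key=lambda c: cosine(q, embed(c)), reverse=True)
    return ranked[:k]
```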
Suggestion:
No response
🤖 AI-generated response by Steercode - chat with Langchain codebase Disclaimer: SteerCode Chat may provide inaccurate information about the Langchain codebase.
If you want to create an overall summary of the entire document and ensure that LangChain digests every piece of the document before inference, you can modify the retrieval and compression process. Instead of relying on the top-K similarity results, you can combine all the document chunks and process them together. You can achieve this by using the StuffDocumentsChain class. StuffDocumentsChain combines multiple documents into a single prompt and passes it to the LLM Chain for prediction. You can create an instance of this class and use it as the document combiner in your pipeline. Here's an example of how to create an instance of StuffDocumentsChain and use it in your pipeline:
from langchain.chains.combine_documents.stuff import StuffDocumentsChain
from langchain.chains import LLMChain
from langchain.llms import OpenAI
from langchain.prompts import PromptTemplate
# Initialize the LLMChain with an LLM and a summarization prompt
prompt = PromptTemplate.from_template("Summarize the following text:\n\n{context}")
llm_chain = LLMChain(llm=OpenAI(temperature=0), prompt=prompt)
# Create an instance of StuffDocumentsChain; document_variable_name
# tells it which prompt variable receives the combined documents
stuff_documents_chain = StuffDocumentsChain(
    llm_chain=llm_chain,
    document_variable_name="context",
)
# Use the StuffDocumentsChain instance in your pipeline,
# replacing the existing document combiner
By using StuffDocumentsChain, you ensure that all the chunks of the document are combined and processed together, which should give you a more comprehensive summary of the entire document.
You can continue the conversation at https://steercode.com/
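The "stuff" strategy the bot describes reduces to a simple idea: concatenate every chunk into one prompt instead of keeping only the top-K. A library-free sketch (the function names here are illustrative, not LangChain APIs):

```python
def stuff_documents(chunks, separator="\n\n"):
    # The "stuff" strategy: every chunk goes into one context string,
    # so the LLM sees the whole document rather than the top-K subset.
    return separator.join(chunks)

def build_summary_prompt(chunks):
    # Wrap the combined context in a summarization instruction
    context = stuff_documents(chunks)
    return "Summarize the following document:\n\n" + context
```

Note that stuffing only works while the combined chunks fit within the model's context window; for documents longer than that, a map-reduce style approach (summarize each chunk, then summarize the summaries) is the usual fallback.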
Thanks @votrumar. Steercode is super cool, good job.
But it doesn't solve my problem yet, so I raised another issue: https://github.com/hwchase17/langchain/issues/4772 Hopefully this time I described the issue better.
Hi, @Jeru2023! I'm Dosu, and I'm here to help the LangChain team manage their backlog. I wanted to let you know that we are marking this issue as stale.
From what I understand, the issue is about how to make LangChain digest every piece of a document before inference. @votrumar suggested using the StuffDocumentsChain class to combine all the document chunks and process them together, but it seems that this solution didn't solve the problem and you raised another issue.
Before we close this issue, we wanted to check with you if it is still relevant to the latest version of the LangChain repository. If it is, please let us know by commenting on the issue. Otherwise, feel free to close the issue yourself or it will be automatically closed in 7 days.
Thank you for your understanding and contribution to the LangChain project!