Summarize Chain Doesn't Always Return Summary When Using Refine Chain Type
When using the refine chain_type of load_summarize_chain, I get some unique output on some longer documents, which might necessitate minor changes to the current prompt. For example:
- Return original summary.
- The original summary remains appropriate.
- No changes needed to the original summary.
- The existing summary remains sufficient in capturing the key points discussed.
- No refinement needed, as the new context does not provide any additional information on the content of the discussion or its key takeaways.
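For context, this is roughly how I'm calling the chain. This is a minimal sketch rather than my exact code; llm and long_text stand in for whatever model and document you're using:

```python
from langchain.chains.summarize import load_summarize_chain
from langchain.schema import Document
from langchain.text_splitter import RecursiveCharacterTextSplitter

# llm and long_text are placeholders for the actual model and source document.
splitter = RecursiveCharacterTextSplitter(chunk_size=2000, chunk_overlap=200)
docs = [Document(page_content=chunk) for chunk in splitter.split_text(long_text)]

chain = load_summarize_chain(llm, chain_type="refine", verbose=True)
summary = chain.run(docs)
# On longer documents, `summary` is sometimes one of the refusal-style strings
# listed above instead of an actual summary.
```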
I am currently using the refine chain as well and I notice the same issue: it doesn't always return a summary with longer documents. Attached is my verbose log from the refine chain.
Hoping for a resolution soon.
I've found the same problem with the map_reduce chain as well.
I have the same problem with the refine chain.
The existing summary is already comprehensive and does not require any refinement based on the additional context provided
Same here. Related to #1460
I use refine to summarize long articles, and I can see that the summary is actually quite good in the intermediate steps, but at some point the model decides to throw away half the summary, so a lot is missing from the final answer.
Not sure how to address this.
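In case it helps anyone reproduce this, here is roughly how I inspect the intermediate refinements (a sketch; llm and docs are the same placeholders as above):

```python
from langchain.chains.summarize import load_summarize_chain

chain = load_summarize_chain(
    llm,
    chain_type="refine",
    return_intermediate_steps=True,  # keep each refinement step in the output
)
result = chain({"input_documents": docs}, return_only_outputs=True)

# Comparing each intermediate summary with the final one shows where
# content gets dropped.
for i, step in enumerate(result["intermediate_steps"]):
    print(f"--- step {i} ---\n{step}\n")
print("--- final ---\n" + result["output_text"])
```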
Hey guys, @CMobley7 @jinhucheung @TheDeterminator @SingTeng @thiswillbeyourgithub, any news related to that? Did you manage to find a way out of it?
Thanks
Anyone fixed this issue?
From the verbose output I can see that the map_reduce chain effectively writes the summary, but it returns either an empty string or, at best, the last paragraph. I tried this on a local LLM with llama.cpp. Again, I can see it attempts to generate the summary and all, but I'm left with nothing.
Yes, same here. So the issue comes from the reduce prompt and not the map prompt, right? I struggle to find the reduce prompt in the codebase, can you?
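In the meantime, a workaround is to pass explicit prompts to load_summarize_chain so it doesn't matter where the default lives. A rough sketch; the prompt wording below is my own, not the library default:

```python
from langchain.chains.summarize import load_summarize_chain
from langchain.prompts import PromptTemplate

map_prompt = PromptTemplate.from_template(
    "Write a concise summary of the following:\n\n{text}\n\nCONCISE SUMMARY:"
)
combine_prompt = PromptTemplate.from_template(
    "Combine the following partial summaries into a single detailed summary. "
    "Do not drop information, and do not reply with anything other than the summary:\n\n{text}"
)

chain = load_summarize_chain(
    llm,
    chain_type="map_reduce",
    map_prompt=map_prompt,
    combine_prompt=combine_prompt,
    verbose=True,
)
summary = chain.run(docs)
```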
I ended up apparently fixing my problem.
I basically changed a few lines to get closer to this: https://python.langchain.com/docs/use_cases/question_answering/how_to/chat_vector_db
Here's my patch:
 from langchain.chains import ConversationalRetrievalChain
+from langchain.chains import LLMChain
 from langchain.chains.qa_with_sources import load_qa_with_sources_chain
 from langchain.retrievers.merger_retriever import MergerRetriever
 from langchain.document_transformers import EmbeddingsRedundantFilter
 from langchain.retrievers.document_compressors import DocumentCompressorPipeline
 from langchain.retrievers import ContextualCompressionRetriever
+from langchain.prompts.prompt import PromptTemplate
 from utils.llm import load_llm, AnswerConversationBufferMemory
 from utils.file_loader import load_doc, load_embeddings, create_hyde_retriever, get_tkn_length, average_word_length, wpm
@@ -552,10 +554,22 @@ class DocToolsLLM:
             base_compressor=pipeline, base_retriever=retriever
         )
-        chain = ConversationalRetrievalChain.from_llm(
-            llm=self.llm,
-            chain_type="map_reduce",
+        _template = """Given the following conversation and a follow up question, rephrase the follow up question to be a standalone question, in its original language.
+
+        Chat History:
+        {chat_history}
+
+        Follow Up Input: {question}
+
+        Standalone question:"""
+        CONDENSE_QUESTION_PROMPT = PromptTemplate.from_template(_template)
+        question_generator = LLMChain(llm=self.llm, prompt=CONDENSE_QUESTION_PROMPT)
+        doc_chain = load_qa_with_sources_chain(self.llm, chain_type="map_reduce")
+
+        chain = ConversationalRetrievalChain(
             retriever=retriever,
+            question_generator=question_generator,
+            combine_docs_chain=doc_chain,
             return_source_documents=True,
             return_generated_question=True,
             verbose=self.llm_verbosity,
The repo is this one: https://github.com/thiswillbeyourgithub/DocToolsLLM
Hi, @CMobley7,
I'm helping the LangChain team manage their backlog and am marking this issue as stale. From what I understand, the issue you reported is related to the refine chain_type of load_summarize_chain not always returning a summary for longer documents. There have been discussions and attempts to find a resolution, with some users sharing their findings and potential fixes. It seems that Thiswillbeyourgithub has shared a patch that apparently resolved the issue.
Could you please confirm if this issue is still relevant to the latest version of the LangChain repository? If it is, please let the LangChain team know by commenting on the issue. Otherwise, feel free to close the issue yourself, or the issue will be automatically closed in 7 days.
Thank you for your understanding and cooperation.