Summarize Chain Doesn't Always Return Summary When Using Refine Chain Type
When using the refine chain_type of load_summarize_chain, I get some unique output on some longer documents, which might necessitate minor changes to the current prompt. For example:
- Return original summary.
- The original summary remains appropriate.
- No changes needed to the original summary.
- The existing summary remains sufficient in capturing the key points discussed.
- No refinement needed, as the new context does not provide any additional information on the content of the discussion or its key takeaways.
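For context, this is roughly how I'm calling the chain. This is a minimal sketch rather than my exact code; llm and long_text stand in for whatever model and document you're using:

```python
from langchain.chains.summarize import load_summarize_chain
from langchain.schema import Document
from langchain.text_splitter import RecursiveCharacterTextSplitter

# llm and long_text are placeholders for the actual model and source document.
splitter = RecursiveCharacterTextSplitter(chunk_size=2000, chunk_overlap=200)
docs = [Document(page_content=chunk) for chunk in splitter.split_text(long_text)]

chain = load_summarize_chain(llm, chain_type="refine", verbose=True)
summary = chain.run(docs)
# On longer documents, `summary` is sometimes one of the refusal-style strings
# listed above instead of an actual summary.
```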
I am currently using the refine chain as well and I notice the same issue: it doesn't always return a summary with longer documents. Attached is my verbose log from the refine chain.
Hoping for a resolution soon.
I've found the same problem with the map_reduce chain as well.
I have the same problem with the refine chain.
The existing summary is already comprehensive and does not require any refinement based on the additional context provided
Same here. Related to #1460
I use refine to summarize long articles, and I can see that the summary is actually quite good in the intermediate steps, but at some point the model decides to throw away half the summary, so a lot is missing from the final answer.
Not sure how to address this.
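In case it helps anyone reproduce this, here is roughly how I inspect the intermediate refinements (a sketch; llm and docs are the same placeholders as above):

```python
from langchain.chains.summarize import load_summarize_chain

chain = load_summarize_chain(
    llm,
    chain_type="refine",
    return_intermediate_steps=True,  # keep each refinement step in the output
)
result = chain({"input_documents": docs}, return_only_outputs=True)

# Comparing each intermediate summary with the final one shows where
# content gets dropped.
for i, step in enumerate(result["intermediate_steps"]):
    print(f"--- step {i} ---\n{step}\n")
print("--- final ---\n" + result["output_text"])
```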
Hey guys, @CMobley7 @jinhucheung @TheDeterminator @SingTeng @thiswillbeyourgithub, any news related to that? Did you manage to find a way out of it?
Thanks
Anyone fixed this issue?
From the verbose output I can see that the map_reduce chain effectively writes the summary, but it returns either an empty string or, at best, the last paragraph. I tried this on a local LLM with llama.cpp. Again, I can see it attempts to generate the summary and all, but I'm left with nothing.
Yes, same here. So the issue comes from the reduce prompt and not the map prompt, right? I struggle to find the reduce prompt in the codebase, can you?
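In the meantime, a workaround is to pass explicit prompts to load_summarize_chain so it doesn't matter where the default lives. A rough sketch; the prompt wording below is my own, not the library default:

```python
from langchain.chains.summarize import load_summarize_chain
from langchain.prompts import PromptTemplate

map_prompt = PromptTemplate.from_template(
    "Write a concise summary of the following:\n\n{text}\n\nCONCISE SUMMARY:"
)
combine_prompt = PromptTemplate.from_template(
    "Combine the following partial summaries into a single detailed summary. "
    "Do not drop information, and do not reply with anything other than the summary:\n\n{text}"
)

chain = load_summarize_chain(
    llm,
    chain_type="map_reduce",
    map_prompt=map_prompt,
    combine_prompt=combine_prompt,
    verbose=True,
)
summary = chain.run(docs)
```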
I ended up apparently fixing my problem.
I basically changed a few lines to get closer to this: https://python.langchain.com/docs/use_cases/question_answering/how_to/chat_vector_db
Here's my patch:
 from langchain.chains import ConversationalRetrievalChain
+from langchain.chains import LLMChain
 from langchain.chains.qa_with_sources import load_qa_with_sources_chain
 from langchain.retrievers.merger_retriever import MergerRetriever
 from langchain.document_transformers import EmbeddingsRedundantFilter
 from langchain.retrievers.document_compressors import DocumentCompressorPipeline
 from langchain.retrievers import ContextualCompressionRetriever
+from langchain.prompts.prompt import PromptTemplate
 from utils.llm import load_llm, AnswerConversationBufferMemory
 from utils.file_loader import load_doc, load_embeddings, create_hyde_retriever, get_tkn_length, average_word_length, wpm
@@ -552,10 +554,22 @@ class DocToolsLLM:
             base_compressor=pipeline, base_retriever=retriever
         )
-        chain = ConversationalRetrievalChain.from_llm(
-            llm=self.llm,
-            chain_type="map_reduce",
+        _template = """Given the following conversation and a follow up question, rephrase the follow up question to be a standalone question, in its original language.
+
+        Chat History:
+        {chat_history}
+
+        Follow Up Input: {question}
+
+        Standalone question:"""
+        CONDENSE_QUESTION_PROMPT = PromptTemplate.from_template(_template)
+        question_generator = LLMChain(llm=self.llm, prompt=CONDENSE_QUESTION_PROMPT)
+        doc_chain = load_qa_with_sources_chain(self.llm, chain_type="map_reduce")
+
+        chain = ConversationalRetrievalChain(
             retriever=retriever,
+            question_generator=question_generator,
+            combine_docs_chain=doc_chain,
             return_source_documents=True,
             return_generated_question=True,
             verbose=self.llm_verbosity,
The repo is this one: https://github.com/thiswillbeyourgithub/DocToolsLLM
Hi, @CMobley7,
I'm helping the LangChain team manage their backlog and am marking this issue as stale. From what I understand, the issue you reported is related to the refine chain_type of load_summarize_chain not always returning a summary for longer documents. There have been discussions and attempts to find a resolution, with some users sharing their findings and potential fixes. It seems that Thiswillbeyourgithub has shared a patch that apparently resolved the issue.
Could you please confirm if this issue is still relevant to the latest version of the LangChain repository? If it is, please let the LangChain team know by commenting on the issue. Otherwise, feel free to close the issue yourself, or the issue will be automatically closed in 7 days.
Thank you for your understanding and cooperation.