
FinalStreamingStdOutCallbackHandler not working with ChatOpenAI LLM

Open lironezra opened this issue 1 year ago • 4 comments

System Info

Hi :)

I tested the new callback stream handler FinalStreamingStdOutCallbackHandler and noticed an issue with it. I copied the code from the documentation and made just one change: using ChatOpenAI instead of OpenAI.

Who can help?

@hwchase17

Information

  • [X] The official example notebooks/scripts
  • [ ] My own modified scripts

Related Components

  • [X] LLMs/Chat Models
  • [ ] Embedding Models
  • [ ] Prompts / Prompt Templates / Prompt Selectors
  • [ ] Output Parsers
  • [ ] Document Loaders
  • [ ] Vector Stores / Retrievers
  • [ ] Memory
  • [X] Agents / Agent Executors
  • [ ] Tools / Toolkits
  • [ ] Chains
  • [ ] Callbacks/Tracing
  • [ ] Async

Reproduction

llm = ChatOpenAI(streaming=True, callbacks=[FinalStreamingStdOutCallbackHandler()], temperature=0)  # here is my only change
tools = load_tools(["wikipedia", "llm-math"], llm=llm)
agent = initialize_agent(tools, llm, agent=AgentType.ZERO_SHOT_REACT_DESCRIPTION, verbose=False)
agent.run("It's 2023 now. How many years ago did Konrad Adenauer become Chancellor of Germany.")

Expected behavior

I expected the final answer to be streamed to stdout. Instead, the code above returns the agent's response but does not stream it. In my project I must use the ChatOpenAI LLM, so I would appreciate it if someone could fix this issue.

lironezra avatar May 30 '23 10:05 lironezra

Hey! Thanks for bringing this up. I wrote FinalStreamingStdOutCallbackHandler - I'll look into it

UmerHA avatar May 30 '23 18:05 UmerHA

The issue is that OpenAI and ChatOpenAI use slightly different tokenizers. With OpenAI the answer contains the tokens ["\nFinal", " Answer", ":"], while with ChatOpenAI it contains ["Final", " Answer", ":"], i.e. the newline is joined to the previous token.

You can make it work by specifying the answer_prefix_tokens:

llm = ChatOpenAI(streaming=True, callbacks=[FinalStreamingStdOutCallbackHandler(answer_prefix_tokens=['Final', ' Answer', ':'])], temperature=0)  # here is my only change
tools = load_tools(["wikipedia", "llm-math"], llm=llm)
agent = initialize_agent(tools, llm, agent=AgentType.ZERO_SHOT_REACT_DESCRIPTION, verbose=False)
agent.run("It's 2023 now. How many years ago did Konrad Adenauer become Chancellor of Germany.")

But I'll also submit a PR to ignore newlines & whitespace when detecting the answer prefix tokens.
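The idea is roughly the following (a minimal sketch of the detection logic, not the actual PR code; tokens_match is an illustrative helper name):

def tokens_match(last_tokens, answer_prefix_tokens):
    # Strip newlines and surrounding whitespace from each token before
    # comparing, so that e.g. "\nFinal" and "Final" count as the same token.
    stripped_last = [t.strip() for t in last_tokens]
    stripped_prefix = [t.strip() for t in answer_prefix_tokens]
    # Compare the most recently streamed tokens against the full prefix.
    return stripped_last[-len(stripped_prefix):] == stripped_prefix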

UmerHA avatar May 31 '23 12:05 UmerHA

Hi @UmerHA

Thank you for checking. It's working for this specific agent type: ZERO_SHOT_REACT_DESCRIPTION. However, in my case, I'm using CHAT_CONVERSATIONAL_REACT_DESCRIPTION, and it's not working because the final answer is slightly different:

[image: the agent's final output, a JSON blob with "action" and "action_input" keys]

How can I extract the final answer from this structure?

Here is my code to reproduce the issue:

from langchain.chat_models import ChatOpenAI
from langchain.callbacks.streaming_stdout_final_only import FinalStreamingStdOutCallbackHandler
from langchain.agents import load_tools, initialize_agent, AgentType
from langchain.memory import ConversationBufferMemory

llm = ChatOpenAI(
    streaming=True,
    callbacks=[FinalStreamingStdOutCallbackHandler(answer_prefix_tokens=['\n"action_input"', ': '])],
    temperature=0,
)
memory = ConversationBufferMemory(memory_key="chat_history", return_messages=True)
tools = load_tools(["wikipedia", "llm-math"], llm=llm)
agent = initialize_agent(tools, llm, agent=AgentType.CHAT_CONVERSATIONAL_REACT_DESCRIPTION, verbose=True, memory=memory)
agent.run(input="It's 2023 now. How many years ago did Konrad Adenauer become Chancellor of Germany.")

Thank you!

lironezra avatar May 31 '23 13:05 lironezra

That depends on how the LLM you're using tokenizes the answer prefix.

You can find that out with the following code snippet:

from langchain.agents import AgentType, initialize_agent, load_tools
from langchain.callbacks.base import BaseCallbackHandler
from langchain.llms import OpenAI

class MyCallbackHandler(BaseCallbackHandler):
    def on_llm_new_token(self, token, **kwargs) -> None:
        # print each token on its own line, wrapped in '#' so that
        # leading/trailing whitespace is visible
        print(f"#{token}#")

llm = OpenAI(streaming=True, callbacks=[MyCallbackHandler()])
tools = load_tools(["wikipedia", "llm-math"], llm=llm)
agent = initialize_agent(tools, llm, agent=AgentType.ZERO_SHOT_REACT_DESCRIPTION, verbose=False)
agent.run("It's 2023 now. How many years ago did Konrad Adenauer become Chancellor of Germany.")

UmerHA avatar Jun 02 '23 00:06 UmerHA

When I use ChatOpenAI, I can see the streamed output in the terminal, but how do I get the streamed tokens back in my code?
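One way to do this is to collect the tokens yourself instead of printing them. Here is a minimal sketch using the same BaseCallbackHandler pattern as above (TokenQueueHandler and token_queue are illustrative names, not part of langchain):

from queue import Queue

from langchain.callbacks.base import BaseCallbackHandler
from langchain.chat_models import ChatOpenAI

class TokenQueueHandler(BaseCallbackHandler):
    """Put each streamed token into a queue that your own code can consume."""

    def __init__(self, token_queue: Queue) -> None:
        self.token_queue = token_queue

    def on_llm_new_token(self, token: str, **kwargs) -> None:
        self.token_queue.put(token)

token_queue = Queue()
llm = ChatOpenAI(streaming=True, callbacks=[TokenQueueHandler(token_queue)], temperature=0)
# Run the agent in a separate thread, then consume tokens as they arrive:
#   while True: print(token_queue.get(), end="", flush=True)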

wailliai avatar Jul 25 '23 06:07 wailliai

Hello @lironezra, I am trying to do the same as you. When I use CHAT_CONVERSATIONAL_REACT_DESCRIPTION, FinalStreamingStdOutCallbackHandler is not able to match the answer_prefix_tokens. Were you able to make this callback work with CHAT_CONVERSATIONAL_REACT_DESCRIPTION?

jonra1993 avatar Jul 31 '23 22:07 jonra1993