FinalStreamingStdOutCallbackHandler not working with ChatOpenAI LLM
System Info
Hi :)
I tested the new streaming callback handler FinalStreamingStdOutCallbackHandler and noticed an issue with it. I copied the code from the documentation and made just one change: using ChatOpenAI instead of OpenAI.
Who can help?
@hwchase17
Information
- [X] The official example notebooks/scripts
- [ ] My own modified scripts
Related Components
- [X] LLMs/Chat Models
- [ ] Embedding Models
- [ ] Prompts / Prompt Templates / Prompt Selectors
- [ ] Output Parsers
- [ ] Document Loaders
- [ ] Vector Stores / Retrievers
- [ ] Memory
- [X] Agents / Agent Executors
- [ ] Tools / Toolkits
- [ ] Chains
- [ ] Callbacks/Tracing
- [ ] Async
Reproduction
from langchain.chat_models import ChatOpenAI
from langchain.callbacks.streaming_stdout_final_only import FinalStreamingStdOutCallbackHandler
from langchain.agents import load_tools, initialize_agent, AgentType

llm = ChatOpenAI(streaming=True, callbacks=[FinalStreamingStdOutCallbackHandler()], temperature=0)  # here is my only change
tools = load_tools(["wikipedia", "llm-math"], llm=llm)
agent = initialize_agent(tools, llm, agent=AgentType.ZERO_SHOT_REACT_DESCRIPTION, verbose=False)
agent.run("It's 2023 now. How many years ago did Konrad Adenauer become Chancellor of Germany.")
Expected behavior
The code above returns the response from the agent but does not stream it. In my project, I must use the ChatOpenAI
LLM, so I would appreciate it if someone could fix this issue, please.
Hey! Thanks for bringing this up. I wrote FinalStreamingStdOutCallbackHandler - I'll look into it.
The issue is that OpenAI and ChatOpenAI use slightly different tokenizers. With OpenAI, the answer contains the tokens ["\nFinal", " Answer", ":"], while with ChatOpenAI it contains ["Final", " Answer", ":"], i.e. the newline is joined to the previous token.
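If you'd rather check the tokenization offline than via a callback, a quick sanity check is possible with the tiktoken package (a sketch; the model names below are assumed to be the defaults used by OpenAI and ChatOpenAI here):

import tiktoken

# Compare how the completion model (OpenAI) and the chat model (ChatOpenAI)
# tokenize the answer prefix
for model in ["text-davinci-003", "gpt-3.5-turbo"]:
    enc = tiktoken.encoding_for_model(model)
    print(model, [enc.decode([t]) for t in enc.encode("\nFinal Answer:")])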
You can make it work by specifying the answer_prefix_tokens:
# here is my only change:
llm = ChatOpenAI(streaming=True, callbacks=[FinalStreamingStdOutCallbackHandler(answer_prefix_tokens=['Final', ' Answer', ':'])], temperature=0)
tools = load_tools(["wikipedia", "llm-math"], llm=llm)
agent = initialize_agent(tools, llm, agent=AgentType.ZERO_SHOT_REACT_DESCRIPTION, verbose=False)
agent.run("It's 2023 now. How many years ago did Konrad Adenauer become Chancellor of Germany.")
But I'll also submit a PR to ignore newlines and whitespace when detecting the answer prefix tokens.
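The detection could be made whitespace-insensitive roughly like this (an illustrative sketch, not the actual PR; matches_answer_prefix and its arguments are hypothetical names):

def matches_answer_prefix(last_tokens, answer_prefix_tokens):
    # Strip newlines/whitespace from both sides of the comparison so that
    # tokenizer differences like "\nFinal" vs "Final" don't matter
    cleaned = [t.strip() for t in last_tokens if t.strip()]
    prefix = [t.strip() for t in answer_prefix_tokens if t.strip()]
    return len(cleaned) >= len(prefix) and cleaned[-len(prefix):] == prefix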
Hi @UmerHA
Thank you for checking. It's working for this specific agent type: ZERO_SHOT_REACT_DESCRIPTION. However, in my case, I'm using CHAT_CONVERSATIONAL_REACT_DESCRIPTION, and it's not working because the final answer is slightly different:
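For context, the CHAT_CONVERSATIONAL_REACT_DESCRIPTION agent wraps its final answer in a JSON blob like the one below (the action_input text here is illustrative):

{
    "action": "Final Answer",
    "action_input": "Konrad Adenauer became Chancellor of Germany 74 years ago."
}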
How can I extract the final answer from this structure?
Here is my code to reproduce the issue:
from langchain.chat_models import ChatOpenAI
from langchain.callbacks.streaming_stdout_final_only import FinalStreamingStdOutCallbackHandler
from langchain.agents import load_tools, initialize_agent, AgentType
from langchain.memory import ConversationBufferMemory

llm = ChatOpenAI(
    streaming=True,
    callbacks=[FinalStreamingStdOutCallbackHandler(answer_prefix_tokens=['\n"action_input"', ': '])],
    temperature=0,
)
memory = ConversationBufferMemory(memory_key="chat_history", return_messages=True)
tools = load_tools(["wikipedia", "llm-math"], llm=llm)
agent = initialize_agent(tools, llm, agent=AgentType.CHAT_CONVERSATIONAL_REACT_DESCRIPTION, verbose=True, memory=memory)
agent.run(input="It's 2023 now. How many years ago did Konrad Adenauer become Chancellor of Germany.")
Thank you!
That depends on how the LLM you're using tokenizes the answer prefix.
With the following code snippet, you can determine that:
from langchain.llms import OpenAI
from langchain.agents import load_tools, initialize_agent, AgentType
from langchain.callbacks.base import BaseCallbackHandler

class MyCallbackHandler(BaseCallbackHandler):
    def on_llm_new_token(self, token, **kwargs) -> None:
        # print every token on its own line, delimited by '#'
        print(f"#{token}#")

llm = OpenAI(streaming=True, callbacks=[MyCallbackHandler()])
tools = load_tools(["wikipedia", "llm-math"], llm=llm)
agent = initialize_agent(tools, llm, agent=AgentType.ZERO_SHOT_REACT_DESCRIPTION, verbose=False)
agent.run("It's 2023 now. How many years ago did Konrad Adenauer become Chancellor of Germany.")
When I am using ChatOpenAI, I can see the streamed output in the terminal, but how do I get the stream back in my code?
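One option (a minimal sketch; CollectingCallbackHandler and its tokens attribute are my own illustrative names, not part of langchain) is a handler that buffers the tokens instead of printing them:

from langchain.callbacks.base import BaseCallbackHandler

class CollectingCallbackHandler(BaseCallbackHandler):
    # Illustrative handler: buffer streamed tokens so your own code can
    # consume them (e.g. forward them over a websocket or queue)
    def __init__(self):
        self.tokens = []

    def on_llm_new_token(self, token: str, **kwargs) -> None:
        self.tokens.append(token)

Pass it via callbacks=[CollectingCallbackHandler()] as above and read the handler's tokens list while (or after) agent.run executes; for real-time consumption you could push each token onto a queue.Queue that another thread reads.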
Hello @lironezra, I am trying to do the same as you. When I use CHAT_CONVERSATIONAL_REACT_DESCRIPTION, FinalStreamingStdOutCallbackHandler is not able to match the answer_prefix_tokens. Were you able to make this callback work with CHAT_CONVERSATIONAL_REACT_DESCRIPTION?