
Ollama tool calls not working via openai proxy only when using langgraph

Open StreetLamb opened this issue 1 year ago • 13 comments

Checked other resources

  • [X] I added a very descriptive title to this issue.
  • [X] I searched the LangGraph/LangChain documentation with the integrated search.
  • [X] I used the GitHub search to find a similar question and didn't find it.
  • [X] I am sure that this is a bug in LangGraph/LangChain rather than my code.
  • [X] I am sure this is better as an issue rather than a GitHub discussion, since this is a LangGraph bug and not a design question.

Example Code

from typing import Literal

from langchain_core.messages import HumanMessage
from langchain_core.tools import tool
from langchain_openai import ChatOpenAI
from langgraph.checkpoint import MemorySaver
from langgraph.graph import END, MessagesState, StateGraph
from langgraph.prebuilt import ToolNode
import asyncio


@tool
def search(query: str):
    """Call to surf the web."""
    if "sf" in query.lower() or "san francisco" in query.lower():
        return "It's 60 degrees and foggy."
    return "It's 90 degrees and sunny."


tools = [search]

tool_node = ToolNode(tools)

model = ChatOpenAI(
    model="llama3.1", base_url="http://localhost:11434/v1", temperature=0
).bind_tools(tools)


def should_continue(state: MessagesState) -> Literal["tools", END]:
    messages = state["messages"]
    last_message = messages[-1]
    if last_message.tool_calls:
        return "tools"
    return END


async def call_model(state: MessagesState, config):
    messages = state["messages"]
    response = await model.ainvoke(messages, config)
    return {"messages": [response]}


workflow = StateGraph(MessagesState)

workflow.add_node("agent", call_model)
workflow.add_node("tools", tool_node)

workflow.set_entry_point("agent")

workflow.add_conditional_edges(
    "agent",
    should_continue,
)

workflow.add_edge("tools", "agent")

checkpointer = MemorySaver()

app = workflow.compile(checkpointer=checkpointer)

async def test():
    async for event in app.astream_events(
        {"messages": [HumanMessage(content="what is the weather in sf")]},
        version="v1",
        config={"configurable": {"thread_id": 42}},
    ):
        print(event)


asyncio.run(test())

Error Message and Stack Trace (if applicable)

No response

Description

I want to invoke a tool-calling compatible Ollama model through the ChatOpenAI proxy. However, with the code above, the model does not produce a proper tool call:

{'event': 'on_chain_end', 'data': {'output': {'messages': [HumanMessage(content='what is the weather in sf', id='f7017ae4-b2d0-49e3-b939-69738686368b'), AIMessage(content='{"name": "search", "parameters": {"query": "sf weather"}}', response_metadata={'finish_reason': 'stop', 'model_name': 'llama3.1', 'system_fingerprint': 'fp_ollama'}, id='run-6a214185-27ba-4505-9cc2-574f20d04909')]}}, 'run_id': 'b2aeba64-38b0-447a-b7de-eefff49e3555', 'name': 'LangGraph', 'tags': [], 'metadata': {'thread_id': 43}, 'parent_ids': []}
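The failure mode in the output above is that the tool-call JSON arrives as plain message content instead of a populated tool_calls field. A minimal sketch for telling the two apart (the attribute names follow langchain_core's AIMessage; the helper functions themselves are hypothetical):

```python
import json


def has_structured_tool_call(message) -> bool:
    """True when the message carries parsed tool_calls (the healthy case)."""
    return bool(getattr(message, "tool_calls", None))


def looks_like_textual_tool_call(message) -> bool:
    """True when the content is a JSON blob naming a tool (the buggy case above)."""
    try:
        payload = json.loads(message.content)
    except (TypeError, ValueError):
        return False
    return isinstance(payload, dict) and "name" in payload
```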

However, the behaviour is different when using just langchain:

import asyncio

import openai
from langchain_core.messages import HumanMessage
from langchain_core.prompts import ChatPromptTemplate, MessagesPlaceholder
from langchain_core.tools import tool
from langchain_openai import ChatOpenAI

@tool
def search(query: str):
    """Call to surf the web."""
    if "sf" in query.lower() or "san francisco" in query.lower():
        return "It's 60 degrees and foggy."
    return "It's 90 degrees and sunny."


model = ChatOpenAI(model="llama3.1", base_url="http://localhost:11434/v1", temperature=0)

model_with_tools = model.bind_tools([search])

async def test():
    prompt = ChatPromptTemplate.from_messages(
        [
            MessagesPlaceholder(variable_name="messages"),
        ]
    )
    chain = prompt | model_with_tools
    response = await chain.ainvoke(
        {"messages": [HumanMessage(content="What is the weather like in sf")]}
    )
    return response


response = asyncio.run(test())
print(response)

This way the model correctly issues a tool call:

content='' additional_kwargs={'tool_calls': [{'id': 'call_xncx3ycn', 'function': {'arguments': '{"query":"sf weather"}', 'name': 'search'}, 'type': 'function'}]} response_metadata={'token_usage': {'completion_tokens': 17, 'prompt_tokens': 148, 'total_tokens': 165}, 'model_name': 'llama3.1', 'system_fingerprint': 'fp_ollama', 'finish_reason': 'stop', 'logprobs': None} id='run-63a9efd2-6619-448b-9a89-476f45cfb5c8-0' tool_calls=[{'name': 'search', 'args': {'query': 'sf weather'}, 'id': 'call_xncx3ycn', 'type': 'tool_call'}] usage_metadata={'input_tokens': 148, 'output_tokens': 17, 'total_tokens': 165}

System Info

langchain==0.2.7
langchain-anthropic==0.1.20
langchain-cohere==0.1.5
langchain-community==0.2.7
langchain-core==0.2.21
langchain-google-genai==1.0.5
langchain-ollama==0.1.0
langchain-openai==0.1.17
langchain-qdrant==0.1.1
langchain-text-splitters==0.2.0
langchain-weaviate==0.0.1.post1

platform: mac silicon
python version: Python 3.12.2

StreetLamb avatar Jul 27 '24 01:07 StreetLamb

Oh interesting. How many times did you run both versions? Is it reliably different in both contexts? And then you've confirmed the ollama versions are the same in both scenarios?

hinthornw avatar Jul 27 '24 01:07 hinthornw

Hi @hinthornw, I ran them multiple times (and once more just now, to be sure) and the behaviour is consistent. Tested both on Ollama v0.3.0.

StreetLamb avatar Jul 27 '24 01:07 StreetLamb

It seems to be caused by the astream_events method. Using langgraph with astream works:

...
async def test():
    async for event in app.astream(
        {"messages": [HumanMessage(content="what is the weather in sf")]},
        config={"configurable": {"thread_id": 42}},
    ):
        print(event)


asyncio.run(test())
{'agent': {'messages': [AIMessage(content='', additional_kwargs={'tool_calls': [{'id': 'call_rs3ykbgl', 'function': {'arguments': '{"query":"sf weather"}', 'name': 'search'}, 'type': 'function'}]}, response_metadata={'token_usage': {'completion_tokens': 17, 'prompt_tokens': 147, 'total_tokens': 164}, 'model_name': 'llama3.1', 'system_fingerprint': 'fp_ollama', 'finish_reason': 'stop', 'logprobs': None}, id='run-3175464b-50ce-4fa7-afc7-73fe14cf92ee-0', tool_calls=[{'name': 'search', 'args': {'query': 'sf weather'}, 'id': 'call_rs3ykbgl', 'type': 'tool_call'}], usage_metadata={'input_tokens': 147, 'output_tokens': 17, 'total_tokens': 164})]}}
{'tools': {'messages': [ToolMessage(content="It's 60 degrees and foggy.", name='search', tool_call_id='call_rs3ykbgl')]}}
{'agent': {'messages': [AIMessage(content='Based on the tool call response, I can format an answer to your original question:\n\nThe current weather in San Francisco (SF) is 60 degrees with fog.', response_metadata={'token_usage': {'completion_tokens': 34, 'prompt_tokens': 86, 'total_tokens': 120}, 'model_name': 'llama3.1', 'system_fingerprint': 'fp_ollama', 'finish_reason': 'stop', 'logprobs': None}, id='run-0c048f94-63ae-45d1-9808-4083dd65ec0d-0', usage_metadata={'input_tokens': 86, 'output_tokens': 34, 'total_tokens': 120})]}}

StreetLamb avatar Jul 27 '24 02:07 StreetLamb
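For readers who still need astream_events rather than the astream workaround, here is a hedged sketch of collecting tool calls from the event stream. The collect_tool_calls helper is hypothetical; the "on_chat_model_end" event name and the "data"/"output" keys follow langchain-core's astream_events schema (v2), where the model's output event carries the final AIMessage:

```python
async def collect_tool_calls(app, inputs, config=None):
    """Gather tool_calls from every chat-model end event in the stream."""
    calls = []
    async for event in app.astream_events(inputs, version="v2", config=config):
        if event["event"] == "on_chat_model_end":
            message = event["data"]["output"]
            # tool_calls is empty/missing when the model answered in plain text
            calls.extend(getattr(message, "tool_calls", []) or [])
    return calls
```

If the bug described in this issue is present, this returns an empty list even though the content holds tool-call JSON.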

I encountered the same issue. How can I get tool_calls and events when using astream_events? Do I need to set a specific config, or is this not supported in langgraph?

aliyarly avatar Oct 23 '24 06:10 aliyarly

> It seems to be caused by the astream_events method. Using langgraph with astream works: …

It doesn't work for me... I still can't get functions to be called...

{'messages': ['what is the weather in sf',
  AIMessage(content='', additional_kwargs={'tool_calls': [{'id': 'call_wurji20e', 'function': {'arguments': '{"query":"weather in sf"}', 'name': 'search'}, 'type': 'function'}], 'refusal': None}, response_metadata={'token_usage': {'completion_tokens': 18, 'prompt_tokens': 209, 'total_tokens': 227, 'completion_tokens_details': None, 'prompt_tokens_details': None}, 'model_name': 'llama3.1', 'system_fingerprint': 'fp_ollama', 'finish_reason': 'tool_calls', 'logprobs': None}, id='run-8834569e-3531-4c0e-a47b-bca59c600db1-0', tool_calls=[{'name': 'search', 'args': {'query': 'weather in sf'}, 'id': 'call_wurji20e', 'type': 'tool_call'}], usage_metadata={'input_tokens': 209, 'output_tokens': 18, 'total_tokens': 227, 'input_token_details': {}, 'output_token_details': {}})]}

Barry1915 avatar Oct 29 '24 07:10 Barry1915

Thanks, it works.

Barry1915 avatar Oct 29 '24 09:10 Barry1915

Has this been solved in LangGraph? I have the same issue. I tried using llm=ChatOllama(model="llama3.1") for simplicity, but the agent doesn't seem to make the tool call: {'tools': {'messages': []}}.

My create_agent function:

from langchain_core.prompts import ChatPromptTemplate, MessagesPlaceholder

def create_agent(llm, tools, system_message: str):
    """Create an agent."""
    base_message = (
        "You are a helpful AI assistant, collaborating with other assistants."
        " If you are unable to fully answer, that's OK, another assistant with different tools"
        " will help where you left off. Execute what you can to make progress."
    )
    
    if tools:
        prompt = ChatPromptTemplate.from_messages([
            (
                "system",
                base_message + 
                " You have access to the following tools: {tool_names}.\n{system_message}",
            ),
            MessagesPlaceholder(variable_name="messages"),
        ])
        prompt = prompt.partial(system_message=system_message)
        prompt = prompt.partial(tool_names=", ".join([tool.name for tool in tools]))
        return prompt | llm.bind_tools(tools)
    else:
        prompt = ChatPromptTemplate.from_messages([
            (
                "system",
                base_message + "\n{system_message}",
            ),
            MessagesPlaceholder(variable_name="messages"),
        ])
        prompt = prompt.partial(system_message=system_message)
        return prompt | llm

SergioRubio01 avatar Jan 23 '25 10:01 SergioRubio01

I think this may be related. I am seeing issues with astream and ChatOllama not being able to run tools. Using the LLM before binding tools works, but after calling .bind_tools() it just sends two responses and then goes silent.

        # this can be used with astream; it doesn't have tools bound
        self.llm = ChatOllama(
            model="llama3.1",
            base_url="http://192.168.0.14:11434/",
            temperature=0.0,
            seed=42,
            stream=True
        )

        # this can't be used with astream
        self.llm_with_tools = self.llm.bind_tools([tavily_search])


# In the chatbot function
        async for token in self.llm.astream(messages):  # doesn't work with an LLM that has bind_tools()
            logger.info(token)

Output with LLM that has not bound tools

2025-02-02 21:53:30.840 | INFO     | glados.glados:chatbot:46 - messages: [HumanMessage(content='hi', additional_kwargs={}, response_metadata={}, id='d9065e8f-8db4-4ec8-a1a8-0423a332649c')]
2025-02-02 21:53:32.585 | INFO     | glados.glados:chatbot:50 - content='How' additional_kwargs={} response_metadata={} id='run-0277b3e0-1ec9-43fe-88b6-0386951b6112'
2025-02-02 21:53:32.599 | INFO     | glados.glados:chatbot:50 - content="'s" additional_kwargs={} response_metadata={} id='run-0277b3e0-1ec9-43fe-88b6-0386951b6112'
2025-02-02 21:53:32.608 | INFO     | glados.glados:chatbot:50 - content=' it' additional_kwargs={} response_metadata={} id='run-0277b3e0-1ec9-43fe-88b6-0386951b6112'
2025-02-02 21:53:32.617 | INFO     | glados.glados:chatbot:50 - content=' going' additional_kwargs={} response_metadata={} id='run-0277b3e0-1ec9-43fe-88b6-0386951b6112'
2025-02-02 21:53:32.626 | INFO     | glados.glados:chatbot:50 - content='?' additional_kwargs={} response_metadata={} id='run-0277b3e0-1ec9-43fe-88b6-0386951b6112'
2025-02-02 21:53:32.626 | INFO     | glados.glados:send_to_tts:66 - 🔊 Sending to TTS: How 's it going ?

The output when tools have been bound.

User: hi
2025-02-02 21:54:44.004 | INFO     | glados.glados:chatbot:46 - messages: [HumanMessage(content='hi', additional_kwargs={}, response_metadata={}, id='41ea7fbd-1bee-4d0f-a0c6-fa82ea8f4227')]
2025-02-02 21:54:44.417 | INFO     | glados.glados:chatbot:50 - content='' additional_kwargs={} response_metadata={} id='run-5cf7bb0b-1744-484d-82e3-6680a7755c32' tool_calls=[{'name': 'tavily_search_results_json', 'args': {'query': 'hi'}, 'id': '5e76b24e-e324-4635-b6c2-ceb08a150327', 'type': 'tool_call'}] tool_call_chunks=[{'name': 'tavily_search_results_json', 'args': '{"query": "hi"}', 'id': '5e76b24e-e324-4635-b6c2-ceb08a150327', 'index': None, 'type': 'tool_call_chunk'}]
User: 2025-02-02 21:54:44.425 | INFO     | glados.glados:chatbot:50 - content='' additional_kwargs={} response_metadata={'model': 'llama3.1', 'created_at': '2025-02-02T20:54:44.4251252Z', 'done': True, 'done_reason': 'stop', 'total_duration': 413467300, 'load_duration': 25397300, 'prompt_eval_count': 189, 'prompt_eval_duration': 86000000, 'eval_count': 21, 'eval_duration': 300000000, 'message': Message(role='assistant', content='', images=None, tool_calls=None)} id='run-5cf7bb0b-1744-484d-82e3-6680a7755c32' usage_metadata={'input_tokens': 189, 'output_tokens': 21, 'total_tokens': 210}

The entire class

import asyncio
from typing import Annotated

from langchain_community.tools import TavilySearchResults
from langchain_ollama import ChatOllama
from langgraph.graph import StateGraph
from langgraph.graph.message import add_messages
from langgraph.prebuilt import ToolNode, create_react_agent, tools_condition
from loguru import logger
from typing_extensions import TypedDict

tavily_search = TavilySearchResults(max_results=2)


class State(TypedDict):
    messages: Annotated[list, add_messages]


class GlaDOS:

    def __init__(self):
        self.graph_builder = StateGraph(State)

        # this can be used with astream, if it DOESN'T have tools bound
        self.llm = ChatOllama(
            model="llama3.1",
            base_url="http://192.168.0.14:11434/",
            temperature=0.0,
            seed=42,
            stream=True
        ).bind_tools([tavily_search])  # remove this to get the LLM to work, but without tools :/

        tool_node = ToolNode([tavily_search])
        self.graph_builder.add_node("chatbot", self.chatbot)
        self.graph_builder.add_node("tools", tool_node)
        self.graph_builder.add_conditional_edges("chatbot", tools_condition)
        self.graph_builder.add_edge("tools", "chatbot")
        self.graph_builder.set_entry_point("chatbot")
        self.graph = self.graph_builder.compile()
        print(self.graph.get_graph().draw_mermaid())

    async def chatbot(self, state: State):
        """Asynchronous chatbot function that streams responses and sends partial updates to TTS."""
        messages = state["messages"]
        logger.info(f"messages: {messages}")
        accumulated_text = ""

        async for token in self.llm.astream(messages):  # doesn't work with an LLM that has bind_tools()
            logger.info(token)
            if hasattr(token, "content"):
                token_text = token.content.strip()
                logger.debug(f"Streaming Token: {token_text}")
                accumulated_text += token_text + " "
                yield {"messages": [{"role": "assistant", "content": token_text}]}
                if len(accumulated_text) > 40 or "." in token_text or "!" in token_text or "?" in token_text:
                    self.send_to_tts(accumulated_text.strip())
                    accumulated_text = ""  # Reset buffer after sending

        if accumulated_text:
            self.send_to_tts(accumulated_text.strip())

    def send_to_tts(self, text: str):
        """Send final streamed response to TTS for speaking."""
        if text.strip():
            logger.info(f"🔊 Sending to TTS: {text}")

glados = GlaDOS()

async def stream_graph_updates(user_input: str):
    """ Asynchronously streams responses from LangGraph """
    async for event, metadata in glados.graph.astream(
            {"messages": [{"role": "user", "content": user_input}]},
            stream_mode="messages"
    ):
        logger.debug(f"Received event: {event}")

        # Handle AIMessageChunk directly instead of assuming it's a dict
        if hasattr(event, "content"):
            logger.debug(event.content)
        else:
            logger.warning(f"Unexpected event type: {type(event)}")


async def main():
    """ Main loop to interact with the chatbot """
    while True:
        try:
            user_input = input("User: ")
            if user_input.lower() in ["quit", "exit", "q"]:
                print("Goodbye!")
                break

            await stream_graph_updates(user_input)
        except Exception as e:
            logger.exception(f"Error: {e}")


# Run with asyncio event loop
if __name__ == '__main__':
    asyncio.run(main())

unixunion avatar Feb 02 '25 21:02 unixunion

> I think this may be related, I am seeing issues with astream, and ChatOllama not being able to run tools. Using the LLM before binding tools works, but after calling .bind_tools() it just sends two responses and then silence. …
I read a paper acknowledging that open-source models still lack reliable tool-calling ability in most cases, so I guess it's not a problem with this framework or anything related to it.

SergioRubio01 avatar Feb 03 '25 22:02 SergioRubio01

> I read a paper acknowledging that open-source models still lack reliable tool-calling ability in most cases, so I guess it's not a problem with this framework or anything related to it.

I don't think it's a model issue. I'm using a function-calling model, and it works fine when I don't stream responses. I have also used the same model directly via other clients, such as the OpenAI client with streaming, and it works in those cases.

It's just not working via the LangGraph / LangChain clients.

Since OP noted the issue with astream_events, I just wanted to say it's the same for me with astream.

unixunion avatar Feb 06 '25 14:02 unixunion
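One detail behind unixunion's observation that the raw OpenAI client streams tool calls fine: an OpenAI-compatible endpoint delivers a streamed tool call as argument fragments that the client must accumulate by index, so any client-side streaming bug tends to surface exactly here. A pure sketch of that accumulation (the fragment dict shapes are assumptions modelled on the chat.completions streaming delta format, and accumulate_tool_calls is a hypothetical helper):

```python
def accumulate_tool_calls(deltas):
    """Merge streamed tool-call fragments (keyed by index) into complete calls."""
    calls = {}
    for delta in deltas:
        for frag in delta.get("tool_calls") or []:
            slot = calls.setdefault(
                frag["index"], {"id": None, "name": None, "arguments": ""}
            )
            # id and function name arrive once, usually in the first fragment
            if frag.get("id"):
                slot["id"] = frag["id"]
            fn = frag.get("function") or {}
            if fn.get("name"):
                slot["name"] = fn["name"]
            # arguments arrive as partial JSON strings and must be concatenated
            slot["arguments"] += fn.get("arguments") or ""
    return [calls[i] for i in sorted(calls)]
```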

I have tried this with other inference providers that offer OpenAI-compatible endpoints. I have this issue with Groq and also xAI. I think it's a problem with the langgraph implementation.

jvsteiner avatar Mar 05 '25 12:03 jvsteiner

Has this problem been solved? I encountered a similar issue: when I use Claude, I can't get the tool-calling information!

https://github.com/langchain-ai/langgraph/discussions/4651

tim-chow avatar May 12 '25 03:05 tim-chow

> It seems to be caused by the astream_events method. Using langgraph with astream works: …

It doesn't work for me...

tim-chow avatar May 12 '25 05:05 tim-chow

I ran the repro with head-of-tree langgraph and was not able to reproduce the issue.

The tool call seems to work properly. Please find the snippet below:

     {
      "type": "AIMessage",
      "content": "",
      "additional_kwargs": {
          "tool_calls": [
              {
                  "index": 0,
                  "id": "call_<id>",
                  "function": {
                      "arguments": "{\"query\":\"sf weather\"}",
                      "name": "search"
                  },
                  "type": "function"
              }
          ]
      },
      "response_metadata": {
          "finish_reason": "tool_calls",
          "model_name": "llama3.1",
          "system_fingerprint": "fp_ollama"
      },
      "id": "run--<run_id>",
      "tool_calls": [
          {
              "name": "search",
              "args": {
                  "query": "sf weather"
              },
              "id": "call_<id>",
              "type": "tool_call"
          }
      ]
  },
  {
      "type": "ToolMessage",
      "content": "It's 60 degrees and foggy.",
      "name": "search",
      "id": "<id>",
      "tool_call_id": "call_<id>"
  }

I did have to make a couple of changes from the repro specified at the beginning of the issue:

  • Use InMemorySaver instead of MemorySaver
  • Give a dummy API key to ChatOpenAI

but those changes seem orthogonal to making tool calling work.

@StreetLamb: Are you still facing the issue? CC: @sydney-runkle

kk-src avatar Sep 05 '25 23:09 kk-src