
Fix AssistantAgent Tool Call Behavior

Open husseinmozannar opened this issue 11 months ago • 4 comments

Attempt at fixing #4514

  • limit to 1 tool call iteration only by default

This now forces the agent to return a different response that summarizes the tool calls; see what I did here

husseinmozannar avatar Dec 07 '24 09:12 husseinmozannar

I added unit tests.

I think this PR is now just about fixing the repeated tool calls; I will leave the handoffs for another PR. I think it is crucial to merge this first, before the other stuff.

husseinmozannar avatar Dec 08 '24 01:12 husseinmozannar

PR on hold.

Ideally AssistantAgent only does 1 LLM call, with tools (if any). The return type of the message should depend on the tool call and should allow other agents to easily convert that message to a string. Other agents and teams should be updated after AssistantAgent is updated.

husseinmozannar avatar Dec 08 '24 07:12 husseinmozannar

I have updated the code so we always limit to just 1 tool call iteration. I have updated the examples too. @husseinmozannar @victordibia please verify the behavior works in other scenarios.
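To make the change concrete, here is a toy sketch of the old loop versus the new single iteration, in plain Python. This is illustrative only, not the real AssistantAgent internals; `model_call` and `run_tool` are stand-ins for the LLM client and the tool executor:

```python
def model_call(history):
    # Stand-in for the LLM: pretend the model asks for a tool
    # unless a tool result is already in the history.
    if not any(m.startswith("tool:") for m in history):
        return {"tool_call": "search('autogen')"}
    return {"text": "summary of results"}

def run_tool(call):
    # Stand-in for the tool executor.
    return f"tool:{call} -> 3 results"

def old_on_messages(history):
    # Previous behavior: loop until the model stops calling tools,
    # so the final response is always a plain string.
    while True:
        reply = model_call(history)
        if "tool_call" not in reply:
            return reply["text"]
        history.append(run_tool(reply["tool_call"]))

def new_on_messages(history):
    # New behavior: at most one tool call iteration per invocation;
    # the tool result itself becomes the response.
    reply = model_call(history)
    if "tool_call" in reply:
        return run_tool(reply["tool_call"])
    return reply["text"]
```

With the same starting history, the old version loops back into the model while the new version stops after executing the tool once.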

ekzhu avatar Dec 08 '24 18:12 ekzhu

Behavior looks fine in a few sample teams I tried (multiple AssistantAgents in a round robin with varying access to tools), and I also tried the video surfer.

husseinmozannar avatar Dec 09 '24 02:12 husseinmozannar

Hi @husseinmozannar, I'm trying to understand the full intent of this change, as all tool results are now returned in TextMessage (vs. only in ToolCallResultMessage), which makes it difficult to differentiate tool call results from LLM responses.

  • "Limit 1 tool call per each on_messages invocation, by default return the tool call result as response."

As per "by default", was this intended to be optional behavior? It was good to be able to easily differentiate the tool call responses returned to the client from the client's own response to the caller. Thanks.

jspv avatar Dec 18 '24 03:12 jspv

Hey!

There was a problem with the previous version of AssistantAgent.

When GPT-4 decides to call a tool, it returns only the tool call and no other response.

In the previous version of AssistantAgent, the agent called the LLM as many times as needed until the final LLM response was not a tool call, i.e., it was a string. We now fix that issue as follows:

Referencing the API doc

  • If the model returns no tool call, then the response is immediately returned as a :class:`~autogen_agentchat.messages.TextMessage` in :attr:`~autogen_agentchat.base.Response.chat_message`.

  • When the model returns tool calls, they will be executed right away:

  1. When reflect_on_tool_use is False (default), the tool call results are returned as a :class:`~autogen_agentchat.messages.TextMessage` in :attr:`~autogen_agentchat.base.Response.chat_message`. tool_call_summary_format can be used to customize the tool call summary. We still yield the ToolCallMessage and ToolCallResultMessage prior to the final TextMessage.

  2. When reflect_on_tool_use is True, another model inference is made using the tool calls and results, and the text response is returned as a :class:`~autogen_agentchat.messages.TextMessage` in :attr:`~autogen_agentchat.base.Response.chat_message`. We still yield the ToolCallMessage and ToolCallResultMessage prior to the final TextMessage.

Moreover, the inner messages of the final Response will contain the tool calls and their results.
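The branching above can be sketched in plain Python. This is illustrative only, not the actual autogen implementation; the tool execution and the reflection inference are stubbed, and the template field names (`{name}`, `{result}`) are illustrative rather than the library's exact placeholders:

```python
from dataclasses import dataclass

@dataclass
class TextMessage:
    content: str

@dataclass
class ToolCallResult:
    name: str
    result: str

def respond(model_reply, reflect_on_tool_use=False,
            tool_call_summary_format="{result}"):
    # Case 1: no tool call -- the model's text is returned directly.
    if isinstance(model_reply, str):
        return TextMessage(model_reply)
    # Case 2: tool calls are executed right away (stubbed here).
    results = [ToolCallResult(name, f"ran {name}") for name in model_reply]
    if reflect_on_tool_use:
        # Case 2b: a second model inference over the results (stubbed).
        return TextMessage("reflection over: " +
                           ", ".join(r.result for r in results))
    # Case 2a: format the results with the summary template.
    summary = "\n".join(
        tool_call_summary_format.format(name=r.name, result=r.result)
        for r in results)
    return TextMessage(summary)
```

In all three cases the caller receives a TextMessage, which is exactly the ambiguity discussed below.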

Does this make more sense now?

husseinmozannar avatar Dec 18 '24 03:12 husseinmozannar

Thanks, I understand the intent. The challenge for me is that both tool call results and normal responses are now returned as :class:`~autogen_agentchat.messages.TextMessage` with no differentiating characteristics, making it difficult to tell a TextMessage from the agent apart from the additional tool call response (or summary) from the tool without tracking additional state (e.g., did I just receive a ToolCallResultMessage? If so, the next TextMessage is the tool response, not from the LLM).

I think it would be better not to mix the types. There is already ToolCallResultMessage, which was previously the unambiguous way to identify tool results; now tool results arrive in two forms (ToolCallResultMessage and another copy, in a different format, as TextMessage), and it requires extra logic to determine whether a TextMessage carries tool results or came from the LLM.

Can we create a clear message type for these new messages (e.g. ToolCallSummaryMessage), or some other disambiguating field in the message to determine its type?
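A disambiguating type could look like the following sketch. `ToolCallSummaryMessage` is the name proposed in this comment, not a class that existed at the time of this thread; the point is that a subclass lets consumers branch on `isinstance` instead of tracking extra state:

```python
from dataclasses import dataclass

@dataclass
class TextMessage:
    content: str

@dataclass
class ToolCallSummaryMessage(TextMessage):
    """Carries a tool-result summary. Being a distinct subclass, it can
    be told apart from an ordinary LLM TextMessage with isinstance()."""

def is_tool_summary(msg: TextMessage) -> bool:
    # No "did I just see a ToolCallResultMessage?" bookkeeping needed.
    return isinstance(msg, ToolCallSummaryMessage)
```

Because the summary type subclasses TextMessage, existing code that only handles TextMessage keeps working unchanged.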

jspv avatar Dec 18 '24 12:12 jspv

@husseinmozannar I think it's a valid point and we should create a new type of chat message for this.

It can be important for orchestration or termination condition. When inner messages are not emitted, we need to rely on typing to figure out what happened.

ekzhu avatar Dec 18 '24 16:12 ekzhu

@husseinmozannar, I think I'm finding more side effects of this change. Using a simple RoundRobin with two agents:

  • writer, writes papers and has access to a web search tool
  • editor, reviews the paper, provides suggestions, and terminates the team when the paper is approved.

With the tool results now being returned as a TextMessage, I'm seeing the speaker move from writer->editor immediately after receiving the TextMessage tool response; so rather than the writer receiving the tool reply and using it to write the paper, the editor takes over prematurely.

Before:

task->writer runs tool -> writer writes paper -> editor provides feedback -> writer <-> editor ... -> editor approves

Now: task -> writer runs tool -> editor provides feedback on tool results -> writer <-> editor ...

  • the writer never gets to act on the tool results.

jspv avatar Dec 18 '24 16:12 jspv

We observed this effect as well. One way to fix this is by setting reflect_on_tool_use=True when you create the writer.

Alternatively, set allow_repeated_speaker=True in the selector group chat. What you saw happens because the selector, by default, moves on from the same speaker.
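A toy model of that selection rule (illustrative only; the real selector group chat uses a model to pick the next speaker, and this sketch only shows the repeat filter):

```python
def select_speaker(candidates, last_speaker, allow_repeated_speaker=False):
    # By default the previous speaker is excluded from the candidate pool,
    # so the writer cannot immediately act on its own tool-result message.
    pool = candidates
    if not allow_repeated_speaker:
        pool = [c for c in candidates if c != last_speaker] or candidates
    return pool[0]
```

With the default, the turn after the writer's tool result goes to the editor; with allow_repeated_speaker=True, the writer can be picked again to write the paper from the results.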

ekzhu avatar Dec 18 '24 16:12 ekzhu