Support repeated tool calls in a loop within `AssistantAgent`
Confirmation
- [x] I confirm that I am a maintainer and so can use this template. If I am not, I understand this issue will be closed and I will be asked to use a different template.
Issue body
Many models have been trained to call tools repeatedly until they are ready to reflect on the results and respond. For example, Claude models have been trained to perform exactly this.
Idea:
We can support this in `AssistantAgent` by introducing an optional `tool_call_loop` parameter to the constructor. By default, `tool_call_loop=False`, which means tool calls do not loop: at most a single round of tool calls is made.
The user can set `tool_call_loop=True`, which means tool calls run in a loop until the model:
- Produces a non-tool-call response, or
- Produces a handoff tool call.
This is a start. As a next step, introduce a `ToolCallConfig` class that controls how a tool call loop should behave, with parameters for setting the maximum number of iterations, a preset response text, etc. -- we can consider merging `reflect_on_tool_call` into it as well.
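A rough sketch of what the proposed surface could look like from the caller's side (nothing here exists yet; `tool_call_loop` is just the parameter proposed above, and `query_vector_db` is a toy tool):

```python
from autogen_agentchat.agents import AssistantAgent
from autogen_ext.models.openai import OpenAIChatCompletionClient


async def query_vector_db(query: str) -> str:
    """Toy stand-in for a real retrieval tool."""
    return f"results for {query!r}"


# Hypothetical surface: `tool_call_loop` is the parameter proposed in this
# issue and does not exist on AssistantAgent today.
agent = AssistantAgent(
    name="assistant",
    model_client=OpenAIChatCompletionClient(model="gpt-4o"),
    tools=[query_vector_db],
    tool_call_loop=True,  # loop tool calls until a non-tool-call response or a handoff
)
```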
Please discuss and suggest ideas.
Related: #6261 #5621
The idea sounds good to me 👍
For my use-case it would be nice to have a way to enforce a text response after a certain amount of loops.
In my case the model is gathering information from a vector DB. The agent sometimes decides to query multiple times if it does not get the right information. But I cannot allow too many loops, since the cost and time get too high. Also, with every call the input tokens grow quite a bit because of the tool call results from previous calls.
So this would force the model to formulate the best answer it can with the info it got after a few tries.
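One hypothetical way a later `ToolCallConfig` could cover this use case; every name and field below is made up for illustration:

```python
from dataclasses import dataclass


# Hypothetical: neither ToolCallConfig nor these fields exist yet; this only
# illustrates the "force a text answer after N tool-call rounds" idea.
@dataclass
class ToolCallConfig:
    max_iterations: int = 3
    final_answer_prompt: str = (
        "You have no tool calls left. Answer as best you can "
        "with the information gathered so far."
    )


config = ToolCallConfig(max_iterations=3)
```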
I want to try to implement this!
As @philippHorn mentioned, this feature would be very useful when we want to query a database. However, I see that "Running an Agent in a Loop" in the docs already provides a way to do this.
I don't have a deep understanding of the system yet, but assuming there is a tool used to interact with the database, and for some reason (e.g., the desired data doesn't exist when we query) a single call isn't enough, would the difference be that:
Running an Agent in a Loop would be considered multiple independent Agent runs or interactions, with each run potentially involving one or a limited number of tool calls. Whereas the tool_call_loop feature we hope to implement would be considered a single complete interaction or operation flow, where the Agent internally and autonomously repeats tool calls until specific termination conditions are met.
@zhengkezhou1 if you are new to the code base, I think you can get started by sketching out your design in this issue so we can point you in the right direction.
The goal of this feature is to make it much easier to use a single agent without thinking about teams at all. The tool call loop provides a quick way to do the same thing as a round robin with a single agent and a termination condition; however, the implementation can be much more specific to `AssistantAgent` itself.
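For comparison, a sketch of today's workaround from the "Running an Agent in a Loop" docs: a single-agent `RoundRobinGroupChat` that stops once the agent replies with plain text (assuming `TextMessageTermination` is available in your version and takes the agent name as its source; `query_vector_db` is a toy tool):

```python
from autogen_agentchat.agents import AssistantAgent
from autogen_agentchat.conditions import TextMessageTermination
from autogen_agentchat.teams import RoundRobinGroupChat
from autogen_ext.models.openai import OpenAIChatCompletionClient


async def query_vector_db(query: str) -> str:
    """Toy stand-in for a retrieval tool."""
    return f"results for {query!r}"


agent = AssistantAgent(
    name="looped_assistant",
    model_client=OpenAIChatCompletionClient(model="gpt-4o"),
    tools=[query_vector_db],
)

# Today's workaround: a single-agent team that keeps running the agent
# until it produces a plain text message instead of a tool call.
team = RoundRobinGroupChat(
    [agent],
    termination_condition=TextMessageTermination(source="looped_assistant"),
)

result = await team.run(task="Find the relevant documents and summarize them.")  # in an async context
```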
@ekzhu you mentioned that Claude models have been trained to perform exactly this. Do you have any relevant links? I couldn't find this type of information in the documentation. Also, I did a small test with Gemini.
My initial idea is as follows:
Instead of manual checking, we will add the result obtained from the tool call to the prompt. For example: "The result of the current tool call is: [specific result of the function call]. Is this the result we need?" After that, we will send this prompt to the model.
If the response from the model indicates that the result is not what we expect, we will loop the tool call until we get the desired result, while also ensuring that if we fail repeatedly up to some limit, we stop calling and return directly.
Explanation based on the OpenAI API:

1. Initial User Request: Start by sending the user's query to the OpenAI model using the Chat Completions endpoint. The `messages` parameter in the request will contain the user's initial message.
2. Model Initiates Tool Call: If the model determines that it needs to use a tool (based on the `functions` you've defined in your request), the response will include a `tool_calls` array. The `finish_reason` in the response will be `"tool_calls"`.
3. Execute the Tool: The application will then parse the `tool_calls` array, identify the function to be called (based on `function.name`), extract the necessary arguments from `tool_calls[].function.arguments`, and execute that function.
4. Construct the Evaluation Prompt: Once we have the result from executing the tool, create a new message to be added to the conversation history (the `messages` array). This new message will have the `role` set to `"user"` and the `content` will be the evaluation prompt, for example: `{ "role": "user", "content": "The result of the current tool call is: [function execution result]. Is this the result we need? Please answer 'Yes' or 'No'." }` Replace `[function execution result]` with the actual output from your `get_weather` function.
5. Send Evaluation Prompt to Model: Make another call to the Chat Completions endpoint, this time including the entire conversation history in the `messages` array: the original user query, the model's tool call request (as a message with `role="assistant"` and the `tool_calls` array), the tool's response (as a message with `role="tool"`, the `tool_call_id`, and `content` containing the function result), and the new evaluation prompt.
6. Get Model's Evaluation: The model's response to this evaluation prompt will indicate whether it believes the tool call result is satisfactory. Examine the `content` of the model's response.
7. Implement the Looping and Retry Mechanism: If the model's response to the evaluation prompt indicates "No" (or a negative sentiment based on interpretation), initiate another tool call by sending the updated `messages` array (including the model's negative evaluation) back to the Chat Completions endpoint; the model might then generate another `tool_calls` request. Repeat steps 3-6 until the model's evaluation is "Yes" (or positive), or until the defined limits for retries or consecutive failures are hit.
8. Final Response Generation: Once the model indicates the tool call result is satisfactory, make one final call to the Chat Completions endpoint with the complete `messages` array. The model should now be able to generate the final response to the user's initial query, leveraging the validated tool call result.
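A condensed sketch of steps 3-7 with the OpenAI Python client; `get_weather` is the stand-in tool from the steps above, and `MAX_RETRIES` is an arbitrary cap I chose for illustration:

```python
import json

from openai import OpenAI


def get_weather(city: str) -> str:
    """Stand-in tool implementation."""
    return f"Sunny, 22C in {city}"


client = OpenAI()
MAX_RETRIES = 3  # stand-in for "defined limits for retries"

messages = [{"role": "user", "content": "What's the weather in Paris?"}]
tools = [{
    "type": "function",
    "function": {
        "name": "get_weather",
        "description": "Look up current weather for a city.",
        "parameters": {
            "type": "object",
            "properties": {"city": {"type": "string"}},
            "required": ["city"],
        },
    },
}]

for attempt in range(MAX_RETRIES):
    response = client.chat.completions.create(model="gpt-4o", messages=messages, tools=tools)
    choice = response.choices[0]
    if choice.finish_reason != "tool_calls":
        break  # model answered in text; no (further) tool call needed

    messages.append(choice.message)  # keep the assistant's tool_calls in history
    for tool_call in choice.message.tool_calls:
        args = json.loads(tool_call.function.arguments)
        result = get_weather(**args)
        messages.append({"role": "tool", "tool_call_id": tool_call.id, "content": result})

    # Evaluation prompt from the design above: ask the model to judge the result.
    messages.append({
        "role": "user",
        "content": f"The result of the current tool call is: {result}. "
                   "Is this the result we need? Please answer 'Yes' or 'No'.",
    })
    verdict = client.chat.completions.create(model="gpt-4o", messages=messages)
    messages.append(verdict.choices[0].message)
    if verdict.choices[0].message.content.strip().lower().startswith("yes"):
        break  # satisfied; fall through to final response generation

final = client.chat.completions.create(model="gpt-4o", messages=messages)
print(final.choices[0].message.content)
```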
> @ekzhu you mentioned that Claude models have been trained to perform exactly this. Do you have any relevant links? I couldn't find this type of information in the documentation. Also, I did a small test with Gemini.
Sorry. I made this up. However, their documentation indicates this is the recommended way. See "Sequential Tools" in the documentation.
> Instead of manual checking, we will add the result obtained from the tool call to the prompt. For example: "The result of the current tool call is: [specific result of the function call]. Is this the result we need?" After that, we will send this prompt to the model.
> If the response from the model indicates that the result is not what we expect, we will loop the tool call until we get the desired result, while also ensuring that if we fail repeatedly up to some limit, we stop calling and return directly.
Can we instead have the model decide directly whether to continue with another tool call, using only the tool call response? This is the approach recommended by the Anthropic docs. The OpenAI Assistant API also uses a similar pattern; see Run Lifecycle.
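In that pattern the extra evaluation prompt disappears: tool results go back as `role="tool"` messages and the next completion either requests more tool calls or answers in text. A minimal sketch, reusing the `tools` schema and stand-in `get_weather` from the previous snippet:

```python
import json

from openai import OpenAI

client = OpenAI()
messages = [{"role": "user", "content": "What's the weather in Paris?"}]

for _ in range(5):  # safety cap on the loop
    response = client.chat.completions.create(
        model="gpt-4o", messages=messages, tools=tools  # same tools schema as above
    )
    message = response.choices[0].message
    if not message.tool_calls:
        break  # the model chose to answer in text -- the loop's natural exit

    messages.append(message)
    for tool_call in message.tool_calls:
        result = get_weather(**json.loads(tool_call.function.arguments))  # stand-in tool
        messages.append({"role": "tool", "tool_call_id": tool_call.id, "content": result})

print(message.content)  # final text answer (None if the cap was hit mid-loop)
```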
Hi @ekzhu , I have analyzed the proposed solution for implementing looped tool calls.
To enable this functionality, we need to add the `tool_call_loop` parameter to the `_process_model_result` function signature: https://github.com/microsoft/autogen/blob/085ff3dd7dcb07c99229c9b935e818fe7da11d61/python/packages/autogen-agentchat/src/autogen_agentchat/agents/_assistant_agent.py#L964-L980
`cls._execute_tool_call` will be called in a loop. After each call, the code checks whether `executed_calls_and_results` contains information indicating a handoff, or whether the model's finish reason is `stop`. If either of these conditions is met, the loop stops and exits.
https://github.com/microsoft/autogen/blob/085ff3dd7dcb07c99229c9b935e818fe7da11d61/python/packages/autogen-agentchat/src/autogen_agentchat/agents/_assistant_agent.py#L1026-L1039
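To make the proposed control flow concrete, here is a self-contained sketch of just the loop-and-exit logic; the real `_process_model_result` signature differs, and `ModelTurn`, `next_turn`, `execute_tool_calls`, and `is_handoff` are stand-ins, not existing code:

```python
from dataclasses import dataclass, field
from typing import Callable, Iterable


@dataclass
class ModelTurn:
    """Stand-in for the model client's result in this sketch."""
    finish_reason: str = "tool_calls"  # "stop" when the model answers in plain text
    tool_calls: list[str] = field(default_factory=list)
    text: str = ""


def tool_call_loop(
    next_turn: Callable[[], ModelTurn],
    execute_tool_calls: Callable[[list[str]], Iterable[str]],
    is_handoff: Callable[[str], bool],
    max_iterations: int = 10,
) -> ModelTurn:
    """Repeat tool execution until a text answer, a handoff, or the iteration cap."""
    turn = next_turn()
    for _ in range(max_iterations):
        if turn.finish_reason == "stop" or not turn.tool_calls:
            break  # non-tool-call response: natural exit
        executed_calls_and_results = execute_tool_calls(turn.tool_calls)
        if any(is_handoff(result) for result in executed_calls_and_results):
            break  # handoff detected in the results: exit
        turn = next_turn()  # otherwise ask the model again
    return turn
```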