Support repeated tool calls in a loop within `AssistantAgent`
Confirmation
- [x] I confirm that I am a maintainer and so can use this template. If I am not, I understand this issue will be closed and I will be asked to use a different template.
Issue body
Many models have been trained to call tools repeatedly until they are ready to reflect on the results and respond. For example, Claude models have been trained to perform exactly this.
Idea:
We can support this in `AssistantAgent` by introducing an optional `tool_call_loop` parameter to the constructor. By default, `tool_call_loop=False`, which means tool calls do not loop: at most a single round of tool calls is made.
The user can set `tool_call_loop=True`, which means tool calls run in a loop until the model:
- Produces a non-tool-call response, or
- Produces a handoff tool call.
This is a start. As a next step, introduce a `ToolCallConfig` class that controls how a tool call loop should behave, with parameters for setting the maximum number of iterations, a preset response text, etc. -- we can consider merging `reflect_on_tool_call` into it as well.
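A rough sketch of what the proposed surface could look like from the caller's side (nothing here exists yet; `tool_call_loop` is just the parameter proposed above, and `query_vector_db` is a toy tool):

```python
from autogen_agentchat.agents import AssistantAgent
from autogen_ext.models.openai import OpenAIChatCompletionClient


async def query_vector_db(query: str) -> str:
    """Toy stand-in for a real retrieval tool."""
    return f"results for {query!r}"


# Hypothetical surface: `tool_call_loop` is the parameter proposed in this
# issue and does not exist on AssistantAgent today.
agent = AssistantAgent(
    name="assistant",
    model_client=OpenAIChatCompletionClient(model="gpt-4o"),
    tools=[query_vector_db],
    tool_call_loop=True,  # loop tool calls until a non-tool-call response or a handoff
)
```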
Please discuss and suggest ideas.
Related: #6261 #5621
The idea sounds good to me 👍
For my use-case it would be nice to have a way to enforce a text response after a certain amount of loops.
In my case the model is gathering information from a vector DB. The agent sometimes decides to query multiple times if it does not get the right information. But I cannot allow too many loops, since the cost and time get too high. Also, with every call the input tokens grow quite a bit because of the tool call results from previous calls.
So this would force the model to formulate the best answer it can with the info it got after a few tries.
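One hypothetical way a later `ToolCallConfig` could cover this use case; every name and field below is made up for illustration:

```python
from dataclasses import dataclass


# Hypothetical: neither ToolCallConfig nor these fields exist yet; this only
# illustrates the "force a text answer after N tool-call rounds" idea.
@dataclass
class ToolCallConfig:
    max_iterations: int = 3
    final_answer_prompt: str = (
        "You have no tool calls left. Answer as best you can "
        "with the information gathered so far."
    )


config = ToolCallConfig(max_iterations=3)
```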
I want to try to implement this!
As @philippHorn mentioned, this feature would be very useful when we want to query a database. However, I see that "Running an Agent in a Loop" in the docs already provides a way to do this.
I don't have a deep understanding of the system yet, but assuming there is a tool used to interact with the database, and for some reason (e.g., the desired data doesn't exist when we query) a single call isn't enough, would the difference be that:
Running an Agent in a Loop would be considered multiple independent Agent runs or interactions, with each run potentially involving one or a limited number of tool calls. Whereas the tool_call_loop feature we hope to implement would be considered a single complete interaction or operation flow, where the Agent internally and autonomously repeats tool calls until specific termination conditions are met.
@zhengkezhou1 if you are new to the code base, I think you can get started by sketching out your design in this issue so we can point you in the right direction.
The goal of this feature is to make it much easier to use a single agent without thinking about teams at all. The tool call loop provides a quick way to do the same thing as a round robin with a single agent and a termination condition; however, the implementation can be much more specific to `AssistantAgent` itself.
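For comparison, a sketch of today's workaround from the "Running an Agent in a Loop" docs: a single-agent `RoundRobinGroupChat` that stops once the agent replies with plain text (assuming `TextMessageTermination` is available in your version and takes the agent name as its source; `query_vector_db` is a toy tool):

```python
from autogen_agentchat.agents import AssistantAgent
from autogen_agentchat.conditions import TextMessageTermination
from autogen_agentchat.teams import RoundRobinGroupChat
from autogen_ext.models.openai import OpenAIChatCompletionClient


async def query_vector_db(query: str) -> str:
    """Toy stand-in for a retrieval tool."""
    return f"results for {query!r}"


agent = AssistantAgent(
    name="looped_assistant",
    model_client=OpenAIChatCompletionClient(model="gpt-4o"),
    tools=[query_vector_db],
)

# Today's workaround: a single-agent team that keeps running the agent
# until it produces a plain text message instead of a tool call.
team = RoundRobinGroupChat(
    [agent],
    termination_condition=TextMessageTermination(source="looped_assistant"),
)

result = await team.run(task="Find the relevant documents and summarize them.")  # in an async context
```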
@ekzhu you mentioned that Claude models have been trained to perform exactly this. Do you have any relevant links? I couldn't find this type of information in the documentation. Also, I did a small test with Gemini.
My initial idea is as follows:
Instead of manual checking, we will add the result obtained from the tool call to the prompt. For example: "The result of the current tool call is: [specific result of the function call]. Is this the result we need?" After that, we will send this prompt to the model.
If the response from the model indicates that the result is not what we expect, we will loop the tool call until we get the desired result, while also ensuring that if we fail repeatedly up to some limit, we stop calling and return directly.
Explanation based on the OpenAI API:

1. Initial User Request: Start by sending the user's query to the OpenAI model using the Chat Completions endpoint. The `messages` parameter in the request will contain the user's initial message.
2. Model Initiates Tool Call: If the model determines that it needs to use a tool (based on the `functions` you've defined in your request), the response will include a `tool_calls` array. The `finish_reason` in the response will be `"tool_calls"`.
3. Execute the Tool: The application will then parse the `tool_calls` array, identify the function to be called (based on `function.name`), extract the necessary arguments from `tool_calls[].function.arguments`, and execute that function.
4. Construct the Evaluation Prompt: Once we have the result from executing the tool, create a new message to be added to the conversation history (the `messages` array). This new message will have the `role` set to `"user"` and the `content` will be the evaluation prompt, for example: `{ "role": "user", "content": "The result of the current tool call is: [function execution result]. Is this the result we need? Please answer 'Yes' or 'No'." }` Replace `[function execution result]` with the actual output from your `get_weather` function.
5. Send Evaluation Prompt to Model: Make another call to the Chat Completions endpoint, this time including the entire conversation history in the `messages` array: the original user query, the model's tool call request (as a message with `role="assistant"` and the `tool_calls` array), the tool's response (as a message with `role="tool"`, the `tool_call_id`, and `content` containing the function result), and the new evaluation prompt.
6. Get Model's Evaluation: The model's response to this evaluation prompt will indicate whether it believes the tool call result is satisfactory. Examine the `content` of the model's response.
7. Implement the Looping and Retry Mechanism: If the model's response to the evaluation prompt indicates "No" (or a negative sentiment based on interpretation), initiate another tool call by sending the updated `messages` array (including the model's negative evaluation) back to the Chat Completions endpoint; the model might then generate another `tool_calls` request. Repeat steps 3-6 until the model's evaluation is "Yes" (or positive), or until the defined limits for retries or consecutive failures are hit.
8. Final Response Generation: Once the model indicates the tool call result is satisfactory, make one final call to the Chat Completions endpoint with the complete `messages` array. The model should now be able to generate the final response to the user's initial query, leveraging the validated tool call result.
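A condensed sketch of steps 3-7 with the OpenAI Python client; `get_weather` is the stand-in tool from the steps above, and `MAX_RETRIES` is an arbitrary cap I chose for illustration:

```python
import json

from openai import OpenAI


def get_weather(city: str) -> str:
    """Stand-in tool implementation."""
    return f"Sunny, 22C in {city}"


client = OpenAI()
MAX_RETRIES = 3  # stand-in for "defined limits for retries"

messages = [{"role": "user", "content": "What's the weather in Paris?"}]
tools = [{
    "type": "function",
    "function": {
        "name": "get_weather",
        "description": "Look up current weather for a city.",
        "parameters": {
            "type": "object",
            "properties": {"city": {"type": "string"}},
            "required": ["city"],
        },
    },
}]

for attempt in range(MAX_RETRIES):
    response = client.chat.completions.create(model="gpt-4o", messages=messages, tools=tools)
    choice = response.choices[0]
    if choice.finish_reason != "tool_calls":
        break  # model answered in text; no (further) tool call needed

    messages.append(choice.message)  # keep the assistant's tool_calls in history
    for tool_call in choice.message.tool_calls:
        args = json.loads(tool_call.function.arguments)
        result = get_weather(**args)
        messages.append({"role": "tool", "tool_call_id": tool_call.id, "content": result})

    # Evaluation prompt from the design above: ask the model to judge the result.
    messages.append({
        "role": "user",
        "content": f"The result of the current tool call is: {result}. "
                   "Is this the result we need? Please answer 'Yes' or 'No'.",
    })
    verdict = client.chat.completions.create(model="gpt-4o", messages=messages)
    messages.append(verdict.choices[0].message)
    if verdict.choices[0].message.content.strip().lower().startswith("yes"):
        break  # satisfied; fall through to final response generation

final = client.chat.completions.create(model="gpt-4o", messages=messages)
print(final.choices[0].message.content)
```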
> @ekzhu you mentioned that Claude models have been trained to perform exactly this. Do you have any relevant links? I couldn't find this type of information in the documentation. Also, I did a small test with Gemini.
Sorry. I made this up. However, their documentation indicates this is the recommended way. See "Sequential Tools" in the documentation.
> Instead of manual checking, we will add the result obtained from the tool call to the prompt. For example: "The result of the current tool call is: [specific result of the function call]. Is this the result we need?" After that, we will send this prompt to the model.
> If the response from the model indicates that the result is not what we expect, we will loop the tool call until we get the desired result, while also ensuring that if we fail repeatedly up to some limit, we stop calling and return directly.
Can we instead have the model decide directly whether to continue with another tool call, using only the tool call response? This is the approach recommended by the Anthropic docs. The OpenAI Assistant API also uses a similar pattern; see Run Lifecycle.
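In that pattern the extra evaluation prompt disappears: tool results go back as `role="tool"` messages and the next completion either requests more tool calls or answers in text. A minimal sketch, reusing the `tools` schema and stand-in `get_weather` from the previous snippet:

```python
import json

from openai import OpenAI

client = OpenAI()
messages = [{"role": "user", "content": "What's the weather in Paris?"}]

for _ in range(5):  # safety cap on the loop
    response = client.chat.completions.create(
        model="gpt-4o", messages=messages, tools=tools  # same tools schema as above
    )
    message = response.choices[0].message
    if not message.tool_calls:
        break  # the model chose to answer in text -- the loop's natural exit

    messages.append(message)
    for tool_call in message.tool_calls:
        result = get_weather(**json.loads(tool_call.function.arguments))  # stand-in tool
        messages.append({"role": "tool", "tool_call_id": tool_call.id, "content": result})

print(message.content)  # final text answer (None if the cap was hit mid-loop)
```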
Hi @ekzhu , I have analyzed the proposed solution for implementing looped tool calls.
To enable this functionality, we need to add the `tool_call_loop` parameter to the `_process_model_result` function signature: https://github.com/microsoft/autogen/blob/085ff3dd7dcb07c99229c9b935e818fe7da11d61/python/packages/autogen-agentchat/src/autogen_agentchat/agents/_assistant_agent.py#L964-L980
`cls._execute_tool_call` will be called in a loop. After each call, the code checks whether `executed_calls_and_results` contains information indicating a handoff, or whether the model's finish reason is `stop`. If either of these conditions is met, the loop stops and exits.
https://github.com/microsoft/autogen/blob/085ff3dd7dcb07c99229c9b935e818fe7da11d61/python/packages/autogen-agentchat/src/autogen_agentchat/agents/_assistant_agent.py#L1026-L1039
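To make the proposed control flow concrete, here is a self-contained sketch of just the loop-and-exit logic; the real `_process_model_result` signature differs, and `ModelTurn`, `next_turn`, `execute_tool_calls`, and `is_handoff` are stand-ins, not existing code:

```python
from dataclasses import dataclass, field
from typing import Callable, Iterable


@dataclass
class ModelTurn:
    """Stand-in for the model client's result in this sketch."""
    finish_reason: str = "tool_calls"  # "stop" when the model answers in plain text
    tool_calls: list[str] = field(default_factory=list)
    text: str = ""


def tool_call_loop(
    next_turn: Callable[[], ModelTurn],
    execute_tool_calls: Callable[[list[str]], Iterable[str]],
    is_handoff: Callable[[str], bool],
    max_iterations: int = 10,
) -> ModelTurn:
    """Repeat tool execution until a text answer, a handoff, or the iteration cap."""
    turn = next_turn()
    for _ in range(max_iterations):
        if turn.finish_reason == "stop" or not turn.tool_calls:
            break  # non-tool-call response: natural exit
        executed_calls_and_results = execute_tool_calls(turn.tool_calls)
        if any(is_handoff(result) for result in executed_calls_and_results):
            break  # handoff detected in the results: exit
        turn = next_turn()  # otherwise ask the model again
    return turn
```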