[Issue]: Unable to enable tool calling when using a custom model
Describe the issue
I am trying to combine the following two notebooks into one:
- Agent Chat with custom model loading
- Auto Generated Agent Chat: Task Solving with Langchain Provided Tools as Functions
In my simple and naive script to test the concept, I created two agent instances: assistant as AssistantAgent() and user_proxy as UserProxyAgent(). The challenge is how to initialize and register assistant when I need to inform both assistant and user_proxy of the provided tools as functions. However, an error is raised when running either assistant.register_model_client() or user_proxy.initiate_chat(). I don't know whether the problem is in my script or whether there is a bug. I would be grateful for any help with this.
Steps to reproduce
Step 1: Confirm the custom model's local path
Confirm that the model Mistral-7B-OpenOrca exists locally at the path /dev/Open-Orca/Mistral-7B-OpenOrca
Step 2: Create a JSON file named OAI_CONFIG_LIST.json under the directory /dev/ with the following content
[
{
"model": "gpt-4",
"api_key": "<your OpenAI API key here>"
},
{
"model": "/dev/Open-Orca/Mistral-7B-OpenOrca",
"model_client_cls": "CustomModelClient",
"params": {
"max_length": 1000
}
}
]
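A quick way to sanity-check this step (a sketch, assuming the file above is saved as /dev/OAI_CONFIG_LIST.json) is to load and print the filtered list before creating any agents; the "gpt-4" entry should be filtered out and exactly one entry should remain:
import autogen

# Sketch: load the config and keep only entries that declare
# "model_client_cls": "CustomModelClient"; exactly one entry should remain.
config_list_custom = autogen.config_list_from_json(
    env_or_file="OAI_CONFIG_LIST.json",
    file_location="/dev/",
    filter_dict={"model_client_cls": ["CustomModelClient"]},
)
print(len(config_list_custom))         # expected: 1
print(config_list_custom[0]["model"])  # expected: /dev/Open-Orca/Mistral-7B-OpenOrca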
Step 3: Run the following Python script
import math
from types import SimpleNamespace
from typing import Type
import autogen
from autogen import AssistantAgent, UserProxyAgent
from transformers import AutoTokenizer, GenerationConfig, AutoModelForCausalLM
from langchain.pydantic_v1 import BaseModel, Field
from langchain.tools import BaseTool
class CustomModelClient:
def __init__(self, config, **kwargs):
self.device = config.get("device", "cpu")
self.model = AutoModelForCausalLM.from_pretrained(pretrained_model_name_or_path = config["model"])
self.model_name = config["model"]
self.tokenizer = AutoTokenizer.from_pretrained(pretrained_model_name_or_path = config["model"], use_fast = False)
self.tokenizer.pad_token_id = self.tokenizer.eos_token_id
# params are set by the user and consumed by the user since they are providing a custom model
# so anything can be done here
gen_config_params = config.get("params", {})
self.max_length = gen_config_params.get("max_length", 256)
def create(self, params):
if params.get("stream", False) and "messages" in params:
raise NotImplementedError("Local models do not support streaming.")
else:
num_of_responses = params.get("n", 1)
# can create my own data response class
# here using SimpleNamespace for simplicity
# as long as it adheres to the ClientResponseProtocol
response = SimpleNamespace()
inputs = self.tokenizer.apply_chat_template(
conversation = params["messages"],
return_tensors="pt",
add_generation_prompt=True
).to(self.device)
inputs_length = inputs.shape[-1]
# add inputs_length to max_length
max_length = self.max_length + inputs_length
generation_config = GenerationConfig(
max_length = max_length,
eos_token_id = self.tokenizer.eos_token_id,
pad_token_id = self.tokenizer.pad_token_id,
)
response.choices = []
response.model = self.model_name
for _ in range(num_of_responses):
outputs = self.model.generate(
inputs = inputs,
generation_config=generation_config
)
# Decode only the newly generated text, excluding the prompt
text = self.tokenizer.decode(token_ids = outputs[0, inputs_length:])
choice = SimpleNamespace()
choice.message = SimpleNamespace()
choice.message.content = text
choice.message.function_call = None
response.choices.append(choice)
return response
def message_retrieval(self, response):
"""Retrieve the messages from the response."""
choices = response.choices
return [choice.message.content for choice in choices]
def cost(self, response) -> float:
"""Calculate the cost of the response."""
response.cost = 0
return 0
@staticmethod
def get_usage(response):
# returns a dict of prompt_tokens, completion_tokens, total_tokens, cost, model
# if usage needs to be tracked, else None
return {}
class CustomToolInput(BaseModel):
income: float = Field()
class CustomTool(BaseTool):
name = "tax_calculator"
description = "Use this tool when you need to calculate the tax using the income"
args_schema: Type[BaseModel] = CustomToolInput
    def _run(self, income: float):
        return float(income) * math.pi / 100
# Define a function to generate llm_config from a LangChain tool
def generate_llm_config(tool):
# Define the function schema based on the tool's args_schema
function_schema = {
"name": tool.name.lower().replace(" ", "_"),
"description": tool.description,
"parameters": {
"type": "object",
"properties": {},
"required": [],
},
}
if tool.args is not None:
function_schema["parameters"]["properties"] = tool.args
return function_schema
custom_tool = CustomTool()
config_list_custom = autogen.config_list_from_json(
env_or_file = "OAI_CONFIG_LIST.json",
file_location = "/dev/",
filter_dict = {"model_client_cls": ["CustomModelClient"]},
)
user_proxy = UserProxyAgent(
name = "user_proxy",
is_termination_msg = lambda x: x.get("content", "") and x.get("content", "").rstrip().endswith("TERMINATE"),
human_input_mode = "NEVER",
max_consecutive_auto_reply = 2,
code_execution_config = {
"work_dir": "coding",
"use_docker": False, # Please set use_docker=True if docker is available to run the generated code. Using docker is safer than running the generated code directly.
"timeout": 600,
"last_n_messages": 1
},
)
user_proxy.register_function(
function_map={
custom_tool.name: custom_tool._run
}
)
llm_config = {
"functions": [generate_llm_config(custom_tool)],
"config_list": config_list_custom,
"timeout": 120
}
assistant = AssistantAgent(
name = "assistant",
llm_config = llm_config,
system_message = "For coding tasks, only use the functions you have been provided with. Reply TERMINATE when the task is done."
)
assistant.register_model_client(model_client_cls = CustomModelClient)
with autogen.Cache.disk():
user_proxy.initiate_chat(assistant, message="when the income is 100, calculate the tax")
Screenshots and logs
I get the following error after running assistant.register_model_client(model_client_cls = CustomModelClient):
---------------------------------------------------------------------------
ValueError Traceback (most recent call last)
/tmp/ipykernel_1734/1825630986.py in <cell line: 1>()
----> 1 assistant.register_model_client(model_client_cls = CustomModelClient)
/opt/software/Miniconda/lib/python3.8/site-packages/autogen/agentchat/conversable_agent.py in register_model_client(self, model_client_cls, **kwargs)
2296 **kwargs: The kwargs for the custom client class to be initialized with
2297 """
-> 2298 self.client.register_model_client(model_client_cls, **kwargs)
2299
2300 def register_hook(self, hookable_method: Callable, hook: Callable):
/opt/software/Miniconda/lib/python3.8/site-packages/autogen/oai/client.py in register_model_client(self, model_client_cls, **kwargs)
431 )
432 else:
--> 433 raise ValueError(
434 f'Model client "{model_client_cls.__name__}" is being registered but was not found in the config_list. '
435 f'Please make sure to include an entry in the config_list with "model_client_cls": "{model_client_cls.__name__}"'
ValueError: Model client "CustomModelClient" is being registered but was not found in the config_list. Please make sure to include an entry in the config_list with "model_client_cls": "CustomModelClient"
and I get the following error after running user_proxy.initiate_chat(assistant, message="when the income is 100, calculate the tax"):
---------------------------------------------------------------------------
RuntimeError Traceback (most recent call last)
/tmp/ipykernel_1734/532364807.py in <cell line: 1>()
1 with autogen.Cache.disk():
----> 2 user_proxy.initiate_chat(assistant, message="when the income is 100, calculate the tax")
/opt/software/Miniconda/lib/python3.8/site-packages/autogen/agentchat/conversable_agent.py in initiate_chat(self, recipient, clear_history, silent, cache, **context)
791 agent.client_cache = cache
792 self._prepare_chat(recipient, clear_history)
--> 793 self.send(self.generate_init_message(**context), recipient, silent=silent)
794 summary = self._summarize_chat(
795 context.get("summary_method"),
/opt/software/Miniconda/lib/python3.8/site-packages/autogen/agentchat/conversable_agent.py in send(self, message, recipient, request_reply, silent)
502 valid = self._append_oai_message(message, "assistant", recipient)
503 if valid:
--> 504 recipient.receive(message, self, request_reply, silent)
505 else:
506 raise ValueError(
/opt/software/Miniconda/lib/python3.8/site-packages/autogen/agentchat/conversable_agent.py in receive(self, message, sender, request_reply, silent)
677 if request_reply is False or request_reply is None and self.reply_at_receive[sender] is False:
678 return
--> 679 reply = self.generate_reply(messages=self.chat_messages[sender], sender=sender)
680 if reply is not None:
681 self.send(reply, sender, silent=silent)
/opt/software/Miniconda/lib/python3.8/site-packages/autogen/agentchat/conversable_agent.py in generate_reply(self, messages, sender, **kwargs)
1635 continue
1636 if self._match_trigger(reply_func_tuple["trigger"], sender):
-> 1637 final, reply = reply_func(self, messages=messages, sender=sender, config=reply_func_tuple["config"])
1638 if final:
1639 return reply
/opt/software/Miniconda/lib/python3.8/site-packages/autogen/agentchat/conversable_agent.py in generate_oai_reply(self, messages, sender, config)
1053 if messages is None:
1054 messages = self._oai_messages[sender]
-> 1055 extracted_response = self._generate_oai_reply_from_client(
1056 client, self._oai_system_message + messages, self.client_cache
1057 )
/opt/software/Miniconda/lib/python3.8/site-packages/autogen/agentchat/conversable_agent.py in _generate_oai_reply_from_client(self, llm_client, messages, cache)
1072
1073 # TODO: #1143 handle token limit exceeded error
-> 1074 response = llm_client.create(
1075 context=messages[-1].pop("context", None),
1076 messages=all_messages,
/opt/software/Miniconda/lib/python3.8/site-packages/autogen/oai/client.py in create(self, **config)
526 ]
527 if non_activated:
--> 528 raise RuntimeError(
529 f"Model client(s) {non_activated} are not activated. Please register the custom model clients using `register_model_client` or filter them out form the config list."
530 )
RuntimeError: Model client(s) ['CustomModelClient'] are not activated. Please register the custom model clients using `register_model_client` or filter them out form the config list.
Additional Information
- AutoGen Version: 0.2.13
- Operation System: Linux
- Python Version: 3.8.11
If I do what is shown below instead, the script runs to completion. However, based on the output, the custom model is clearly NOT aware of the provided tool.
assistant = AssistantAgent(
name = "assistant",
llm_config = {"config_list": config_list_custom},
system_message = "For coding tasks, only use the functions you have been provided with. Reply TERMINATE when the task is done."
)
assistant.register_model_client(model_client_cls = CustomModelClient)
with autogen.Cache.disk():
user_proxy.initiate_chat(assistant, message="when the income is 100, calculate the tax")
Can you confirm whether Mistral-7B-OpenOrca supports a function call format in its prompt template? To enable function calling, the model itself needs to support, or be fine-tuned with, a function call format as well.
I cannot repro; the above code works for me. I would suggest checking that the correct OAI_CONFIG_LIST file is being picked up, because the message ValueError: Model client "CustomModelClient" is being registered but was not found in the config_list. Please make sure to include an entry in the config_list with "model_client_cls": "CustomModelClient" implies that the config_list actually loaded does not contain the CustomModelClient entry.
Can you confirm whether Mistral-7B-OpenOrca supports a function call format in its prompt template? To enable function calling, the model itself needs to support, or be fine-tuned with, a function call format as well.
Yeah. That's really a good call-out! I missed that.
I finally got myself unblocked by using the class CustomModelClientWithArguments. However, tool calling was still not successful, and my best guess is that Mistral-7B-OpenOrca was not fine-tuned for tool calling.
Here is my solution:
import math
import logging
from types import SimpleNamespace
from typing import Type
import autogen
from autogen import AssistantAgent, UserProxyAgent
from transformers import AutoTokenizer, GenerationConfig, AutoModelForCausalLM
from langchain.pydantic_v1 import BaseModel, Field
from langchain.tools import BaseTool

logger = logging.getLogger(__name__)
class CustomModelClient:
def __init__(self, config, **kwargs):
self.device = config.get("device", "cpu")
self.model = AutoModelForCausalLM.from_pretrained(pretrained_model_name_or_path = config["model"])
self.model_name = config["model"]
self.tokenizer = AutoTokenizer.from_pretrained(pretrained_model_name_or_path = config["model"], use_fast = False)
self.tokenizer.pad_token_id = self.tokenizer.eos_token_id
# params are set by the user and consumed by the user since they are providing a custom model
# so anything can be done here
gen_config_params = config.get("params", {})
self.max_length = gen_config_params.get("max_length", 256)
def create(self, params):
if params.get("stream", False) and "messages" in params:
raise NotImplementedError("Local models do not support streaming.")
else:
num_of_responses = params.get("n", 1)
# can create my own data response class
# here using SimpleNamespace for simplicity
# as long as it adheres to the ClientResponseProtocol
response = SimpleNamespace()
inputs = self.tokenizer.apply_chat_template(
conversation = params["messages"],
return_tensors="pt",
add_generation_prompt=True
).to(self.device)
inputs_length = inputs.shape[-1]
# add inputs_length to max_length
max_length = self.max_length + inputs_length
generation_config = GenerationConfig(
max_length = max_length,
eos_token_id = self.tokenizer.eos_token_id,
pad_token_id = self.tokenizer.pad_token_id,
)
response.choices = []
response.model = self.model_name
for _ in range(num_of_responses):
outputs = self.model.generate(
inputs = inputs,
generation_config=generation_config
)
# Decode only the newly generated text, excluding the prompt
text = self.tokenizer.decode(token_ids = outputs[0, inputs_length:])
choice = SimpleNamespace()
choice.message = SimpleNamespace()
choice.message.content = text
choice.message.function_call = None
response.choices.append(choice)
return response
def message_retrieval(self, response):
"""Retrieve the messages from the response."""
choices = response.choices
return [choice.message.content for choice in choices]
def cost(self, response) -> float:
"""Calculate the cost of the response."""
response.cost = 0
return 0
@staticmethod
def get_usage(response):
# returns a dict of prompt_tokens, completion_tokens, total_tokens, cost, model
# if usage needs to be tracked, else None
return {}
class CustomModelClientWithArguments(CustomModelClient):
def __init__(self, config, loaded_model, tokenizer, **kwargs):
logger.info(f"CustomModelClientWithArguments config: {config}")
self.device = config.get("device", "cpu")
self.model = loaded_model
self.model_name = config["model"]
self.tokenizer = tokenizer
self.tokenizer.pad_token_id = tokenizer.eos_token_id
gen_config_params = config.get("params", {})
self.max_length = gen_config_params.get("max_length", 256)
class CustomToolInput(BaseModel):
income: float = Field()
class CustomTool(BaseTool):
name = "tax_calculator"
description = "Use this tool when you need to calculate the tax using the income"
args_schema: Type[BaseModel] = CustomToolInput
def _run(self, income: float):
return float(income) * math.pi / 100
# Define a function to generate llm_config from a LangChain tool
def generate_llm_config(tool):
# Define the function schema based on the tool's args_schema
function_schema = {
"name": tool.name.lower().replace(" ", "_"),
"description": tool.description,
"parameters": {
"type": "object",
"properties": {},
"required": [],
},
}
if tool.args is not None:
function_schema["parameters"]["properties"] = tool.args
return function_schema
custom_tool = CustomTool()
config_list_custom = autogen.config_list_from_json(
env_or_file = "OAI_CONFIG_LIST.json",
file_location = "/dev/",
filter_dict = {"model_client_cls": ["CustomModelClientWithArguments"]},
)
config = config_list_custom[0]
loaded_model = AutoModelForCausalLM.from_pretrained(pretrained_model_name_or_path = config["model"])
tokenizer = AutoTokenizer.from_pretrained(pretrained_model_name_or_path = config["model"], use_fast = False)
user_proxy = UserProxyAgent(
name = "user_proxy",
is_termination_msg = lambda x: x.get("content", "") and x.get("content", "").rstrip().endswith("TERMINATE"),
human_input_mode = "NEVER",
max_consecutive_auto_reply = 2,
code_execution_config = {
"work_dir": "coding",
"use_docker": False, # Please set use_docker=True if docker is available to run the generated code. Using docker is safer than running the generated code directly.
"timeout": 600,
"last_n_messages": 1
},
)
user_proxy.register_function(
function_map={
custom_tool.name: custom_tool._run
}
)
llm_config = {
"functions": [generate_llm_config(custom_tool)],
"config_list": config_list_custom,
"timeout": 120
}
assistant = AssistantAgent(
name = "assistant",
llm_config = llm_config,
system_message = "For coding tasks, only use the functions you have been provided with. Reply TERMINATE when the task is done."
)
assistant.register_model_client(
model_client_cls = CustomModelClientWithArguments,
loaded_model = loaded_model,
tokenizer = tokenizer
)
with autogen.Cache.disk():
user_proxy.initiate_chat(assistant, message="when the income is 100, calculate the tax")
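As a side note on the tool-calling failure: since Mistral-7B-OpenOrca has no native tool-call format, one hedged workaround is to surface the function schemas to the model as plain text inside create(). The sketch below builds on the CustomModelClientWithArguments class above; the subclass name is made up, and it assumes the "functions" list from llm_config is forwarded to the client's create() params, which may depend on the AutoGen version. It does not produce structured tool_calls; it only lets the model see the tool definitions.
import json

class CustomModelClientWithToolPrompt(CustomModelClientWithArguments):
    # Sketch only: prepend the function schemas (if present) to the message list
    # as an extra system message so a model without native tool-call support at
    # least sees the tool definitions as text. Adjust the role if the model's
    # chat template does not accept a system message.
    def create(self, params):
        functions = params.get("functions") or []
        if functions:
            tool_text = "You may use the following tools:\n" + json.dumps(functions, indent=2)
            params = dict(params)
            params["messages"] = [{"role": "system", "content": tool_text}] + list(params["messages"])
        return super().create(params)
If you try something like this, it would be registered the same way as CustomModelClientWithArguments, and the "model_client_cls" value in OAI_CONFIG_LIST.json would need to match the new class name.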
I posted this issue to our Discord channel to see if there are some there that can help. https://discord.com/channels/1153072414184452236/1201369716057440287
Thank you! But I see nothing after clicking the link. I am new to Discord; am I missing anything?
You might need a fine-tuned model. Trelis on Hugging Face has a couple, but there is also a dataset you can use if you want to train your own. As for the AutoGen Discord server, I think you have to look for the channel #alt-models.
@woodswift Mistral models have recently started to support tool calls. Have you checked? https://docs.mistral.ai/api/#operation/createChatCompletion
Oh, thank you for sharing! I have not tried it yet, so I will do so shortly :)
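For reference, a minimal sketch (not tested here) of pointing AutoGen at the hosted Mistral API, which exposes an OpenAI-compatible chat completions endpoint; the model name is illustrative, so check Mistral's docs for which models actually support function calling:
# Sketch: an OAI_CONFIG_LIST-style entry for the hosted Mistral API.
config_list_mistral = [
    {
        "model": "mistral-large-latest",          # illustrative; pick a model that supports tool calls
        "api_key": "<your Mistral API key here>",
        "base_url": "https://api.mistral.ai/v1",
    }
]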
@woodswift, you should be able to do it through LiteLLM + Ollama (note: Ollama released a new version, 0.1.29; you'll need that). You can also test through together.ai, who have Mistral and Mixtral models that support function calling.
Oh, if you are using LiteLLM + Ollama, please be sure to use "ollama_chat/" rather than "ollama/".
Does that work?
@woodswift can you update us?
Hi, I use Ollama with Mistral, but still can't use function calling :( And I don't understand what "if you are using LiteLLM + Ollama, please be sure to use 'ollama_chat/' rather than 'ollama/'" means. Can you explain that more?
Hi @JarkimZhu, no problem. When you run your LiteLLM server, you need to use "ollama_chat" instead of "ollama". Here's an example:
litellm --model ollama_chat/llama2
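Once the proxy is up, a minimal sketch (assuming it listens on localhost:4000; use whatever host/port LiteLLM prints on startup) of pointing AutoGen at it as an OpenAI-compatible endpoint:
import autogen

# Sketch: AutoGen config entry for a local LiteLLM proxy serving an Ollama model.
config_list_litellm = [
    {
        "model": "ollama_chat/llama2",
        "api_key": "NULL",                    # placeholder; the local proxy does not check it
        "base_url": "http://localhost:4000",  # adjust to the port LiteLLM reports
    }
]

assistant = autogen.AssistantAgent(name="assistant", llm_config={"config_list": config_list_litellm})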
- I created several functions with the custom model "Mixtral 8x7B", and I can see them in assistant.llm_config['tools'] and user_proxy.function_map, but I didn't see a log like '***** Suggested tool Call'.
- After examining the source code, I'm still unsure where the "tool_calls" key is added to message["tool_calls"]. Can someone help, or offer some suggestions?
- Does "Mixtral 8x7B" support tool calls?
Thank you!
Can you tell us how you are running the model? LiteLLM + Ollama, together.ai, etc.
If it's LiteLLM, can you please share the command line?
And any sample code you are using would help.
Thanks!
The solution is discussed in discussion #3196.