[Feature Request] Function Calling for mlx_lm.server
Hello, thanks for the amazing repo. I would like to request support for function calling in the mlx_lm server, similar to OpenAI's implementation.
Please let me know if this is on the roadmap, or if there are good frameworks that already implement this.
It would be pretty cool to add this and perhaps not too difficult. I believe function calling requires a few things:
- A model which supports the function calling prompt format. Do you know of a good open source model for that?
- Updating the server API to accept the right query and return the right response (a sketch of the request shape follows this list)
- Converting the HTTP request input into the correct prompt for the model
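To make the target concrete, here is a minimal sketch of the OpenAI-style request the server would need to accept. The `tools` field follows OpenAI's chat-completions schema; the endpoint and port assume a locally running `mlx_lm.server`, and since the server does not handle `tools` yet, this shows the desired interface rather than current behavior.

```python
# Hedged sketch: the request shape an OpenAI-compatible tools API would accept.
# Assumes a local server started with `mlx_lm.server` (default port 8080);
# the server does not act on "tools" today, so this is the target, not working code.
import json
import requests

payload = {
    "messages": [
        {"role": "user", "content": "What is the weather like today in San Francisco?"}
    ],
    # OpenAI-style tool schema describing one callable function.
    "tools": [{
        "type": "function",
        "function": {
            "name": "get_current_weather",
            "description": "Get the current weather",
            "parameters": {
                "type": "object",
                "properties": {
                    "location": {"type": "string", "description": "The city and state, e.g. San Francisco, CA"},
                    "format": {"type": "string", "enum": ["celsius", "fahrenheit"]},
                },
                "required": ["location", "format"],
            },
        },
    }],
}

resp = requests.post("http://localhost:8080/v1/chat/completions", json=payload)
print(json.dumps(resp.json(), indent=2))
```

An OpenAI-compatible response would then carry the call in `choices[0].message.tool_calls` instead of plain text in `choices[0].message.content`.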
Marked as an enhancement. I will leave it open if someone is interested in working on it.
Are we able to integrate with open-source frameworks, for example LangChain, AutoGen, etc.?
Maybe we can take a look at how Ollama handles mistral:v0.3.
Prompt
```
[AVAILABLE_TOOLS] [{"type": "function", "function": {"name": "get_current_weather", "description": "Get the current weather", "parameters": {"type": "object", "properties": {"location": {"type": "string", "description": "The city and state, e.g. San Francisco, CA"}, "format": {"type": "string", "enum": ["celsius", "fahrenheit"], "description": "The temperature unit to use. Infer this from the users location."}}, "required": ["location", "format"]}}}][/AVAILABLE_TOOLS][INST] What is the weather like today in San Francisco [/INST]
```
Response
```
[TOOL_CALLS] [{"name": "get_current_weather", "arguments": {"location": "San Francisco, CA", "format": "celsius"}}]
```
Any model based on Mistral v0.3 should work the same.
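If you want to try this outside the server first, here is a hedged sketch that builds the Mistral v0.3 prompt above by hand and parses the `[TOOL_CALLS]` block from the raw completion. The checkpoint name is an assumption, and whether the special tokens survive detokenization can vary, so treat it as a starting point.

```python
# Hedged sketch: Mistral v0.3 style function calling with mlx_lm.
# The checkpoint name is an assumption; any Mistral v0.3 based model with the
# [AVAILABLE_TOOLS]/[TOOL_CALLS] tokens should behave similarly.
import json
from mlx_lm import load, generate

tools = [{
    "type": "function",
    "function": {
        "name": "get_current_weather",
        "description": "Get the current weather",
        "parameters": {
            "type": "object",
            "properties": {
                "location": {"type": "string", "description": "The city and state, e.g. San Francisco, CA"},
                "format": {"type": "string", "enum": ["celsius", "fahrenheit"]},
            },
            "required": ["location", "format"],
        },
    },
}]

model, tokenizer = load("mlx-community/Mistral-7B-Instruct-v0.3-4bit")

# Build the prompt in the format shown above.
prompt = (
    f"[AVAILABLE_TOOLS] {json.dumps(tools)}[/AVAILABLE_TOOLS]"
    "[INST] What is the weather like today in San Francisco [/INST]"
)

text = generate(model, tokenizer, prompt=prompt, max_tokens=256)

# The model is expected to reply with "[TOOL_CALLS]" followed by a JSON list of calls.
if "[TOOL_CALLS]" in text:
    calls = json.loads(text.split("[TOOL_CALLS]", 1)[1].strip())
    for call in calls:
        print(call["name"], call["arguments"])
else:
    print(text)
```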
I wrote a library that constrains LLM output to a JSON schema in a performant way, and implemented a function calling/tools server example for MLX with it. I find that it works quite well even with models that have not been fine-tuned for function calling specifically.
You can check it out here: https://github.com/otriscon/llm-structured-output
If you want to give it a try, I'm happy to answer any questions and open to suggestions for improvement.
Hi guys, I added support for function calling following the method in the "Tool Use, Unified" article. Currently, the model's output still needs to be parsed manually. See #1003 for details.
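For anyone who wants to try that route before digging into #1003, here is a hedged sketch of the chat-template approach from the article: pass the tool definitions to `apply_chat_template` and parse the tool-call block out of the completion yourself. The checkpoint name and the `<tool_call>` tag are assumptions (the wrapper varies by model family).

```python
# Hedged sketch of the "Tool Use, Unified" chat-template route with mlx_lm.
# Checkpoint name and the <tool_call> wrapper are assumptions; see #1003 for
# the server-side changes.
import json
from mlx_lm import load, generate

def get_current_weather(location: str, unit: str) -> str:
    """
    Get the current weather in a given location.

    Args:
        location: The city and state, e.g. San Francisco, CA
        unit: The temperature unit, either "celsius" or "fahrenheit"
    """
    return "sunny, 21 celsius"  # stub implementation

model, tokenizer = load("mlx-community/Qwen2.5-7B-Instruct-4bit")  # assumed checkpoint

messages = [{"role": "user", "content": "What is the weather like today in San Francisco?"}]
prompt = tokenizer.apply_chat_template(
    messages,
    tools=[get_current_weather],  # transformers builds the JSON schema from the signature/docstring
    add_generation_prompt=True,
    tokenize=False,
)

text = generate(model, tokenizer, prompt=prompt, max_tokens=256)

# Manual parsing: Qwen-style templates wrap the call in <tool_call> ... </tool_call>.
if "<tool_call>" in text:
    raw = text.split("<tool_call>", 1)[1].split("</tool_call>", 1)[0]
    call = json.loads(raw)
    print(call["name"], call["arguments"])
else:
    print(text)
```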