mlc-llm
[Serving] Support tool function calls under strict format constraints
This PR supports tool function calls under strict format constraints. Specifically, it uses structural tags to constrain the calling format. It makes the following changes:
- Add "tool_call_format" attribute in EngineConfig, which determines the tool calls format
tool_call_format : Literal["json", "xml", "python"] = "json"
The tool function call format.
"json" means the model will call tool functions in a JSON-style format
'{"name": func_name, "parameters": parameters(JSON dict)}',
e.g. '{"name": "get_time", "parameters": {"location": "Pittsburgh"}}'.
"xml" means the model will call tool functions in an XML-style format
'<function=func_name>{parameters(JSON dict)}</function>',
e.g. '<function=get_time>{"location": "Pittsburgh"}</function>'.
"python" means the model will call tool functions in a Python-style format,
e.g. 'wolfram_alpha.call(query="solve x^3 - 4x^2 + 6x - 24 = 0")'.
In most cases the "json" and "xml" modes meet the requirements. For models specialized in emitting Python code calls, the "python" mode can be used, in which case the output is parsed with Python's ast module. For a few special cases, users can use the structural-tag API to customize their own function call format.
- Add "strict" attribute in ChatFunction, aligned with the OpenAI API
- Set the system prompt according to tool_call_format
- Set the structural tag to ensure strict function calls
- Parse model output into JSON-style function calls
- Add a Structural-Tag API to RequestResponseFormat #3187 , including:
- Upgrade xgrammar to the latest version
- Add structural-tag-related attributes to RequestResponseFormat and adapt the corresponding processing
- Align RequestResponseFormat with the OpenAI protocol
- Add test script for Structural-Tag
- Use vocab_size from config.json instead of tokenizer.vocab_size to build the xgrammar token mask
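To illustrate the three tool-call formats above and how each is normalized into the JSON-style form '{"name": func_name, "parameters": ...}', here is a minimal, self-contained sketch of such a parser. Note that `parse_tool_call` is a hypothetical helper written for this description, not a function from the mlc-llm codebase; the actual parsing logic in this PR may differ.

```python
import ast
import json
import re


def parse_tool_call(text: str) -> dict:
    """Normalize a tool call in "json", "xml", or "python" format
    into {"name": func_name, "parameters": dict}.

    Hypothetical illustration of the formats described above;
    not the mlc-llm implementation.
    """
    text = text.strip()

    # "xml" style: <function=func_name>{parameters}</function>
    m = re.fullmatch(r"<function=([\w.]+)>(\{.*\})</function>", text, re.S)
    if m:
        return {"name": m.group(1), "parameters": json.loads(m.group(2))}

    # "json" style: {"name": ..., "parameters": {...}}
    if text.startswith("{"):
        call = json.loads(text)
        return {"name": call["name"], "parameters": call["parameters"]}

    # "python" style: parsed with Python's ast module,
    # e.g. wolfram_alpha.call(query="...")
    node = ast.parse(text, mode="eval").body
    if isinstance(node, ast.Call):
        name = ast.unparse(node.func)
        params = {kw.arg: ast.literal_eval(kw.value) for kw in node.keywords}
        return {"name": name, "parameters": params}

    raise ValueError(f"unrecognized tool call: {text!r}")
```

For example, all three of the calls below normalize to the same JSON-style result, which is what downstream code consumes regardless of the configured tool_call_format:

```python
parse_tool_call('{"name": "get_time", "parameters": {"location": "Pittsburgh"}}')
parse_tool_call('<function=get_time>{"location": "Pittsburgh"}</function>')
parse_tool_call('get_time(location="Pittsburgh")')
# each returns {"name": "get_time", "parameters": {"location": "Pittsburgh"}}
```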