Chat Template support for function calling and RAG
This PR updates our support of chat templates to cover tool-use and RAG use-cases. Specifically, it does the following:
- Defines a recommended JSON schema spec for tool use (see the sketch after this list)
- Adds `tools` and `documents` kwargs to `apply_chat_template`, with accompanying docstrings
- Updates the chat template documentation to match
- Adds a `get_json_schema` function to automatically generate a schema from a Python function
- Updates `apply_chat_template` to automatically convert passed functions to JSON schemas
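As a rough sketch of what the proposed spec looks like for a simple function - the import path and exact field layout here are assumptions based on this PR and may differ in the merged version:

```python
# Rough sketch, assuming `get_json_schema` is importable from `transformers.utils`.
# The schema layout shown in the comments is illustrative, not the final spec.
from transformers.utils import get_json_schema


def get_current_weather(city: str, unit: str = "celsius"):
    """
    Get the current weather for a city.

    Args:
        city: The city to get the weather for.
        unit: The temperature unit to use.
    """
    ...


schema = get_json_schema(get_current_weather)
# Expected shape (abridged):
# {
#     "name": "get_current_weather",
#     "description": "Get the current weather for a city.",
#     "parameters": {
#         "type": "object",
#         "properties": {
#             "city": {"type": "string", "description": "The city to get the weather for."},
#             "unit": {"type": "string", "description": "The temperature unit to use."},
#         },
#         "required": ["city"],
#     },
# }
```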
TODO:
- [x] Do we want to support `Tuple`?
- [x] ~In some cases, the helper generates a more complex `anyOf` when a simple list of types would be okay.~
- [x] Community feedback before we make this a standard!
- [x] Fix up doc examples and docstring to match the newest format
- [x] Can we do an example template using return values?
- [x] Get a working Command-R template
- [x] Get a working Hermes-2-Pro template
- [x] Get a working Mixtral-8x22B template
RAG documents are defined by `title` and `contents` keys - is there anything else I'm missing?
If possible, I would allow a `metadata` tag for RAG that can support an open-ended dictionary.
It is highly unlikely that production RAG systems will only use `content` and `title`, but at the same time it's hard to predict what metadata people might use.
I'm not actually sure this is possible or desirable, of course - from the point of view of a chat template, that is.
@TheodoreGalanos That's a good point, yes, but I'm not sure we can include it in the chat template tooling itself! Basically, the goal of this PR is to standardize things like tool and document definitions that are going to be universal for tool use and RAG LLMs. Document metadata doesn't really have a standard that we can support yet.
That said, if we start seeing LLMs whose templates support it, we might consider making it part of the tooling standard.
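For reference, a minimal sketch of how the documents path is meant to be used, assuming documents are plain dicts with the `title`/`contents` keys discussed above - the key names and the checkpoint are illustrative and may change:

```python
# Minimal sketch: `documents` is a list of plain dicts. Key names follow this
# discussion ("title" plus a content field) and may differ in the merged API.
# The checkpoint is only an example of a model whose template supports RAG.
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("CohereForAI/c4ai-command-r-v01")

documents = [
    {"title": "Photosynthesis", "contents": "Plants convert sunlight into chemical energy..."},
    {"title": "Cellular respiration", "contents": "Cells release energy stored in glucose..."},
]

messages = [{"role": "user", "content": "How do plants make energy?"}]

prompt = tokenizer.apply_chat_template(
    messages,
    documents=documents,          # only used by templates that support RAG
    add_generation_prompt=True,
    tokenize=False,
)
```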
Cancel this review request by the way - I realized I'll need to do some major refactors to support some of the existing tool use templates out there!
@Rocketknight1 OK, just ping when you want me to look at it again
@amyeroberts this should be ready for review now! I still have to write the Mixtral template, but I think the core code here is ready.
cc @osanseviero - I thought about our chat yesterday, and I removed the decorator entirely. You're right that decorators can seem like confusing magic for users.
Instead, the new API lets you simply pass functions in the `tools` argument, with no decorator or anything. `apply_chat_template` will convert them into JSON schemas for you. You can also pass JSON schemas directly to the argument instead.
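A quick sketch of what that looks like from the user side - the checkpoint name and the function are purely illustrative:

```python
# Sketch of the decorator-free flow: pass plain Python functions via `tools`
# and let `apply_chat_template` convert them to JSON schemas. Ready-made JSON
# schemas can be passed in the same list instead.
from transformers import AutoTokenizer


def get_current_temperature(location: str):
    """
    Get the current temperature at a location.

    Args:
        location: The location to get the temperature for, in "city, country" format.
    """
    return 22.0  # stub implementation


tokenizer = AutoTokenizer.from_pretrained("NousResearch/Hermes-2-Pro-Mistral-7B")
messages = [{"role": "user", "content": "What's the temperature in Paris right now?"}]

prompt = tokenizer.apply_chat_template(
    messages,
    tools=[get_current_temperature],   # converted to a JSON schema internally
    add_generation_prompt=True,
    tokenize=False,
)
```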
@amyeroberts all comments should be addressed now! Failing tests are unrelated, I'm hoping a rebase fixes them soon.
I'd be interested in having @molbap's review as well!
@Rocketknight1 This looks great, however I think it might be worth making the `tools` parameter identical to OpenAI, i.e. move the schema down a level (inside `function`) and have `type` and `function` at the top level. They did this to be able to support different kinds of tools in the future.
You can easily extract and map the available functions in the template using `tools|selectattr('type','eq','function')|map(attribute='function')` like I do in Mistral-7B-Instruct-v0.3-SOTA-GGUF.
Hi @CISC, yes, we considered that! We felt that it just added extra complexity, though, and noticed that Anthropic skips that extra layer for tool-use in Claude too. I suspect that later if we have "tools" that are not callable functions, we'll probably just put them in a separate argument.
TBH though that means it should be `functions` and not `tools`.
@CISC Are there any examples of 'tools' that aren't functions out there? I'm curious about what those would look like!
Yes, in the OpenAI Assistants API.
You can also easily promote `functions` to `tools` (and the other way around) if only one of them is specified, which is why I made gorilla-openfunctions-v2-SOTA-GGUF handle both.
Hmn, I feel like all of those categories could have the same API as functions with just different arguments, but still, I'll think about it - maybe retaining the flexibility in future would be worth the extra verbosity now.
@CISC After some internal discussion, we decided to accept your suggestion! The API will now wrap tool JSON in `{"type": "function", "function": ...}`, matching the OpenAI API.
I'll make a small update to the template PRs to model repos to match.
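For anyone following along, the accepted format looks roughly like this (field contents are illustrative):

```python
# Tool schemas are now wrapped one level down, matching the OpenAI layout.
# The inner "function" dict is what templates actually render.
tool = {
    "type": "function",
    "function": {
        "name": "get_current_weather",
        "description": "Get the current weather for a city.",
        "parameters": {
            "type": "object",
            "properties": {
                "city": {"type": "string", "description": "The city to get the weather for."},
            },
            "required": ["city"],
        },
    },
}

# As noted above, a template can pull out just the callable functions with:
# tools|selectattr('type', 'eq', 'function')|map(attribute='function')
```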
@Rocketknight1 Don't forget to add the `tool` role to the template PRs you are submitting so that it's possible to do the full `user` -> `tool_calls` -> `tool` -> `assistant` round-trip to get a natural response.
Keep in mind that since the `tool_calls` response can contain multiple calls, but the `tool` role can only contain the result of one tool call, the user may need to add several `tool` responses in a row. This will disrupt the usual `user`/`assistant`/`user`/`assistant` cycle that some templates strictly enforce.
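To make the round-trip concrete, here is a sketch of the message list for that flow; the key names follow this discussion and individual model templates may differ:

```python
# Sketch of the user -> assistant (tool_calls) -> tool -> ... -> assistant
# round-trip described above. Because `tool_calls` can hold several calls but
# each `tool` message holds one result, the results are queued as consecutive
# `tool` messages. Key names and values are illustrative.
messages = [
    {"role": "user", "content": "What's the weather in Paris and in Oslo?"},
    {
        "role": "assistant",
        "tool_calls": [
            {"type": "function", "function": {"name": "get_current_weather", "arguments": {"city": "Paris"}}},
            {"type": "function", "function": {"name": "get_current_weather", "arguments": {"city": "Oslo"}}},
        ],
    },
    {"role": "tool", "content": '{"temperature": 18, "unit": "celsius"}'},
    {"role": "tool", "content": '{"temperature": 9, "unit": "celsius"}'},
    # The next assistant turn combines both results into a natural-language reply.
]
```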
@CISC yes! That needs much less support in the main library, though, so it might just be a change we make to individual model templates.
@molbap @amyeroberts I think everything's been addressed - let me know if I'm missing anything!
Nice - will review this afternoon!
Curious about the choice not to follow OpenAI convention, with `tool_calls` being a separate role from `assistant`.
> Curious about the choice not to follow OpenAI convention, with `tool_calls` being a separate role from `assistant`.

@jenkspt Not sure what you mean? `tool_calls` is part of the `assistant` role, as per OpenAI API, but that is not part of this PR, where do you see otherwise?
> @jenkspt Not sure what you mean? `tool_calls` is part of the `assistant` role, as per OpenAI API, but that is not part of this PR, where do you see otherwise?
OpenAI API follows:
{"role": "assistant", "content": "....", "tool_calls": "..."}
{"role": "tool", "content": "..."}
While it seems this PR has a different design (I'm looking at this: https://huggingface.co/mistralai/Mixtral-8x22B-Instruct-v0.1/discussions/33)
{"role": "assistant", "content": "..."}
{"role": "tool_calls", "content": "..."}
{"role": "tool_results", "content": "..."}
The advantage of the OpenAI convention is:
- It's already widely adopted (I believe it's also used in the mistral api)
- a model always responds with a single assistant message (which can include assistant response and/or tool calls)
My question is - why this design choice? Some of the pros I can think of are:
- easier to format in a jinja template (just a loop over messages)
cons:
- possibly multiple messages are returned when a tool is called (would this result in two generation calls to the LLM?)
e.g.
{"role": "assistant", "content": "sure I can get that for you ..."}
{"role": "tool_calls", "content": '{"name": "get_current_weather", "arguments": "..."}'
What else is there to consider for this design choice?
Sorry I'm realizing that these questions maybe aren't directly related to this PR. Is there a better place to discuss?
> While it seems this PR has a different design (I'm looking at this: https://huggingface.co/mistralai/Mixtral-8x22B-Instruct-v0.1/discussions/33)
> {"role": "assistant", "content": "..."} {"role": "tool_calls", "content": "..."} {"role": "tool_results", "content": "..."}
Good spot, I hadn't seen this one. I hope @Rocketknight1 reconsiders this.
> My question is - why this design choice? Some of the pros I can think of are:
> - easier to format in a jinja template (just a loop over messages)
It's not that hard to do though, take a look at my chat template here (load the GGUF viewer on one of them). This follows the OpenAI convention of `user` -> `assistant` (with possible `tool_calls` attached) -> `tool`/`user` -> `assistant`.
> cons:
> - possibly multiple messages are returned when a tool is called (would this result in two generation calls to the LLM?)
Yes, if by multiple messages you mean multiple tools (`tool_calls` is an array). In this case you would do the following (which my chat template also handles): `user` -> `assistant` -> `tool` -> `tool` -> ... -> `assistant`. Most models will then combine the multiple `tool` results into a single natural language `assistant` response.
Note that the `tool` results are queued up and submitted all at once for the final `assistant` generation.
> Sorry I'm realizing that these questions maybe aren't directly related to this PR. Is there a better place to discuss?
Good question. :)
Let me clarify what I mean by multiple messages.
Let's consider a completion response that looks like this:
Yes I can do that for you
[TOOL_CALLS]
{"name": "get_current_weather", "arguments": {"city": "San Francisco"}}
[TOOL_RESULTS]
Using an OpenAI-like API, this would get returned as a single assistant message. Something like this:
{"role": "assistant", "content": "Yes I can do that for you", "tool_calls": '{"name": "get_current_weather", "arguments": {"city": "San Francisco"}}'}
(Note: not all models respond in this way - i.e. adding assistant content before a tool call - but GPT-4 does.)
But with the Hugging Face design, the response would be two messages, e.g.:
{"role": "assistant", "content": "Yes I can do that for you"}
{"role": "tool_calls", "content": '{"name": "get_current_weather", "arguments": {"city": "San Francisco"}}'}
For this version, is the expectation that you make two LLM calls, one for each message?
assistant_message = llm(messages)
tool_message = llm(messages + [assistant_message])
Or something like this:
response_messages = llm(messages)
Hi @jenkspt, that Mixtral PR I made is a little out of date, and will be updated soon. The API we're intending now is quite close to the OpenAI API:
- `tools` is passed to `apply_chat_template` and is a list of JSON schemas.
- `tool_calls` is an extra key in an `assistant` message dict.
- Outputs from tools are sent as messages with `message['role'] == "tool_results"`.
> Outputs from tools are sent as messages with `message['role'] == "tool_results"`

@Rocketknight1 Any reason for not going with the `tool` role OpenAI is using?
@CISC Not especially! We do have one or two other unavoidable divergences from the OpenAI API. Most notably, we allow a `return` key in the JSON schema for tools. This is ignored by most models, but some models (e.g. Hermes-2-Pro) expect return types for tools.
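As a concrete (purely illustrative) example, a schema carrying a `return` key might look like this:

```python
# Illustrative sketch only: a tool schema with a `return` key describing the
# return value. Most models ignore it, but some (e.g. Hermes-2-Pro) expect it.
tool = {
    "type": "function",
    "function": {
        "name": "get_current_temperature",
        "description": "Get the current temperature at a location.",
        "parameters": {
            "type": "object",
            "properties": {
                "location": {"type": "string", "description": "The location, in 'city, country' format."},
            },
            "required": ["location"],
        },
        "return": {"type": "number", "description": "The current temperature in celsius."},
    },
}
```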
@CISC My bad, actually! The role has already been updated from `tool_results` to `tool` as you suggested, I just forgot that I made that commit :sweat: