Chat Template support for function calling and RAG
This PR updates our support of chat templates to cover tool-use and RAG use-cases. Specifically, it does the following:
- Defines a recommended JSON schema spec for tool use (see the sketch after this list)
- Adds `tools` and `documents` kwargs to `apply_chat_template`, with accompanying docstrings
- Updates the chat template documentation to match
- Adds a `get_json_schema` function to automatically generate a schema from a Python function
- Updates `apply_chat_template` to automatically convert passed functions to JSON schemas
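As a rough sketch of what the proposed spec looks like for a simple function - the import path and exact field layout here are assumptions based on this PR and may differ in the merged version:

```python
# Rough sketch, assuming `get_json_schema` is importable from `transformers.utils`.
# The schema layout shown in the comments is illustrative, not the final spec.
from transformers.utils import get_json_schema


def get_current_weather(city: str, unit: str = "celsius"):
    """
    Get the current weather for a city.

    Args:
        city: The city to get the weather for.
        unit: The temperature unit to use.
    """
    ...


schema = get_json_schema(get_current_weather)
# Expected shape (abridged):
# {
#     "name": "get_current_weather",
#     "description": "Get the current weather for a city.",
#     "parameters": {
#         "type": "object",
#         "properties": {
#             "city": {"type": "string", "description": "The city to get the weather for."},
#             "unit": {"type": "string", "description": "The temperature unit to use."},
#         },
#         "required": ["city"],
#     },
# }
```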
TODO:
- [x] Do we want to support `Tuple`?
- [x] ~In some cases, the helper generates a more complex `anyOf` when a simple list of types would be okay.~
- [x] Community feedback before we make this a standard!
- [x] Fix up doc examples and docstring to match the newest format
- [x] Can we do an example template using return values?
- [x] Get a working Command-R template
- [x] Get a working Hermes-2-Pro template
- [x] Get a working Mixtral-8x22B template
RAG documents are defined by `title` and `contents` keys - is there anything else I'm missing?
If possible, I would allow a `metadata` tag for RAG that can support an open-ended dictionary.
It is highly unlikely that production RAG systems will only use `content` and `title`, but at the same time it's hard to predict what metadata people might use.
I'm not actually sure this is possible or desirable, of course - from the point of view of a chat template, that is.
@TheodoreGalanos That's a good point, yes, but I'm not sure we can include it in the chat template tooling itself! Basically, the goal of this PR is to standardize things like tool and document definitions that are going to be universal for tool use and RAG LLMs. Document metadata doesn't really have a standard that we can support yet.
That said, if we start seeing LLMs whose templates support it, we might consider making it part of the tooling standard.
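For reference, a minimal sketch of how the documents path is meant to be used, assuming documents are plain dicts with the `title`/`contents` keys discussed above - the key names and the checkpoint are illustrative and may change:

```python
# Minimal sketch: `documents` is a list of plain dicts. Key names follow this
# discussion ("title" plus a content field) and may differ in the merged API.
# The checkpoint is only an example of a model whose template supports RAG.
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("CohereForAI/c4ai-command-r-v01")

documents = [
    {"title": "Photosynthesis", "contents": "Plants convert sunlight into chemical energy..."},
    {"title": "Cellular respiration", "contents": "Cells release energy stored in glucose..."},
]

messages = [{"role": "user", "content": "How do plants make energy?"}]

prompt = tokenizer.apply_chat_template(
    messages,
    documents=documents,          # only used by templates that support RAG
    add_generation_prompt=True,
    tokenize=False,
)
```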
Cancel this review request by the way - I realized I'll need to do some major refactors to support some of the existing tool use templates out there!
@Rocketknight1 OK, just ping when you want me to look at it again
@amyeroberts this should be ready for review now! I still have to write the Mixtral template, but I think the core code here is ready.
cc @osanseviero - I thought about our chat yesterday, and I removed the decorator entirely. You're right that decorators can seem like confusing magic for users.
Instead, the new API lets you simply pass functions in the `tools` argument, with no decorator or anything. `apply_chat_template` will convert them into JSON schemas for you. You can also pass JSON schemas directly to the argument instead.
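A quick sketch of what that looks like from the user side - the checkpoint name and the function are purely illustrative:

```python
# Sketch of the decorator-free flow: pass plain Python functions via `tools`
# and let `apply_chat_template` convert them to JSON schemas. Ready-made JSON
# schemas can be passed in the same list instead.
from transformers import AutoTokenizer


def get_current_temperature(location: str):
    """
    Get the current temperature at a location.

    Args:
        location: The location to get the temperature for, in "city, country" format.
    """
    return 22.0  # stub implementation


tokenizer = AutoTokenizer.from_pretrained("NousResearch/Hermes-2-Pro-Mistral-7B")
messages = [{"role": "user", "content": "What's the temperature in Paris right now?"}]

prompt = tokenizer.apply_chat_template(
    messages,
    tools=[get_current_temperature],   # converted to a JSON schema internally
    add_generation_prompt=True,
    tokenize=False,
)
```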
@amyeroberts all comments should be addressed now! Failing tests are unrelated, I'm hoping a rebase fixes them soon.
I'd be interested in having @molbap's review as well!
@Rocketknight1 This looks great, however I think it might be worth making the `tools` parameter identical to OpenAI, i.e. move the schema down a level (inside `function`) and have `type` and `function` at the top level. They did this to be able to support different kinds of tools in the future.
You can easily extract and map the available functions in the template using `tools|selectattr('type','eq','function')|map(attribute='function')` like I do in Mistral-7B-Instruct-v0.3-SOTA-GGUF.
Hi @CISC, yes, we considered that! We felt that it just added extra complexity, though, and noticed that Anthropic skips that extra layer for tool-use in Claude too. I suspect that later if we have "tools" that are not callable functions, we'll probably just put them in a separate argument.
TBH though that means it should be `functions` and not `tools`.
@CISC Are there any examples of 'tools' that aren't functions out there? I'm curious about what those would look like!
Yes, in the OpenAI Assistants API.
You can also easily promote `functions` to `tools` (and the other way around) if only one of them is specified, which is why I made gorilla-openfunctions-v2-SOTA-GGUF handle both.
Hmn, I feel like all of those categories could have the same API as functions with just different arguments, but still, I'll think about it - maybe retaining the flexibility in future would be worth the extra verbosity now.
@CISC After some internal discussion, we decided to accept your suggestion! The API will now wrap tool JSON in `{"type": "function", "function": ...}`, matching the OpenAI API.
I'll make a small update to the template PRs to model repos to match.
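For anyone following along, the accepted format looks roughly like this (field contents are illustrative):

```python
# Tool schemas are now wrapped one level down, matching the OpenAI layout.
# The inner "function" dict is what templates actually render.
tool = {
    "type": "function",
    "function": {
        "name": "get_current_weather",
        "description": "Get the current weather for a city.",
        "parameters": {
            "type": "object",
            "properties": {
                "city": {"type": "string", "description": "The city to get the weather for."},
            },
            "required": ["city"],
        },
    },
}

# As noted above, a template can pull out just the callable functions with:
# tools|selectattr('type', 'eq', 'function')|map(attribute='function')
```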
@Rocketknight1 Don't forget to add the `tool` role to the template PRs you are submitting so that it's possible to do the full `user` -> `tool_calls` -> `tool` -> `assistant` round-trip to get a natural response.
Keep in mind that since the `tool_calls` response can contain multiple calls, but the `tool` role can only contain the result of one tool call, the user may need to add several `tool` responses in a row. This will disrupt the usual `user`/`assistant`/`user`/`assistant` cycle that some templates strictly enforce.
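To make the round-trip concrete, here is a sketch of the message list for that flow; the key names follow this discussion and individual model templates may differ:

```python
# Sketch of the user -> assistant (tool_calls) -> tool -> ... -> assistant
# round-trip described above. Because `tool_calls` can hold several calls but
# each `tool` message holds one result, the results are queued as consecutive
# `tool` messages. Key names and values are illustrative.
messages = [
    {"role": "user", "content": "What's the weather in Paris and in Oslo?"},
    {
        "role": "assistant",
        "tool_calls": [
            {"type": "function", "function": {"name": "get_current_weather", "arguments": {"city": "Paris"}}},
            {"type": "function", "function": {"name": "get_current_weather", "arguments": {"city": "Oslo"}}},
        ],
    },
    {"role": "tool", "content": '{"temperature": 18, "unit": "celsius"}'},
    {"role": "tool", "content": '{"temperature": 9, "unit": "celsius"}'},
    # The next assistant turn combines both results into a natural-language reply.
]
```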
@CISC yes! That needs much less support in the main library, though, so it might just be a change we make to individual model templates.
@molbap @amyeroberts I think everything's been addressed - let me know if I'm missing anything!
Nice - will review this afternoon!
Curious about the choice not to follow OpenAI convention, with `tool_calls` being a separate role from `assistant`.
> Curious about the choice not to follow OpenAI convention, with `tool_calls` being a separate role from `assistant`.

@jenkspt Not sure what you mean? `tool_calls` is part of the `assistant` role, as per OpenAI API, but that is not part of this PR, where do you see otherwise?
> @jenkspt Not sure what you mean? `tool_calls` is part of the `assistant` role, as per OpenAI API, but that is not part of this PR, where do you see otherwise?
OpenAI API follows:
{"role": "assistant", "content": "....", "tool_calls": "..."}
{"role": "tool", "content": "..."}
While it seems this PR has a different design (I'm looking at this: https://huggingface.co/mistralai/Mixtral-8x22B-Instruct-v0.1/discussions/33)
{"role": "assistant", "content": "..."}
{"role": "tool_calls", "content": "..."}
{"role": "tool_results", "content": "..."}
The advantage of the OpenAI convention is:
- It's already widely adopted (I believe it's also used in the mistral api)
- a model always responds with a single assistant message (which can include assistant response and/or tool calls)
My question is - why this design choice? Some of the pros I can think of are:
- easier to format in a jinja template (just a loop over messages)
cons:
- possibly multiple messages are returned when a tool is called (would this result in two generation calls to the LLM?)
e.g.
{"role": "assistant", "content": "sure I can get that for you ..."}
{"role": "tool_calls", "content": '{"name": "get_current_weather", "arguments": "..."}'
What else is there to consider for this design choice?
Sorry I'm realizing that these questions maybe aren't directly related to this PR. Is there a better place to discuss?
> While it seems this PR has a different design (I'm looking at this: https://huggingface.co/mistralai/Mixtral-8x22B-Instruct-v0.1/discussions/33)
> {"role": "assistant", "content": "..."} {"role": "tool_calls", "content": "..."} {"role": "tool_results", "content": "..."}
Good spot, I hadn't seen this one. I hope @Rocketknight1 reconsiders this.
> My question is - why this design choice? Some of the pros I can think of are:
> - easier to format in a jinja template (just a loop over messages)
It's not that hard to do though, take a look at my chat template here (load the GGUF viewer on one of them). This follows the OpenAI convention of `user` -> `assistant` (with possible `tool_calls` attached) -> `tool`/`user` -> `assistant`.
> cons:
> - possibly multiple messages are returned when a tool is called (would this result in two generation calls to the LLM?)
Yes, if by multiple messages you mean multiple tools (`tool_calls` is an array). In this case you would do the following (which my chat template also handles): `user` -> `assistant` -> `tool` -> `tool` -> ... -> `assistant`. Most models will then combine the multiple `tool` results into a single natural language `assistant` response.
Note that the `tool` results are queued up and submitted all at once for the final `assistant` generation.
> Sorry I'm realizing that these questions maybe aren't directly related to this PR. Is there a better place to discuss?
Good question. :)
Let me clarify what I mean by multiple messages.
Let's consider a completion response that looks like this:
Yes I can do that for you
[TOOL_CALLS]
{"name": "get_current_weather", "arguments": {"city": "San Francisco"}}
[TOOL_RESULTS]
Using an OpenAI-like API, this would get returned as a single assistant message. Something like this:
{"role": "assistant", "content": "Yes I can do that for you", "tool_calls": '{"name": "get_current_weather", "arguments": {"city": "San Francisco"}}'}
(Note: not all models respond in this way - i.e. adding assistant content before a tool call - but GPT-4 does.)
But with the Hugging Face design, the response would be two messages, e.g.:
{"role": "assistant", "content": "Yes I can do that for you"}
{"role": "tool_calls", "content": '{"name": "get_current_weather", "arguments": {"city": "San Francisco"}}'}
For this version, is the expectation that you make two LLM calls, one for each message?
assistant_message = llm(messages)
tool_message = llm(messages + [assistant_message])
Or something like this:
response_messages = llm(messages)
Hi @jenkspt, that Mixtral PR I made is a little out of date, and will be updated soon. The API we're intending now is quite close to the OpenAI API:
- `tools` is passed to `apply_chat_template` and is a list of JSON schemas.
- `tool_calls` is an extra key in an `assistant` message dict.
- Outputs from tools are sent as messages with `message['role'] == "tool_results"`.
> Outputs from tools are sent as messages with `message['role'] == "tool_results"`

@Rocketknight1 Any reason for not going with the `tool` role OpenAI is using?
@CISC Not especially! We do have one or two other unavoidable divergences from the OpenAI API. Most notably, we allow a `return` key in the JSON schema for tools. This is ignored by most models, but some models (e.g. Hermes-2-Pro) expect return types for tools.
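As a concrete (purely illustrative) example, a schema carrying a `return` key might look like this:

```python
# Illustrative sketch only: a tool schema with a `return` key describing the
# return value. Most models ignore it, but some (e.g. Hermes-2-Pro) expect it.
tool = {
    "type": "function",
    "function": {
        "name": "get_current_temperature",
        "description": "Get the current temperature at a location.",
        "parameters": {
            "type": "object",
            "properties": {
                "location": {"type": "string", "description": "The location, in 'city, country' format."},
            },
            "required": ["location"],
        },
        "return": {"type": "number", "description": "The current temperature in celsius."},
    },
}
```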
@CISC My bad, actually! The role has already been updated from `tool_results` to `tool` as you suggested, I just forgot that I made that commit :sweat: