Chat Templates for HuggingFace Large Language Models
This is a repository that includes proper chat templates (or input formats) for large language models (LLMs), to support the `chat_template` feature of `transformers`.
We know that different models are trained with different input formats, especially instruction-tuned or chat models. This is exactly what the new `chat_template` feature of `transformers` is designed to handle. However, I found that popular models on HuggingFace (e.g., `vicuna`, `falcon`) do not include this parameter in their `tokenizer_config.json` files, which can make it troublesome to run these models properly. Also, the `chat_template` feature requires implementing a Jinja template, which is not intuitive to write directly inside a JSON file.
So I collected the proper chat templates of several popular models from official references or implementations, and put them under `chat_templates`. If you are interested in adding more chat templates, feel free to open a pull request.
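To illustrate why writing a template directly in JSON is awkward, the sketch below takes a small, hypothetical ChatML-style template (written for illustration only, not one of the files in this repo), strips the indentation and line breaks kept in a `.jinja` file for readability, and serializes it the way it would appear in `tokenizer_config.json`. The helper name `compact` is an assumption; the stripping mirrors the `replace` calls used in the examples further down.

```python
import json

# Hypothetical ChatML-style template (illustration only, not a file from this repo),
# written across several indented lines the way a .jinja file keeps it readable.
# The raw string keeps '\n' as a two-character escape that Jinja turns into a newline.
TEMPLATE = r"""{% for message in messages %}
    {{ '<|im_start|>' + message['role'] + '\n' + message['content'] + '<|im_end|>' + '\n' }}
{% endfor %}
{% if add_generation_prompt %}
    {{ '<|im_start|>assistant\n' }}
{% endif %}"""


def compact(template: str) -> str:
    """Drop the 4-space indentation and real line breaks; escaped '\\n' inside Jinja strings survive."""
    return template.replace("    ", "").replace("\n", "")


# Inside JSON, every backslash of the Jinja escapes must itself be escaped,
# which is what makes hand-editing tokenizer_config.json error-prone.
config_entry = json.dumps({"chat_template": compact(TEMPLATE)}, indent=2)
print(config_entry)
```

Keeping the template in a separate `.jinja` file and compacting it at load time avoids that double escaping.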
If you find this repo useful, please kindly cite it:
@misc{zheng-2024-chat-templates,
author = {Zheng, Chujie},
title = {Chat Templates for HuggingFace Large Language Models},
year = {2024},
howpublished = {\url{https://github.com/chujiezheng/chat_templates}}
}
Updates
- [05/2024] Added support for Nvidia's ChatQA models
- [04/2024] Added support for Microsoft's Phi-3 models
- [04/2024] Added support for Meta's Llama-3 models
- [02/2024] Added support for Google's Gemma models
- [02/2024] Added usage explanation for generation_configs
- [01/2024] Added support for Alibaba's Qwen2 models
What Is Contained in This Repo?
- `chat_templates` contains the Jinja files of the collected chat templates, which can directly replace the default templates in HuggingFace tokenizers.
- `generation_configs` contains the corresponding JSON configs used to control where response generation ends. In particular, the `stop_token_ids` should be passed directly to the `generate` method via the `eos_token_id` argument.
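A minimal sketch of wiring a generation config into `generate`, assuming only what is stated above: the JSON file has a `stop_token_ids` list, and `generate` accepts a single id or a list of ids for `eos_token_id`. The helper names are hypothetical, and any other keys a config may contain are ignored here.

```python
import json
from pathlib import Path


def load_generation_config(path):
    """Read one of the JSON files under generation_configs/ into a dict."""
    return json.loads(Path(path).read_text(encoding="utf-8"))


def generate_kwargs_from_config(config, **extra):
    """Build keyword arguments for `model.generate`, mapping `stop_token_ids` -> `eos_token_id`.

    Extra keyword arguments (e.g. max_new_tokens) are passed through unchanged.
    If the config has no `stop_token_ids`, no `eos_token_id` is set and the
    model's own default end-of-sequence token applies.
    """
    kwargs = dict(extra)
    stop_ids = config.get("stop_token_ids")
    if stop_ids:
        # generate() accepts either a single id or a list of ids for eos_token_id
        kwargs["eos_token_id"] = list(stop_ids)
    return kwargs
```

Usage would look like `model.generate(**inputs, **generate_kwargs_from_config(load_generation_config(path), max_new_tokens=256))`, where `path` points to the JSON file for your model family under `generation_configs` (the exact file name is an assumption).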
Supported Models
| Model (Family) | Template File | Reference | Comment |
|---|---|---|---|
| llama-3-instruct (New) | llama-3-instruct.jinja | link | Official template; Meta-Llama-3-8B/70B-Instruct |
| qwen2-chat (New) | chatml.jinja | link | ChatML format; Qwen1.5-0.4B/1.8B/4B/7B/14B/72B-Chat |
| mistral-instruct (New) | mistral-instruct.jinja | link | Mistral-7B-Instruct-v0.2/0.3; system message allowed |
| phi-3 (New) | phi-3.jinja | link | Official template; Phi-3-mini-4k/128k-instruct |
| gemma-it (New) | gemma-it.jinja | link | gemma-2b/7b-it; system message allowed |
| chatqa (New) | chatqa.jinja | link | Llama3-ChatQA-1.5-8B/70B; context message allowed |
| llama-2-chat | llama-2-chat.jinja | link | Official template; Llama-2-7b/13b/70b-chat-hf |
| mistral-instruct-v0.1 | mistral-instruct-v0.1.jinja | link | Mistral-7B-Instruct-v0.1; system message allowed |
| openchat | openchat.jinja | link | openchat-3.5 |
| zephyr | zephyr.jinja | link | zephyr-7b-alpha/beta |
| yi-chat | chatml.jinja | link | ChatML format; Yi-6B/34B-Chat |
| orca-2 | chatml.jinja | link | ChatML format; Orca-2-7b/13b |
| vicuna | vicuna.jinja | link | vicuna-7b/13b-v1.5 |
| falcon-instruct | falcon-instruct.jinja | link | falcon-7b/40b-instruct |
| starling-lm | openchat.jinja | link | Starling-LM-7B-alpha/beta |
| solar-instruct | solar-instruct.jinja | link | SOLAR-10.7B-Instruct-v1.0 |
| alpaca | alpaca.jinja | link | alpaca-style models, like Platypus2-13B |
| amberchat | amberchat.jinja | link | AmberChat, AmberSafe |
| saiga | saiga.jinja | link | saiga, a series of Russian models |
Note: `mistral-instruct-v0.1` is slightly different from `mistral-instruct` (for v0.2/0.3).
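Because several families share one file (for example `qwen2-chat`, `yi-chat` and `orca-2` all use `chatml.jinja`), a small lookup keyed by family name can save some guesswork. This is a sketch: the dictionary is transcribed from the table above, and `template_path` / `load_chat_template` are hypothetical helpers, not part of this repo or of `transformers`.

```python
from pathlib import Path

# Model family -> template file, transcribed from the "Supported Models" table.
TEMPLATE_FILES = {
    "llama-3-instruct": "llama-3-instruct.jinja",
    "qwen2-chat": "chatml.jinja",
    "mistral-instruct": "mistral-instruct.jinja",
    "phi-3": "phi-3.jinja",
    "gemma-it": "gemma-it.jinja",
    "chatqa": "chatqa.jinja",
    "llama-2-chat": "llama-2-chat.jinja",
    "mistral-instruct-v0.1": "mistral-instruct-v0.1.jinja",
    "openchat": "openchat.jinja",
    "zephyr": "zephyr.jinja",
    "yi-chat": "chatml.jinja",
    "orca-2": "chatml.jinja",
    "vicuna": "vicuna.jinja",
    "falcon-instruct": "falcon-instruct.jinja",
    "starling-lm": "openchat.jinja",
    "solar-instruct": "solar-instruct.jinja",
    "alpaca": "alpaca.jinja",
    "amberchat": "amberchat.jinja",
    "saiga": "saiga.jinja",
}


def template_path(family, root="./chat_templates"):
    """Return the path of the .jinja file for a model family, or raise a helpful KeyError."""
    try:
        filename = TEMPLATE_FILES[family]
    except KeyError:
        raise KeyError(f"unknown model family {family!r}; known: {sorted(TEMPLATE_FILES)}") from None
    return Path(root) / filename


def load_chat_template(family, root="./chat_templates"):
    """Read the template and strip the 4-space indentation and line breaks kept for readability."""
    text = template_path(family, root).read_text(encoding="utf-8")
    return text.replace("    ", "").replace("\n", "")
```

The result of `load_chat_template(...)` can then be assigned to `toker.chat_template`, as in the examples.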
Examples of Setting chat_template
Important Note: As mentioned in this issue, `messages` should contain at least one user message. Passing only the system message is strongly discouraged, as it may result in unexpected outputs (the models are not trained this way).
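One way to enforce this before calling `apply_chat_template` is a small guard. The sketch below is a hypothetical helper (`check_messages` is not part of `transformers`) that only checks for the presence of a user turn.

```python
def check_messages(messages):
    """Raise ValueError if `messages` contains no user turn; otherwise return it unchanged."""
    if not any(message.get("role") == "user" for message in messages):
        raise ValueError(
            "messages must contain at least one user message; "
            "passing only a system message may produce unexpected outputs"
        )
    return messages
```

For example, `toker.apply_chat_template(check_messages(messages), tokenize=False, add_generation_prompt=True)` fails early instead of silently producing a prompt the model was never trained on.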
Example 1: llama-3-instruct
This example checks that the jinja file is implemented correctly: the corrected template should produce exactly the same output as the official default template.
from transformers import AutoTokenizer
toker = AutoTokenizer.from_pretrained("meta-llama/Meta-Llama-3-8B-Instruct", token="YOUR_OWN_TOKEN")
messages = [
{'role': 'system', 'content': 'This is a system prompt.'},
{'role': 'user', 'content': 'This is the first user input.'},
{'role': 'assistant', 'content': 'This is the first assistant response.'},
{'role': 'user', 'content': 'This is the second user input.'},
]
print('###### Default (yet Correct) Chat Template ######')
print(toker.apply_chat_template(messages, tokenize=False, add_generation_prompt=True))
print('###### Corrected Chat Template ######')
chat_template = open('./chat_templates/llama-3-instruct.jinja').read()
# strip the 4-space indentation and line breaks that only serve readability in the .jinja file
chat_template = chat_template.replace('    ', '').replace('\n', '')
toker.chat_template = chat_template
print(toker.apply_chat_template(messages, tokenize=False, add_generation_prompt=True))
Expected output:
###### Default (yet Correct) Chat Template ######
<|begin_of_text|><|start_header_id|>system<|end_header_id|>

This is a system prompt.<|eot_id|><|start_header_id|>user<|end_header_id|>

This is the first user input.<|eot_id|><|start_header_id|>assistant<|end_header_id|>

This is the first assistant response.<|eot_id|><|start_header_id|>user<|end_header_id|>

This is the second user input.<|eot_id|><|start_header_id|>assistant<|end_header_id|>

###### Corrected Chat Template ######
<|begin_of_text|><|start_header_id|>system<|end_header_id|>

This is a system prompt.<|eot_id|><|start_header_id|>user<|end_header_id|>

This is the first user input.<|eot_id|><|start_header_id|>assistant<|end_header_id|>

This is the first assistant response.<|eot_id|><|start_header_id|>user<|end_header_id|>

This is the second user input.<|eot_id|><|start_header_id|>assistant<|end_header_id|>
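As an independent cross-check when editing a `.jinja` file, the Llama-3 layout above can also be written in plain Python. This is a sketch that mirrors the official format (each header is followed by a blank line, and content is stripped); `llama3_prompt` is a hypothetical helper, not the repo's template or a `transformers` API.

```python
def llama3_prompt(messages, add_generation_prompt=True):
    """Pure-Python mirror of the Llama-3 instruct prompt layout (sketch for cross-checking)."""
    parts = ["<|begin_of_text|>"]
    for message in messages:
        # role header, blank line, stripped content, end-of-turn token
        parts.append(
            f"<|start_header_id|>{message['role']}<|end_header_id|>\n\n"
            f"{message['content'].strip()}<|eot_id|>"
        )
    if add_generation_prompt:
        # open an assistant turn so the model continues from here
        parts.append("<|start_header_id|>assistant<|end_header_id|>\n\n")
    return "".join(parts)
```

Comparing `llama3_prompt(messages)` with `toker.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)` gives a quick equality check after loading the corrected template.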
Example 2: llama-2-chat
This example checks that the jinja file is implemented correctly: the corrected template should produce exactly the same output as the official default template.
from transformers import AutoTokenizer
toker = AutoTokenizer.from_pretrained("meta-llama/Llama-2-7b-chat-hf", token="YOUR_OWN_TOKEN")
messages = [
{'role': 'system', 'content': 'This is a system prompt.'},
{'role': 'user', 'content': 'This is the first user input.'},
{'role': 'assistant', 'content': 'This is the first assistant response.'},
{'role': 'user', 'content': 'This is the second user input.'},
]
print('###### Default (yet Correct) Chat Template ######')
print(toker.apply_chat_template(messages, tokenize=False, add_generation_prompt=True))
print('###### Corrected Chat Template ######')
chat_template = open('./chat_templates/llama-2-chat.jinja').read()
# strip the 4-space indentation and line breaks that only serve readability in the .jinja file
chat_template = chat_template.replace('    ', '').replace('\n', '')
toker.chat_template = chat_template
print(toker.apply_chat_template(messages, tokenize=False, add_generation_prompt=True))
Expected output:
###### Default (yet Correct) Chat Template ######
<s>[INST] <<SYS>>
This is a system prompt.
<</SYS>>

This is the first user input. [/INST] This is the first assistant response. </s><s>[INST] This is the second user input. [/INST]
###### Corrected Chat Template ######
<s>[INST] <<SYS>>
This is a system prompt.
<</SYS>>

This is the first user input. [/INST] This is the first assistant response. </s><s>[INST] This is the second user input. [/INST]
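The Llama-2 layout above can likewise be sketched in plain Python for cross-checking: the system prompt is wrapped in `<<SYS>>` tags and folded into the first user turn, each user turn is wrapped in `[INST] ... [/INST]` with a BOS token, and each assistant turn ends with an EOS token. `llama2_prompt` is a hypothetical helper; note that Llama-2 needs no separate generation prompt, because the text already ends with `[/INST]`.

```python
def llama2_prompt(messages, bos="<s>", eos="</s>"):
    """Pure-Python mirror of the Llama-2 chat prompt layout (sketch for cross-checking)."""
    messages = list(messages)
    system = None
    if messages and messages[0]["role"] == "system":
        system = messages.pop(0)["content"].strip()
    out = []
    for i, message in enumerate(messages):
        content = message["content"].strip()
        if message["role"] == "user":
            if i == 0 and system is not None:
                # the system prompt is folded into the first user turn
                content = f"<<SYS>>\n{system}\n<</SYS>>\n\n{content}"
            out.append(f"{bos}[INST] {content} [/INST]")
        elif message["role"] == "assistant":
            out.append(f" {content} {eos}")
        else:
            raise ValueError(f"unexpected role {message['role']!r}")
    return "".join(out)
```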
Example 3: mistral-instruct
`mistral-instruct` (like `gemma-it`) does not natively support the `system` message, so passing a `system` message to its default template raises an error.
from transformers import AutoTokenizer
toker = AutoTokenizer.from_pretrained("mistralai/Mistral-7B-Instruct-v0.2")
messages = [
{'role': 'system', 'content': 'This is a system prompt.'},
{'role': 'user', 'content': 'This is the first user input.'},
{'role': 'assistant', 'content': 'This is the first assistant response.'},
{'role': 'user', 'content': 'This is the second user input.'},
]
print('###### Default (but Error-Raising) Chat Template ######')
# the default template raises a TemplateError here, so the call is commented out
#print(toker.apply_chat_template(messages, tokenize=False, add_generation_prompt=True))
print('###### Corrected Chat Template ######')
chat_template = open('./chat_templates/mistral-instruct.jinja').read()
# strip the 4-space indentation and line breaks that only serve readability in the .jinja file
chat_template = chat_template.replace('    ', '').replace('\n', '')
toker.chat_template = chat_template
print(toker.apply_chat_template(messages, tokenize=False, add_generation_prompt=True))
Expected output:
###### Default (but Error-Raising) Chat Template ######
jinja2.exceptions.TemplateError: Conversation roles must alternate user/assistant/user/assistant/...
###### Corrected Chat Template ######
<s>[INST] This is a system prompt.
This is the first user input. [/INST] This is the first assistant response. </s>[INST] This is the second user input. [/INST]
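If you prefer to keep the default Mistral template, a workaround in the same spirit is to merge the system message into the first user turn yourself and then verify the alternation rule that triggered the error above. `fold_system_message` is a hypothetical helper, and the `sep` separator is an assumption: it is not confirmed here whether the repo's template joins the two with one or two line breaks.

```python
def fold_system_message(messages, sep="\n\n"):
    """Merge a leading system message into the first user turn, then check role alternation.

    Returns a new list; the input messages are not modified.
    """
    messages = [dict(m) for m in messages]
    if messages and messages[0]["role"] == "system":
        system = messages.pop(0)["content"]
        if not messages or messages[0]["role"] != "user":
            raise ValueError("a system message must be followed by a user message")
        messages[0]["content"] = system + sep + messages[0]["content"]
    # same rule the default template enforces: user, assistant, user, assistant, ...
    for i, message in enumerate(messages):
        expected = "user" if i % 2 == 0 else "assistant"
        if message["role"] != expected:
            raise ValueError("Conversation roles must alternate user/assistant/user/assistant/...")
    return messages
```

The folded list should then pass the default template's check, e.g. `toker.apply_chat_template(fold_system_message(messages), tokenize=False, add_generation_prompt=True)`.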
Example 4: vicuna
NOTE: In FastChat, `vicuna` does not add line breaks between the roles' messages. However, I found that adding line breaks leads to slightly better performance (especially for the v1.5 version).
Also, I found that `vicuna-7/13/33b-v1.3` may not work well when given a system message different from its default one, so I recommend using `vicuna-7/13b-v1.5` instead.
from transformers import AutoTokenizer
toker = AutoTokenizer.from_pretrained("lmsys/vicuna-7b-v1.5")
messages = [
{'role': 'system', 'content': 'This is a system prompt.'},
{'role': 'user', 'content': 'This is the first user input.'},
{'role': 'assistant', 'content': 'This is the first assistant response.'},
{'role': 'user', 'content': 'This is the second user input.'},
]
print('###### Default (but Improper) Chat Template ######')
print(toker.apply_chat_template(messages, tokenize=False, add_generation_prompt=True))
print('###### Corrected Chat Template ######')
chat_template = open('./chat_templates/vicuna.jinja').read()
# strip the 4-space indentation and line breaks that only serve readability in the .jinja file
chat_template = chat_template.replace('    ', '').replace('\n', '')
toker.chat_template = chat_template
print(toker.apply_chat_template(messages, tokenize=False, add_generation_prompt=True))
Expected output:
###### Default (but Improper) Chat Template ######
<s>[INST] <<SYS>>
This is a system prompt.
<</SYS>>

This is the first user input. [/INST] This is the first assistant response. </s><s>[INST] This is the second user input. [/INST]
###### Corrected Chat Template ######
<s>This is a system prompt.
USER: This is the first user input.
ASSISTANT: This is the first assistant response.</s>
USER: This is the second user input.
ASSISTANT:
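To make the line-break difference concrete, the sketch below builds the vicuna prompt both ways: `linebreaks=True` reproduces the corrected output above, while `linebreaks=False` approximates FastChat's space-separated style (no separator right after `</s>`). `vicuna_prompt` is a hypothetical helper; it does not insert vicuna's default system message when none is given.

```python
def vicuna_prompt(messages, linebreaks=True, add_generation_prompt=True):
    """Sketch of the vicuna v1.5 prompt layout, with or without line breaks between turns."""
    pieces = []
    for message in messages:
        role, content = message["role"], message["content"].strip()
        if role == "system":
            pieces.append(content)
        elif role == "user":
            pieces.append(f"USER: {content}")
        elif role == "assistant":
            pieces.append(f"ASSISTANT: {content}</s>")
        else:
            raise ValueError(f"unexpected role {role!r}")
    if add_generation_prompt:
        pieces.append("ASSISTANT:")
    text = ""
    for piece in pieces:
        if text:
            # line-break style always uses "\n"; the FastChat-like style uses a space,
            # except directly after an end-of-sequence token
            sep = "\n" if linebreaks else ("" if text.endswith("</s>") else " ")
            text += sep
        text += piece
    return "<s>" + text
```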