FastChat
Llama 3.1 chat template has <|begin_of_text|> encoded twice
In conversation.py (my comment):
elif self.sep_style == SeparatorStyle.LLAMA3:
    # No! It's already added in encode_dialog_prompt in chat_format.py
    # ret = "<|begin_of_text|>"
    if self.system_message:
        ret += system_prompt
And in encode_dialog_prompt in chat_format.py:
tokens = []
tokens.append(self.tokenizer.special_tokens["<|begin_of_text|>"])
for message in messages:
    ...
I'm not sure which one should be kept, but certainly only one of them should add the token.
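For what it's worth, the duplication is also easy to observe from the Hugging Face tokenizer side (a different path than the chat_format.py code quoted above). The sketch below is only an illustration and makes a few assumptions: that the template is registered as "llama-3", that transformers is installed, and that the Meta-Llama-3.1-8B-Instruct tokenizer is accessible. Because get_prompt() already emits the literal <|begin_of_text|> string and the tokenizer prepends its BOS token by default, the encoded prompt ends up with the token twice:

```python
# Sketch only: the "llama-3" template name and the tokenizer repo id are assumptions.
from fastchat.conversation import get_conv_template
from transformers import AutoTokenizer

conv = get_conv_template("llama-3")
conv.append_message(conv.roles[0], "Hello")
conv.append_message(conv.roles[1], None)
prompt = conv.get_prompt()  # the string already starts with <|begin_of_text|>

tok = AutoTokenizer.from_pretrained("meta-llama/Meta-Llama-3.1-8B-Instruct")
ids = tok(prompt).input_ids  # the tokenizer prepends <|begin_of_text|> again by default
print(ids.count(tok.bos_token_id))  # prints 2 when the token is duplicated
```

Dropping either the literal token in get_prompt() (the commented-out line above) or the one added during encoding would bring this back to a single BOS.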