ChatML template issue with Llama-2-7b-chat-hf
The bug
I'm trying to run Llama-2-7b-chat-hf with the TogetherAI client, but I'm getting the following error from the tokenizer.
Exception: The tokenizer provided to the engine follows a non-ChatML format in its chat_template. Using a transformers, tiktoken, or guidance.GrammarlessTokenizer directly will solve this issue.
To Reproduce
from guidance import models, gen, select, user, system, assistant
from transformers import AutoTokenizer
tokenizer = AutoTokenizer.from_pretrained("meta-llama/Llama-2-7b-chat-hf", use_fast=False)
llama2 = models.TogetherAI("meta-llama/Llama-2-7b-chat-hf", tokenizer, echo=False)
with user():
    llama2 += 'what is your name? '
with assistant():
    llama2 += gen("answer", stop='.')
print(llama2["answer"])
System info (please complete the following information):
- OS (e.g. Ubuntu, Windows 11, Mac OS, etc.): MacOS
- Guidance Version (guidance.__version__): 0.1.15
Does the code work if you just do the following?
llama2 = models.Transformers("meta-llama/Llama-2-7b-chat-hf")
I don't think that the TogetherAI code has been updated recently, and may well have issues.
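For reference, here's roughly what that check looks like against the repro above (a minimal sketch; it assumes you have local access to the gated meta-llama weights and enough memory to load the 7B model):
from guidance import models, gen, user, assistant

# Same repro, but loading the model through the local Transformers backend
llama2 = models.Transformers("meta-llama/Llama-2-7b-chat-hf")

with user():
    llama2 += 'what is your name? '
with assistant():
    llama2 += gen("answer", stop='.')

print(llama2["answer"])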
Yes, this is working as expected. It might be an issue with the TogetherAI implementation. I'm trying to create a client similar to OpenAI for my inference engine, so I'll look through the implementation and try to figure out where this is happening. Let me know if you have any pointers on where the issue might be.
It's happening because of the GrammarlessEngine: the chat template is restricted there. Is there any specific accuracy issue if I extend the class to remove that restriction?
Thanks for bringing this up @dittops! @riedgar-ms, the issue here isn't with the TogetherAI class -- for all Grammarless models, we use the ChatML format by default to structure how we convert our role tags, so that we can parse them on the engine side to send via REST. This isn't a good assumption; I just did it for simplicity.
In reality, we should probably use our own custom format for remote/grammarless endpoints (overriding what the tokenizer itself uses). An alternative is to ensure that each Tokenizer has a reliable two-way conversion between the plaintext and messages formats.
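To illustrate the kind of round trip that conversion would need (purely a hypothetical sketch, not guidance's actual implementation, assuming ChatML-style <|im_start|>/<|im_end|> role markers):
import re

# Hypothetical helpers sketching a plaintext <-> messages round trip
ROLE_BLOCK = re.compile(r"<\|im_start\|>(\w+)\n(.*?)<\|im_end\|>", re.DOTALL)

def prompt_to_messages(prompt):
    # Parse ChatML-formatted plaintext into the messages list a REST endpoint expects
    return [{"role": role, "content": content} for role, content in ROLE_BLOCK.findall(prompt)]

def messages_to_prompt(messages):
    # Inverse direction: render messages back into ChatML plaintext
    return "".join(f"<|im_start|>{m['role']}\n{m['content']}<|im_end|>\n" for m in messages)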
@dittops -- for right now, this is a bit hacky, but you can make this work by initializing a GrammarlessTokenizer directly with something like:
from guidance.models._grammarless import GrammarlessTokenizer
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("meta-llama/Llama-2-7b-chat-hf", use_fast=False)
g_tokenizer = GrammarlessTokenizer(tokenizer)
llama2 = models.TogetherAI("meta-llama/Llama-2-7b-chat-hf", g_tokenizer, echo=False)
This won't cause accuracy differences -- it just changes the chat format, which TogetherAI doesn't use anyway: API calls ultimately get structured in the messages format and don't actually use the special tokens themselves.
That said, this is really confusing, so we should update guidance to make this class public and change how we insert and parse chat-related tokens for grammarless models (perhaps with our own unique special format).
Thanks for the detailed response. I have tried this hack, but now I'm getting the error below.
id: token for token, id in tokenizer.get_added_vocab().items()
^^^^^^^^^^^^^^^^^^^^^^^^^
AttributeError: 'GrammarlessTokenizer' object has no attribute 'get_added_vocab'
Oh weird, thanks for sharing. That's a clear bug on our side; we should be able to init a transformers AutoTokenizer as a GrammarlessTokenizer.
I think for now, your generation quality won't really be hurt much by not passing a tokenizer argument into your TogetherAI call (thereby defaulting to the GPT-2 tokenizer). We don't leverage the tokenizer much for Grammarless models, because there are pretty heavy limitations on the type of structure we can enforce anyway. But we should leave this issue open while we hunt down and fix these bugs -- thanks so much for bringing this to our attention!
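In other words, something like this should be fine for now (a minimal sketch of the call with no tokenizer argument):
from guidance import models

# No tokenizer passed, so guidance falls back to its default (GPT-2) tokenizer,
# which is fine here since Grammarless models barely use it
llama2 = models.TogetherAI("meta-llama/Llama-2-7b-chat-hf", echo=False)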
Thank you for the update, I will go with that for now.