
Error when loading Almawave/Velvet-14B tokenizer

dtdxdydz opened this issue on Feb 06, 2025 · 2 comments

When trying to load the Velvet-14B model and tokenizer, an error is raised. As with Mistral models, the chat_template does not contain an add_generation_prompt block.

Traceback (most recent call last):
  File "/u01/SUPPORT/test_unsloth/test_unsloth_velvet.py", line 4, in <module>
    model, tokenizer = FastLanguageModel.from_pretrained(
                       ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/velvet/anaconda3/envs/unsloth/lib/python3.11/site-packages/unsloth/models/loader.py", line 258, in from_pretrained
    model, tokenizer = dispatch_model.from_pretrained(
                       ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/velvet/anaconda3/envs/unsloth/lib/python3.11/site-packages/unsloth/models/mistral.py", line 348, in from_pretrained
    return FastLlamaModel.from_pretrained(
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/velvet/anaconda3/envs/unsloth/lib/python3.11/site-packages/unsloth/models/llama.py", line 1709, in from_pretrained
    tokenizer = load_correct_tokenizer(
                ^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/velvet/anaconda3/envs/unsloth/lib/python3.11/site-packages/unsloth/tokenizer_utils.py", line 589, in load_correct_tokenizer
    chat_template = fix_chat_template(tokenizer)
                    ^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/velvet/anaconda3/envs/unsloth/lib/python3.11/site-packages/unsloth/tokenizer_utils.py", line 692, in fix_chat_template
    raise RuntimeError(
RuntimeError: Unsloth: The tokenizer Almawave/Velvet-14B does not have a {% if add_generation_prompt %} for generation purposes. Please file a bug report immediately - thanks!
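For context, the check that trips here appears to be a test on the tokenizer's Jinja chat template: Unsloth expects the template to branch on add_generation_prompt so it can append the assistant header at generation time. A rough, hypothetical illustration of the same diagnosis (not Unsloth's actual code):

from transformers import AutoTokenizer

# Load only the tokenizer and inspect its Jinja chat template.
tokenizer = AutoTokenizer.from_pretrained("Almawave/Velvet-14B")
template = tokenizer.chat_template or ""

# Unsloth's fix_chat_template raises when this guard is missing.
if "add_generation_prompt" not in template:
    print("chat_template has no {% if add_generation_prompt %} branch")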

Script:

from transformers import TextStreamer
from unsloth import FastLanguageModel

model, tokenizer = FastLanguageModel.from_pretrained(
    model_name = "Almawave/Velvet-14B",
    max_seq_length = 16384,
    load_in_4bit = False,
)

FastLanguageModel.for_inference(model)

messages = [
    {"role": "user", "content": "Ciao chi sei?cosa sai fare?"},
]
inputs = tokenizer.apply_chat_template(
    messages,
    tokenize = True,
    add_generation_prompt = True,
    return_tensors = "pt",
).to("cuda")

gen_idx = len(inputs[0])
outputs = model.generate(input_ids = inputs, max_new_tokens = 4096, use_cache = True)

response = tokenizer.batch_decode(outputs[:, gen_idx:], skip_special_tokens = True)[0]
print(response)

Python version: 3.11.11 with unsloth==2025.1.8 and unsloth_zoo==2025.1.4
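A possible local workaround until this is handled (a sketch only, not verified against this model): download the repo, add an add_generation_prompt branch to the chat_template in tokenizer_config.json, then point Unsloth at the patched local copy. The '<assistant_header>' string below is a placeholder assumption and should be replaced with whatever text Velvet's template actually emits before an assistant turn.

import json, os
from huggingface_hub import snapshot_download
from unsloth import FastLanguageModel

# Download the model repo to a local folder we can edit.
local_dir = snapshot_download("Almawave/Velvet-14B", local_dir = "velvet-14b-local")
cfg_path = os.path.join(local_dir, "tokenizer_config.json")

with open(cfg_path) as f:
    cfg = json.load(f)

template = cfg.get("chat_template") or ""
if "add_generation_prompt" not in template:
    # Append a generation-prompt branch so Unsloth's check passes.
    # '<assistant_header>' is a placeholder, not Velvet's real assistant prefix.
    cfg["chat_template"] = template + (
        "{% if add_generation_prompt %}{{ '<assistant_header>' }}{% endif %}"
    )
    with open(cfg_path, "w") as f:
        json.dump(cfg, f)

model, tokenizer = FastLanguageModel.from_pretrained(
    model_name = local_dir,
    max_seq_length = 16384,
    load_in_4bit = False,
)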
