Error when loading Almawave/Velvet-14B tokenizer
When trying to load the Velvet-14B model and tokenizer, an error is raised. As with Mistral models, the chat_template does not contain an {% if add_generation_prompt %} block.
Traceback (most recent call last):
  File "/u01/SUPPORT/test_unsloth/test_unsloth_velvet.py", line 4, in <module>
    model, tokenizer = FastLanguageModel.from_pretrained(
                       ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/velvet/anaconda3/envs/unsloth/lib/python3.11/site-packages/unsloth/models/loader.py", line 258, in from_pretrained
    model, tokenizer = dispatch_model.from_pretrained(
                       ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/velvet/anaconda3/envs/unsloth/lib/python3.11/site-packages/unsloth/models/mistral.py", line 348, in from_pretrained
    return FastLlamaModel.from_pretrained(
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/velvet/anaconda3/envs/unsloth/lib/python3.11/site-packages/unsloth/models/llama.py", line 1709, in from_pretrained
    tokenizer = load_correct_tokenizer(
                ^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/velvet/anaconda3/envs/unsloth/lib/python3.11/site-packages/unsloth/tokenizer_utils.py", line 589, in load_correct_tokenizer
    chat_template = fix_chat_template(tokenizer)
                    ^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/velvet/anaconda3/envs/unsloth/lib/python3.11/site-packages/unsloth/tokenizer_utils.py", line 692, in fix_chat_template
    raise RuntimeError(
RuntimeError: Unsloth: The tokenizer Almawave/Velvet-14B
does not have a {% if add_generation_prompt %} for generation purposes. Please file a bug report immediately - thanks!
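For reference, the failing condition can be reproduced outside unsloth with the standard transformers tokenizer API. This is a minimal sketch; the substring test is an approximation of what fix_chat_template looks for:

from transformers import AutoTokenizer

tok = AutoTokenizer.from_pretrained("Almawave/Velvet-14B")
# unsloth raises because this branch is missing from the template;
# this should print False if the template indeed lacks it.
print("{% if add_generation_prompt %}" in (tok.chat_template or ""))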
Script:
from transformers import TextStreamer
from unsloth import FastLanguageModel

model, tokenizer = FastLanguageModel.from_pretrained(
    model_name="Almawave/Velvet-14B",
    max_seq_length=16384,
    load_in_4bit=False,
)
FastLanguageModel.for_inference(model)

messages = [
    {"role": "user", "content": "Ciao chi sei?cosa sai fare?"},
]
inputs = tokenizer.apply_chat_template(
    messages,
    tokenize=True,
    add_generation_prompt=True,
    return_tensors="pt",
).to("cuda")

gen_idx = len(inputs[0])
outputs = model.generate(input_ids=inputs, max_new_tokens=4096, use_cache=True)

response = tokenizer.batch_decode(outputs[:, gen_idx:], skip_special_tokens=True)[0]
print(response)
Python version: 3.11.11 with unsloth==2025.1.8 and unsloth_zoo==2025.1.4
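Until the upstream template is fixed, one possible workaround is to patch the template in a local copy of the repo and point unsloth at that copy. This is a sketch only, not an official fix: it assumes the template lives in tokenizer_config.json (as is standard), the local directory name is arbitrary, and the appended Jinja branch is empty, which merely satisfies unsloth's check; a proper fix would emit the model's real assistant prefix inside the branch.

import json
from huggingface_hub import snapshot_download

# Download (or reuse) a local copy of the repo so the config can be edited.
local_dir = snapshot_download("Almawave/Velvet-14B", local_dir="velvet-14b-patched")

cfg_path = f"{local_dir}/tokenizer_config.json"
with open(cfg_path) as f:
    cfg = json.load(f)

template = cfg.get("chat_template") or ""
if "add_generation_prompt" not in template:
    # Assumption: an empty branch is enough to pass unsloth's check.
    cfg["chat_template"] = template + "{% if add_generation_prompt %}{% endif %}"
    with open(cfg_path, "w") as f:
        json.dump(cfg, f)

# Then load the patched local snapshot as usual:
# model, tokenizer = FastLanguageModel.from_pretrained(
#     model_name=local_dir, max_seq_length=16384, load_in_4bit=False)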