unsloth
Add support for Llama 3
It looks like the tokenizer patching breaks. Here's the log:
ValueError Traceback (most recent call last)
Cell In[1], line 20
7 # 4bit pre quantized models we support for 4x faster downloading + no OOMs.
8 fourbit_models = [
9 "unsloth/mistral-7b-bnb-4bit",
10 "unsloth/mistral-7b-v0.2-bnb-4bit", # New Mistral 32K base model
(...)
17 "unsloth/gemma-2b-bnb-4bit",
18 ] # More models at https://huggingface.co/unsloth
---> 20 model, tokenizer = FastLanguageModel.from_pretrained(
21 model_name = "/srv/models/Meta-Llama-3-8B", # Choose ANY! eg teknium/OpenHermes-2.5-Mistral-7B
22 max_seq_length = max_seq_length,
23 dtype = dtype,
24 load_in_4bit = load_in_4bit,
25 # token = "hf_...", # use one if using gated models like meta-llama/Llama-2-7b-hf
26 )
File ~/.local/lib/python3.10/site-packages/unsloth/models/loader.py:138, in FastLanguageModel.from_pretrained(model_name, max_seq_length, dtype, load_in_4bit, token, device_map, rope_scaling, fix_tokenizer, trust_remote_code, use_gradient_checkpointing, *args, **kwargs)
135 tokenizer_name = None
136 pass
--> 138 model, tokenizer = dispatch_model.from_pretrained(
139 model_name = model_name,
140 max_seq_length = max_seq_length,
141 dtype = dtype,
142 load_in_4bit = load_in_4bit,
143 token = token,
144 device_map = device_map,
145 rope_scaling = rope_scaling,
146 fix_tokenizer = fix_tokenizer,
147 model_patcher = dispatch_model,
148 tokenizer_name = tokenizer_name,
149 trust_remote_code = trust_remote_code,
150 *args, **kwargs,
151 )
153 # In case the model supports tagging, add the unsloth tag.
154 if hasattr(model, "add_model_tags"):
File ~/.local/lib/python3.10/site-packages/unsloth/models/llama.py:1121, in FastLlamaModel.from_pretrained(model_name, max_seq_length, dtype, load_in_4bit, token, device_map, rope_scaling, fix_tokenizer, model_patcher, tokenizer_name, trust_remote_code, **kwargs)
1112 tokenizer_name = model_name if tokenizer_name is None else tokenizer_name
1113 tokenizer = load_correct_tokenizer(
1114 tokenizer_name = tokenizer_name,
1115 model_max_length = max_position_embeddings,
(...)
1118 trust_remote_code = trust_remote_code,
1119 )
-> 1121 model, tokenizer = patch_tokenizer(model, tokenizer)
1122 model = model_patcher.post_patch(model)
1124 # Patch up QKV / O and MLP
File ~/.local/lib/python3.10/site-packages/unsloth/models/_utils.py:152, in patch_tokenizer(model, tokenizer)
149 if not hasattr(tokenizer, "pad_token") or tokenizer.pad_token is None:
150 # Fixes https://github.com/unslothai/unsloth/issues/5
151 if hasattr(tokenizer, "unk_token"):
--> 152 tokenizer.add_special_tokens({"pad_token" : tokenizer.unk_token})
153 tokenizer.pad_token = tokenizer.unk_token
154 else:
File ~/.local/lib/python3.10/site-packages/transformers/tokenization_utils_base.py:973, in SpecialTokensMixin.add_special_tokens(self, special_tokens_dict, replace_additional_special_tokens)
971 else:
972 if not isinstance(value, (str, AddedToken)):
--> 973 raise ValueError(f"Token {value} for key {key} should be a str or an AddedToken instance")
974 if isinstance(value, (str)):
975 # for legacy purpose we default to stripping. `False` depends on this
976 value = AddedToken(value, rstrip=False, lstrip=False, normalized=False, special=True)
ValueError: Token None for key pad_token should be a str or an AddedToken instance
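The failing branch checks `hasattr(tokenizer, "unk_token")`. That check is true even when the attribute exists but holds `None`, and the error message shows that `tokenizer.unk_token` was `None` for the Llama 3 tokenizer. Below is a minimal sketch of a safer fallback. It assumes a hypothetical helper `pick_pad_token` that prefers `unk_token`, then `eos_token`. It is not Unsloth's actual fix, and the token strings are illustrative stand-ins.

```python
from types import SimpleNamespace
from typing import Any, Optional


def pick_pad_token(tokenizer: Any) -> Optional[str]:
    """Return a token string usable for padding, or None if nothing fits.

    Hypothetical helper: `hasattr` alone is not enough, because a tokenizer
    can expose `unk_token` as an attribute whose value is None.
    """
    for name in ("unk_token", "eos_token"):
        token = getattr(tokenizer, name, None)
        if token is not None:
            return str(token)
    return None


# Stand-ins for real tokenizers, so the sketch runs without transformers.
llama2_like = SimpleNamespace(unk_token="<unk>", eos_token="</s>", pad_token=None)
llama3_like = SimpleNamespace(unk_token=None, eos_token="<|end_of_text|>", pad_token=None)

print(pick_pad_token(llama2_like))  # <unk>
print(pick_pad_token(llama3_like))  # <|end_of_text|>
```

Reusing the EOS token for padding is a common workaround when no unknown token exists. The fix that actually shipped in Unsloth may handle this differently.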
Agreed, I want Llama 3 support as well.
Yes yes, working on it!
FIXED!!
Colab notebook: https://colab.research.google.com/drive/135ced7oHytdxu3N2DNe1Z0kqjyYIkDXp?usp=sharing
I think you should update the README ASAP for this :) It will be good advertising. @danielhanchen
OMG THANK YOU SO MUCH! Already fine tuning my own models with this colab
Looking good, except the chat templating isn't quite right due to the tokenizer change.
FileNotFoundError Traceback (most recent call last)
Cell In[5], line 3
1 from unsloth.chat_templates import get_chat_template
----> 3 tokenizer = get_chat_template(
4 tokenizer,
5 chat_template = "chatml",
6 mapping = {"role" : "from", "content" : "value", "user" : "human", "assistant" : "gpt"},
7 map_eos_token = True,
8 )
10 def formatting_prompts_func(examples):
11 convos = examples["conversations"]
File ~/.local/lib/python3.10/site-packages/unsloth/chat_templates.py:379, in get_chat_template(tokenizer, chat_template, mapping, map_eos_token)
377 # Must fix the sentence piece tokenizer since there's no tokenizer.model file!
378 token_mapping = { old_eos_token : stop_word, }
--> 379 tokenizer = fix_sentencepiece_tokenizer(tokenizer, new_tokenizer, token_mapping,)
380 pass
382 else:
File ~/.local/lib/python3.10/site-packages/unsloth/tokenizer_utils.py:222, in fix_sentencepiece_tokenizer(old_tokenizer, new_tokenizer, token_mapping, temporary_location)
219 old_tokenizer.save_pretrained(temporary_location)
221 tokenizer_file = sentencepiece_model_pb2.ModelProto()
--> 222 tokenizer_file.ParseFromString(open(f"{temporary_location}/tokenizer.model", "rb").read())
224 # Now save the new tokenizer
225 new_tokenizer.save_pretrained(temporary_location)
FileNotFoundError: [Errno 2] No such file or directory: '_unsloth_sentencepiece_temp/tokenizer.model'
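Here `fix_sentencepiece_tokenizer` saves the old tokenizer and then parses `tokenizer.model`, which is a SentencePiece file. The error indicates that saving the Llama 3 tokenizer produced no such file, so the SentencePiece path cannot apply. Llama 2 and Mistral tokenizers do ship one. The sketch below shows a guard that inspects the save directory before attempting SentencePiece surgery. It assumes a hypothetical helper `has_sentencepiece_model` and is not part of Unsloth.

```python
import os
import tempfile


def has_sentencepiece_model(directory: str) -> bool:
    """Hypothetical check: True only if a SentencePiece `tokenizer.model` was saved."""
    return os.path.isfile(os.path.join(directory, "tokenizer.model"))


with tempfile.TemporaryDirectory() as tmp:
    # Simulate a fast-tokenizer-only save, as the traceback suggests for Llama 3.
    open(os.path.join(tmp, "tokenizer.json"), "w").close()
    fast_only = has_sentencepiece_model(tmp)

    # Simulate a SentencePiece-based save (Llama 2 / Mistral style).
    open(os.path.join(tmp, "tokenizer.model"), "wb").close()
    with_sp = has_sentencepiece_model(tmp)

print(fast_only, with_sp)  # False True
```

When the check returns False, a caller could skip the SentencePiece branch and edit only the fast tokenizer instead. How Unsloth actually resolved this is not shown in this thread.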
@danielhanchen
It's weird: I have this issue in both unsloth and LLaMA-Factory, with the exact same error, and only for the Llama 3 models.
==((====))== Unsloth: Fast Llama patching release 2024.4
\\ /| GPU: NVIDIA GeForce RTX 4090. Max memory: 23.988 GB. Platform = Linux.
O^O/ \_/ \ Pytorch: 2.1.2+cu121. CUDA = 8.9. CUDA Toolkit = 12.1.
\ / Bfloat16 = TRUE. Xformers = 0.0.25.post1. FA = False.
"-____-" Free Apache license: http://github.com/unslothai/unsloth
Special tokens have been added in the vocabulary, make sure the associated word embeddings are fine-tuned or trained.
Special tokens have been added in the vocabulary, make sure the associated word embeddings are fine-tuned or trained.
Traceback (most recent call last):
File "/home/workspace/unsl.py", line 53, in <module>
model, tokenizer = FastLanguageModel.from_pretrained(
File "/home/miniconda3/envs/unsloth_env/lib/python3.10/site-packages/unsloth/models/loader.py", line 132, in from_pretrained
model, tokenizer = dispatch_model.from_pretrained(
File "/home/miniconda3/envs/unsloth_env/lib/python3.10/site-packages/unsloth/models/llama.py", line 1085, in from_pretrained
tokenizer = load_correct_tokenizer(
File "/home/miniconda3/envs/unsloth_env/lib/python3.10/site-packages/unsloth/tokenizer_utils.py", line 262, in load_correct_tokenizer
fast_tokenizer.add_bos_token = slow_tokenizer.add_bos_token
AttributeError: 'PreTrainedTokenizerFast' object has no attribute 'add_bos_token'. Did you mean: '_bos_token'?
Edit: A complete reinstall solved it.
@rwl4 Working on the chat template issues! @Sneakr Yep, a complete reinstall should work. Sorry about the issues!
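A complete reinstall fixed the `add_bos_token` error, so a mismatch between installed package versions is a plausible cause. The sketch below prints what is actually installed, using only the standard library's `importlib.metadata`. The package list is an assumption; adjust it to your environment.

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Dict, Iterable, Optional


def installed_versions(packages: Iterable[str]) -> Dict[str, Optional[str]]:
    """Map each distribution name to its installed version, or None if absent."""
    result: Dict[str, Optional[str]] = {}
    for name in packages:
        try:
            result[name] = version(name)
        except PackageNotFoundError:
            result[name] = None
    return result


for pkg, ver in installed_versions(["unsloth", "transformers", "torch"]).items():
    print(f"{pkg}: {ver or 'not installed'}")
```

Comparing this output before and after a reinstall shows which package changed.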
I have a question about Llama 3 finetuning. Two versions of Llama 3 were released: base and instruction-finetuned. Is the current Llama 3 model (unsloth/llama-3-8b-bnb-4bit) the base model or the instruction-tuned one? If it's the base model, will the instruction-tuned model also be added?
@arunpatala Base model.
The Instruct version is unsloth/llama-3-8b-Instruct-bnb-4bit.
No, the base model is purely pretrained, with no instruction finetuning.
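The repository names in this thread follow a visible pattern. Each name starts with the model size, optionally adds an `-Instruct` suffix, and optionally adds a `-bnb-4bit` suffix for pre-quantized weights. The hypothetical helper below is inferred only from those names. Not every combination it can produce necessarily exists on the Hugging Face Hub; for example, the thread notes that a 16-bit 70B model was not uploaded.

```python
def unsloth_llama3_repo(size: str, instruct: bool = False, bnb_4bit: bool = True) -> str:
    """Build a repo id following the naming seen in this thread (hypothetical helper)."""
    name = f"unsloth/llama-3-{size}"
    if instruct:
        name += "-Instruct"  # instruction-tuned variant
    if bnb_4bit:
        name += "-bnb-4bit"  # bitsandbytes 4-bit pre-quantized upload
    return name


print(unsloth_llama3_repo("8b"))                 # unsloth/llama-3-8b-bnb-4bit
print(unsloth_llama3_repo("8b", instruct=True))  # unsloth/llama-3-8b-Instruct-bnb-4bit
```

The resulting string would be passed as `model_name` to `FastLanguageModel.from_pretrained`, as in the original report.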
Thanks for the information.
I am able to LoRA finetune with the Instruct model now.
I noticed that non-quantized versions of Llama-3-70B don't seem to be available from Unsloth.
For example, here is non-quantized vs 4bit quantized Llama-3-8B:
- https://huggingface.co/unsloth/llama-3-8b
- https://huggingface.co/unsloth/llama-3-8b-bnb-4bit
On the other hand, only the 4bit 70B model appears to be available:
- https://huggingface.co/unsloth/llama-3-70b-bnb-4bit
Very new to Unsloth, so I may very well be missing something here!
Sadly, the non-quantized version is nearly impossible to finetune in 16-bit on a single GPU anyway, so it hasn't been uploaded.
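A back-of-the-envelope check makes this concrete. The estimate counts weights only (no activations, gradients, optimizer states, or quantization overhead) and uses the nominal 70B parameter count. At 16-bit, the weights alone need about 70 billion × 2 bytes = 140 GB. That exceeds the roughly 24 GB of the RTX 4090 in the log above and also an 80 GB A100 or H100. At 4-bit, the weights need about 35 GB:

```python
def weight_memory_gb(n_params: float, bits_per_param: int) -> float:
    """Approximate memory for model weights alone, in decimal gigabytes (1e9 bytes)."""
    return n_params * bits_per_param / 8 / 1e9


for bits in (16, 4):
    print(f"70B @ {bits}-bit: {weight_memory_gb(70e9, bits):.0f} GB")
# 70B @ 16-bit: 140 GB
# 70B @ 4-bit: 35 GB
```

Real training needs noticeably more memory than these figures, so treat them as lower bounds.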