Is there any way to pretrain using unsloth?
I need to pretrain a Gemma 7B model using unsloth.
@VishnuPJ Sadly not - you can do continued pretraining though - https://colab.research.google.com/drive/1ef-tab5bhkvWmBOObepl1WgJvfvSzn5Q?usp=sharing can help.
I do not suggest pretraining from scratch since you would have to spend a lot of compute and money - continued pretraining is the smarter choice.
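For reference, the rough shape of the setup in that notebook is something like the sketch below (the model name and hyperparameters here are illustrative - the notebook itself is the authoritative version):

from unsloth import FastLanguageModel

# Load the base model in 4-bit to keep memory low (illustrative settings)
model, tokenizer = FastLanguageModel.from_pretrained(
    model_name = "unsloth/gemma-7b-bnb-4bit",
    max_seq_length = 2048,
    load_in_4bit = True,
)

# Attach LoRA adapters; including embed_tokens and lm_head lets the
# embeddings shift towards the new domain / language
model = FastLanguageModel.get_peft_model(
    model,
    r = 16,
    target_modules = ["q_proj", "k_proj", "v_proj", "o_proj",
                      "gate_proj", "up_proj", "down_proj",
                      "embed_tokens", "lm_head"],
    lora_alpha = 16,
    use_gradient_checkpointing = "unsloth",
)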
Great tool!
A dumb question: is unsloth only compatible with LoRA?
Thanks!
@danielhanchen Cool, I am trying to pretrain on a language other than English. Not sure how well it will work. Anyway, I will try it out. Thanks for the help.
When I tested, Gemma 7B already supports multiple languages. Finetuning can be an option.
I do have a newbie question though. @VishnuPJ, would continued pretraining be a better option compared to finetuning?
@shan23chen Yes, LoRA and QLoRA. @VishnuPJ Sorry for the delay! Yes, continued pretraining as mentioned by @erwe324 is the better solution - i.e. take some Wikipedia data in the specific language you want, finetune on top of it, then use some of your instruction data later.
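Roughly, the data side could look like the sketch below (the dataset name, language config and "text" column are placeholders - swap in whatever raw-text corpus you have in your target language):

from datasets import load_dataset

# Placeholder dataset / config - replace ".en" with your language's config
dataset = load_dataset("wikimedia/wikipedia", "20231101.en", split = "train")

EOS_TOKEN = tokenizer.eos_token  # tokenizer from FastLanguageModel.from_pretrained above

def add_eos(examples):
    # Append EOS so the model learns where documents end
    return {"text": [t + EOS_TOKEN for t in examples["text"]]}

dataset = dataset.map(add_eos, batched = True)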
@danielhanchen I tried to finetune on a text-completion task using Llama 3 8B. I added some extra tokens to the Llama tokenizer and resized the token embeddings using "model.resize_token_embeddings(len(tokenizer))". But when I try to run training I get: "RuntimeError: element 0 of tensors does not require grad and does not have a grad_fn".
@VishnuPJ Oh so use
from unsloth import add_new_tokens
add_new_tokens(model, tokenizer, new_tokens = ["NEW_TOKEN", "NEW_TOKEN_2"])
# Then add get_peft_model
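That get_peft_model step could look roughly like this (the hyperparameters are illustrative) - the key point is that embed_tokens and lm_head are included in target_modules, so the newly added token rows actually receive gradients:

from unsloth import FastLanguageModel

# Call this *after* add_new_tokens so the new embedding rows are trainable
model = FastLanguageModel.get_peft_model(
    model,
    r = 16,
    target_modules = ["q_proj", "k_proj", "v_proj", "o_proj",
                      "gate_proj", "up_proj", "down_proj",
                      "embed_tokens", "lm_head"],
    lora_alpha = 16,
)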
Thanks @danielhanchen. I was able to add the tokens using this method, but I got a "CUDA out of memory" error while training. It could be a trainer issue. So I used the following instead:
tokenizer.add_tokens(["NEW_TOKEN", "NEW_TOKEN_2"])
model.resize_token_embeddings(len(tokenizer))
# Note: the above lines should be added before model = FastLanguageModel.get_peft_model(...)
Closing the issue. Thanks @danielhanchen for your help.
Thanks for the discussion! Can anyone explain the difference between the SFTTrainer and the UnslothTrainer used in the continued pretraining notebook?
@liwd190019 UnslothTrainer lets you set 2 learning rates - one for the lm_head / embed_tokens, and another for the LoRA adapters - we talk about that here: https://unsloth.ai/blog/contpretraining
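A rough sketch of what that looks like, assuming model, tokenizer and dataset come from the usual unsloth setup above (values are illustrative - the notebook has the real config):

from unsloth import UnslothTrainer, UnslothTrainingArguments

trainer = UnslothTrainer(
    model = model,
    tokenizer = tokenizer,
    train_dataset = dataset,
    dataset_text_field = "text",
    max_seq_length = 2048,
    args = UnslothTrainingArguments(
        per_device_train_batch_size = 2,
        gradient_accumulation_steps = 8,
        max_steps = 120,
        learning_rate = 5e-5,            # LoRA adapters
        embedding_learning_rate = 1e-5,  # separate, smaller LR for lm_head / embed_tokens
        output_dir = "outputs",
    ),
)
trainer.train()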