
Is there any way to pretrain using unsloth?


Need to pretrain a Gemma 7b model using unsloth.

VishnuPJ avatar Apr 16 '24 14:04 VishnuPJ

@VishnuPJ Sadly not - you can do continued pretraining though - https://colab.research.google.com/drive/1ef-tab5bhkvWmBOObepl1WgJvfvSzn5Q?usp=sharing can help.

I do not suggest pretraining from scratch since it costs a lot of compute and money - continued pretraining is the smarter choice.
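
For reference, a minimal sketch of what the continued-pretraining setup in that notebook looks like (the checkpoint name, sequence length, and LoRA hyperparameters here are illustrative placeholders, not fixed recommendations):

from unsloth import FastLanguageModel

# Load a 4-bit quantized base model (QLoRA-style) to keep memory low.
model, tokenizer = FastLanguageModel.from_pretrained(
    model_name = "unsloth/gemma-7b-bnb-4bit",  # illustrative checkpoint name
    max_seq_length = 2048,
    load_in_4bit = True,
)

# Attach LoRA adapters. Including embed_tokens and lm_head lets the model
# adapt its vocabulary and output layer to the new domain or language.
model = FastLanguageModel.get_peft_model(
    model,
    r = 16,
    lora_alpha = 16,
    target_modules = ["q_proj", "k_proj", "v_proj", "o_proj",
                      "gate_proj", "up_proj", "down_proj",
                      "embed_tokens", "lm_head"],
    use_gradient_checkpointing = "unsloth",
)

The adapters are then trained on raw text rather than instruction pairs.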

danielhanchen avatar Apr 16 '24 17:04 danielhanchen

Great tool!

A dumb question: is unsloth only compatible with LoRA?

Thanks!

shan23chen avatar Apr 16 '24 22:04 shan23chen

@danielhanchen Cool, I am trying to pretrain on a language other than English. Not sure how well it will work; anyway, I will try it out. Thanks for the help.

VishnuPJ avatar Apr 17 '24 08:04 VishnuPJ


When I tested, Gemma 7B already supports multiple languages. Finetuning can be an option.

I do have a newbie question though. @VishnuPJ, would continued pretraining be a better option compared to finetuning?

ewre324 avatar Apr 21 '24 12:04 ewre324

@shan23chen Yes, LoRA and QLoRA. @VishnuPJ Sorry for the delay! Yes, continued pretraining as mentioned by @ewre324 is the better solution - i.e. take some Wikipedia data in the specific language you want, finetune on top of it, then use some of your instruction data later.
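
As a sketch of that data step (the dataset name, dump date, and language code below are placeholders; the Hugging Face Wikipedia dumps are one convenient source, assuming they cover your language):

from datasets import load_dataset

# Load a Wikipedia dump in the target language (dump date and language code
# are placeholders - pick whichever config covers your language).
dataset = load_dataset("wikimedia/wikipedia", "20231101.ml", split = "train")

# Continued pretraining trains on raw text, so keep just the article text
# and append the EOS token so the model sees document boundaries.
# `tokenizer` is the one returned by FastLanguageModel.from_pretrained.
def add_eos(examples):
    return {"text": [t + tokenizer.eos_token for t in examples["text"]]}

dataset = dataset.map(add_eos, batched = True)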

danielhanchen avatar Apr 21 '24 17:04 danielhanchen

@danielhanchen I tried to finetune on a text-completion task using Llama 3 8B. I added some extra tokens to the Llama tokenizer and resized the token embeddings using "model.resize_token_embeddings(len(tokenizer))". But when I try to run training I get: "RuntimeError: element 0 of tensors does not require grad and does not have a grad_fn"

VishnuPJ avatar Apr 22 '24 20:04 VishnuPJ

@VishnuPJ Oh so use

from unsloth import add_new_tokens
add_new_tokens(model, tokenizer, new_tokens = ["NEW_TOKEN", "NEW_TOKEN_2"])

# Then add get_peft_model

danielhanchen avatar Apr 23 '24 17:04 danielhanchen


Thanks @danielhanchen. I was able to add the tokens using this method, but I got a "CUDA out of memory" error while training. It could be a trainer issue, so I used the following instead:

tokenizer.add_tokens(["NEW_TOKEN", "NEW_TOKEN_2"])
model.resize_token_embeddings(len(tokenizer))

# Note: the above lines should be added before model = FastLanguageModel.get_peft_model(...)
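
Put together, that workaround would look roughly like this (the checkpoint name and LoRA settings are illustrative; the only point established above is that the vocabulary and embedding changes must come before get_peft_model):

from unsloth import FastLanguageModel

model, tokenizer = FastLanguageModel.from_pretrained(
    model_name = "unsloth/llama-3-8b-bnb-4bit",  # illustrative checkpoint name
    max_seq_length = 2048,
    load_in_4bit = True,
)

# Extend the vocabulary, then grow the embedding matrix to match,
# *before* attaching the LoRA adapters.
tokenizer.add_tokens(["NEW_TOKEN", "NEW_TOKEN_2"])
model.resize_token_embeddings(len(tokenizer))

model = FastLanguageModel.get_peft_model(
    model,
    r = 16,
    lora_alpha = 16,
    target_modules = ["q_proj", "k_proj", "v_proj", "o_proj",
                      "gate_proj", "up_proj", "down_proj"],
)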

VishnuPJ avatar Apr 24 '24 07:04 VishnuPJ

Closing the issue. Thanks @danielhanchen for your help.

VishnuPJ avatar Apr 24 '24 07:04 VishnuPJ

Thanks for the discussion! Can anyone explain the difference between SFTTrainer and the UnslothTrainer used in the continued pretraining notebook?

liwd190019 avatar Jun 16 '24 23:06 liwd190019

@liwd190019 UnslothTrainer allows you to set 2 learning rates - one for lm_head / embed_tokens, and another for the LoRA adapters - we talk about that here: https://unsloth.ai/blog/contpretraining
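
In code, the difference is roughly the extra embedding_learning_rate knob (a minimal sketch; the learning-rate and batch values are illustrative, and model, tokenizer, and dataset are assumed to come from a setup like the one earlier in this thread):

from unsloth import UnslothTrainer, UnslothTrainingArguments

trainer = UnslothTrainer(
    model = model,
    tokenizer = tokenizer,
    train_dataset = dataset,
    dataset_text_field = "text",
    max_seq_length = 2048,
    args = UnslothTrainingArguments(
        per_device_train_batch_size = 2,
        gradient_accumulation_steps = 8,
        max_steps = 120,
        learning_rate = 5e-5,            # LoRA adapters
        embedding_learning_rate = 5e-6,  # lm_head / embed_tokens, typically set smaller
        output_dir = "outputs",
    ),
)
trainer.train()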

danielhanchen avatar Jun 17 '24 17:06 danielhanchen