
phi3 mini model add new token

Open NickyDark1 opened this issue 9 months ago • 4 comments

Is it possible to add new tokens and special tokens so that they can be trained? What would the code look like?

NickyDark1 avatar May 01 '24 23:05 NickyDark1

Example of a special token:

"32005": {
  "content": "<|function_call|>",
  "lstrip": false,
  "normalized": false,
  "rstrip": true,
  "single_word": false,
  "special": true
},

https://huggingface.co/NickyNicky/Phi-3-mini-128k-instruct_function/blob/main/tokenizer_config.json

NickyDark1 avatar May 02 '24 00:05 NickyDark1

Yes - I haven't announced it yet, but you can use:

from unsloth import add_new_tokens
add_new_tokens(model, tokenizer, new_tokens = ["<SPECIAL_TOKEN_1>", "<SPECIAL_TOKEN_2>"])

Do this before calling get_peft_model.

danielhanchen avatar May 04 '24 09:05 danielhanchen

Is this similar? Would it make a difference whether I add normal tokens or special ones?

special_tokens_dict = {'additional_special_tokens': ['[C1]','[C2]','[C3]','[C4]']}
num_added_toks = tokenizer.add_special_tokens(special_tokens_dict)
model.resize_token_embeddings(len(tokenizer))

NickyDark1 avatar May 04 '24 22:05 NickyDark1

Oh, they're all special tokens! Just use add_new_tokens for all of them.

danielhanchen avatar May 05 '24 03:05 danielhanchen
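
Both approaches in this thread touch the model as well as the tokenizer, because every new token id needs a matching row in the embedding matrix. The toy sketch below shows that bookkeeping in plain Python. It is not how unsloth or transformers implement it: the helper name add_tokens is hypothetical, the lists stand in for real tensors, and initializing new rows to the mean of the existing rows is an assumption made for illustration.

```python
# Toy illustration only: a dict stands in for the tokenizer vocabulary and a
# list of lists stands in for the embedding matrix. The helper below is
# hypothetical and does not exist in unsloth or transformers.

def add_tokens(vocab, embeddings, new_tokens):
    """Add tokens to vocab and append one embedding row per new token.

    New rows are set to the mean of the rows that existed before the call
    (an assumed initialization choice for this sketch).
    Returns the number of tokens actually added.
    """
    dim = len(embeddings[0])
    n_old = len(embeddings)
    # Column-wise mean of the existing rows, computed once up front.
    mean_row = [sum(row[j] for row in embeddings) / n_old for j in range(dim)]

    added = 0
    for token in new_tokens:
        if token in vocab:
            continue  # already in the vocabulary, nothing to resize
        vocab[token] = len(vocab)          # next free token id
        embeddings.append(list(mean_row))  # keep rows aligned with ids
        added += 1
    return added


vocab = {"hello": 0, "world": 1}
embeddings = [[1.0, 2.0], [3.0, 4.0]]

num_added = add_tokens(vocab, embeddings, ["[C1]", "[C2]", "hello"])
print(num_added, len(vocab), len(embeddings))
```

In this sketch the vocabulary and the embedding matrix always grow together, which is the invariant that model.resize_token_embeddings(len(tokenizer)) restores after tokenizer.add_special_tokens in the snippet above.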