unsloth
Phi-3 mini model: add new tokens
Is it possible to add new tokens and special tokens so that they can be trained? What would the code look like?
Example special token:
"32005": {
  "content": "<|function_call|>",
  "lstrip": false,
  "normalized": false,
  "rstrip": true,
  "single_word": false,
  "special": true
},
https://huggingface.co/NickyNicky/Phi-3-mini-128k-instruct_function/blob/main/tokenizer_config.json
Yes - I haven't announced it yet, but you can use:
from unsloth import add_new_tokens
add_new_tokens(model, tokenizer, new_tokens = ["<SPECIAL_TOKEN_1>", "<SPECIAL_TOKEN_2>"])
Call this before get_peft_model.
Is this similar? Would it make a difference whether I add the normal tokens or the special ones?
special_tokens_dict = {'additional_special_tokens': ['[C1]','[C2]','[C3]','[C4]']}
num_added_toks = tokenizer.add_special_tokens(special_tokens_dict)
model.resize_token_embeddings(len(tokenizer))
Oh, they're all special tokens! Just use add_new_tokens for all of them.
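For anyone wondering what the two steps in the question (add tokens, then resize the embeddings) amount to conceptually, here is a rough pure-Python sketch. It is a toy and not how unsloth or transformers implement it: `ToyTokenizer` and `resize_embeddings` are hypothetical names, and initializing the new rows to the mean of the existing rows is just an assumption made for illustration.

```python
from statistics import fmean


class ToyTokenizer:
    """Hypothetical stand-in for a tokenizer vocabulary (not a real library class)."""

    def __init__(self, vocab):
        self.vocab = {tok: i for i, tok in enumerate(vocab)}

    def add_tokens(self, new_tokens):
        """Assign the next free ids to unseen tokens; return how many were added."""
        added = 0
        for tok in new_tokens:
            if tok not in self.vocab:
                self.vocab[tok] = len(self.vocab)
                added += 1
        return added

    def __len__(self):
        return len(self.vocab)


def resize_embeddings(embeddings, new_size):
    """Grow the table to new_size rows. New rows get the column-wise mean of the
    existing rows (an assumed initialization, chosen only for illustration)."""
    if new_size <= len(embeddings):
        return embeddings[:new_size]
    mean_row = [fmean(col) for col in zip(*embeddings)]
    extra = [list(mean_row) for _ in range(new_size - len(embeddings))]
    return embeddings + extra


tokenizer = ToyTokenizer(["<s>", "hello", "world"])
embeddings = [[0.0, 1.0], [2.0, 3.0], [4.0, 5.0]]  # one row per token id

# "hello" is already in the vocabulary, so only two tokens are actually added.
num_added = tokenizer.add_tokens(["[C1]", "[C2]", "hello"])

# Every new id needs a matching embedding row before the model can be trained.
embeddings = resize_embeddings(embeddings, len(tokenizer))

print(num_added, len(tokenizer), len(embeddings))
print(tokenizer.vocab["[C1]"], embeddings[tokenizer.vocab["[C1]"]])
```

The point of the toy is only that each new token id needs a matching embedding row before training starts. That fits with the advice above to add the tokens before get_peft_model.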