Open-Assistant
For PEFT training, how should a changed tokenizer be handled?
If the model's `num_embeddings` is 10000 but the tokenizer grows to 10007 tokens, then after SFT training the model's `num_embeddings` becomes 10016. This happens because `get_model(conf, tokenizer, pad_vocab_size_to_multiple_of=16, check_freeze_layer=True)` in `model/model_training/utils/utils.py` pads the vocabulary size up to the next multiple of 16. But when we then try to start a PEFT training run, it fails on the following check: `if len(tokenizer) != n_embs and check_freeze_layer: assert not conf.freeze_layer, "Cannot change the number of embeddings if the model is frozen."`
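To make the mismatch concrete, here is a minimal, self-contained sketch of the padding arithmetic and the failing check. The `pad_vocab_size` helper and the `freeze_layer` flag are stand-ins for the actual `get_model` internals and `conf.freeze_layer`, not the repository's real code:

```python
import math


def pad_vocab_size(vocab_size: int, multiple_of: int = 16) -> int:
    # Round the vocabulary size up to the next multiple of `multiple_of`,
    # mirroring the effect of pad_vocab_size_to_multiple_of=16 in get_model.
    return math.ceil(vocab_size / multiple_of) * multiple_of


# Numbers from the question: the tokenizer grew from 10000 to 10007 tokens.
tokenizer_len = 10007
n_embs = pad_vocab_size(tokenizer_len)  # -> 10016 after SFT resizing

# On the subsequent PEFT run, the padded embedding matrix (10016) no longer
# matches len(tokenizer) (10007), so the quoted assertion fires.
check_freeze_layer = True
freeze_layer = True  # hypothetical stand-in for conf.freeze_layer

try:
    if tokenizer_len != n_embs and check_freeze_layer:
        assert not freeze_layer, "Cannot change the number of embeddings if the model is frozen."
except AssertionError as e:
    print(f"PEFT run aborts here: {e}")
```

Under these assumptions, any tokenizer length that is not already a multiple of 16 will trip the check on a frozen-layer PEFT run, since the SFT step has permanently resized the embedding matrix to the padded size.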