Open-Assistant
For PEFT training, how should a changed tokenizer be handled?
If the model's `num_embeddings` is 10000 but the tokenizer grows to 10007 tokens, then after SFT training the model's `num_embeddings` becomes 10016. This happens because `get_model(conf, tokenizer, pad_vocab_size_to_multiple_of=16, check_freeze_layer=True)` in `model/model_training/utils/utils.py` pads the vocabulary size up to the next multiple of 16. But when we then try to start a PEFT training run, it fails on the following check: `if len(tokenizer) != n_embs and check_freeze_layer: assert not conf.freeze_layer, "Cannot change the number of embeddings if the model is frozen."`
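To make the mismatch concrete, here is a minimal, self-contained sketch of the padding arithmetic and the failing check. The `pad_vocab_size` helper and the `freeze_layer` flag are stand-ins for the actual `get_model` internals and `conf.freeze_layer`, not the repository's real code:

```python
import math


def pad_vocab_size(vocab_size: int, multiple_of: int = 16) -> int:
    # Round the vocabulary size up to the next multiple of `multiple_of`,
    # mirroring the effect of pad_vocab_size_to_multiple_of=16 in get_model.
    return math.ceil(vocab_size / multiple_of) * multiple_of


# Numbers from the question: the tokenizer grew from 10000 to 10007 tokens.
tokenizer_len = 10007
n_embs = pad_vocab_size(tokenizer_len)  # -> 10016 after SFT resizing

# On the subsequent PEFT run, the padded embedding matrix (10016) no longer
# matches len(tokenizer) (10007), so the quoted assertion fires.
check_freeze_layer = True
freeze_layer = True  # hypothetical stand-in for conf.freeze_layer

try:
    if tokenizer_len != n_embs and check_freeze_layer:
        assert not freeze_layer, "Cannot change the number of embeddings if the model is frozen."
except AssertionError as e:
    print(f"PEFT run aborts here: {e}")
```

Under these assumptions, any tokenizer length that is not already a multiple of 16 will trip the check on a frozen-layer PEFT run, since the SFT step has permanently resized the embedding matrix to the padded size.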