FlagEmbedding

bge-m3: after adding a batch of domain-specific words with tokenizer.add_tokens(key), how do I fine-tune?

Open mawenju203 opened this issue 1 year ago • 2 comments

mawenju203, Apr 21 '24 15:04

See: https://discuss.huggingface.co/t/how-to-train-the-embedding-of-special-token/10837. After calling tokenizer.add_tokens and model.resize_token_embeddings, save both the tokenizer and the model with save_pretrained, then fine-tune directly from the saved copy.
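The steps in this reply can be sketched roughly as follows. This is a minimal sketch, not code from the FlagEmbedding repository: the helper names `add_domain_tokens` and `extend_bge_m3`, the output directory, and the example tokens are hypothetical, and it assumes the model is loaded from the `BAAI/bge-m3` checkpoint through the `transformers` Auto classes.

```python
def add_domain_tokens(tokenizer, model, new_tokens, output_dir):
    """Add tokens, resize the embedding matrix, and save both objects.

    Hypothetical helper: works with any Hugging Face style tokenizer/model pair.
    """
    # add_tokens returns how many tokens were actually added to the vocabulary.
    num_added = tokenizer.add_tokens(list(new_tokens))
    # Resize the input embedding matrix so every new token id has a row.
    model.resize_token_embeddings(len(tokenizer))
    # Save both so the fine-tuning run loads a matching tokenizer and model.
    tokenizer.save_pretrained(output_dir)
    model.save_pretrained(output_dir)
    return num_added


def extend_bge_m3(new_tokens, output_dir="./bge-m3-extended"):
    """Assumed usage with the public BAAI/bge-m3 checkpoint (downloads weights)."""
    # Imported lazily so add_domain_tokens can be reused without transformers.
    from transformers import AutoModel, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained("BAAI/bge-m3")
    model = AutoModel.from_pretrained("BAAI/bge-m3")
    return add_domain_tokens(tokenizer, model, new_tokens, output_dir)
```

Calling `extend_bge_m3(["exampleterm_a", "exampleterm_b"])` would write the extended tokenizer and model to `./bge-m3-extended`; the fine-tuning run would then presumably take that directory as its model path. The new embedding rows start out untrained, so they only become useful if the fine-tuning data actually contains the added words.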

staoxiao, Apr 22 '24 03:04

@staoxiao Thanks for the reply. The log contains: "Special tokens have been added in the vocabulary, make sure the associated word embeddings are fine-tuned or trained." Does this mean the domain-specific words I added are already taking part in fine-tuning?
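For context on that log line: `transformers` prints it when a tokenizer that carries added tokens is loaded, so it reads as a reminder rather than as confirmation that the new embeddings are being trained. One way to check directly is to snapshot the embedding rows of the new token ids before training and compare them afterwards. The sketch below assumes PyTorch; the helper names `snapshot_rows` and `rows_changed` are hypothetical, and `embedding` stands for whatever `model.get_input_embeddings()` returns.

```python
import torch


def snapshot_rows(embedding, token_ids):
    """Copy the embedding rows for token_ids so they can be compared later."""
    return embedding.weight.detach()[token_ids].clone()


def rows_changed(embedding, token_ids, before):
    """Return one boolean per token id: True if its row differs from the snapshot."""
    after = embedding.weight.detach()[token_ids]
    return (after != before).any(dim=1)
```

With a real model, `token_ids` could come from `tokenizer.convert_tokens_to_ids(new_tokens)`. A row that stays unchanged after fine-tuning suggests that the token never appeared in the training data or that the embedding layer was frozen.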

mawenju203, Apr 24 '24 01:04

@mawenju203 Did you get this working? Could you share your code? I called tokenizer.add_tokens and model.resize_token_embeddings and then went straight to fine-tuning.

Tungsong, May 21 '24 06:05