blue-vision0 comments

Results: 4 comments of blue-vision0

Training shows the warning "Special tokens have been added in the vocabulary, make sure the associated word embeddings are fine-tuned or trained.", and inference fails with a tokenizer-not-found error

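There are two ways to fix the tokenizer-not-found error at inference time:

1. Replace the tokenizer_config.json in the fine-tuned model's files with the tokenizer_config.json from the original base model (its content looks like this): ![image](https://github.com/FlagOpen/FlagEmbedding/assets/89055561/c8080403-a9c4-4ec6-9707-d03ce45ad984)
2. Look at the tokenizer_file location written in the fine-tuned model's tokenizer_config.json, then put the base model's tokenizer_config.json at that location.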
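The first option can be scripted. Below is a minimal sketch, assuming both models are plain local directories that each contain a tokenizer_config.json. The helper name `restore_tokenizer_config` and the example paths are hypothetical and do not come from FlagEmbedding. The sketch keeps a backup of the fine-tuned config so the change can be reverted.

```python
import shutil
from pathlib import Path


def restore_tokenizer_config(base_dir, finetuned_dir, backup=True):
    """Overwrite the fine-tuned model's tokenizer_config.json with the base model's.

    Hypothetical helper: both arguments are assumed to be local model directories.
    Returns the path of the overwritten file.
    """
    src = Path(base_dir) / "tokenizer_config.json"
    dst = Path(finetuned_dir) / "tokenizer_config.json"

    if not src.is_file():
        raise FileNotFoundError(f"base tokenizer config not found: {src}")

    # Keep the fine-tuned config as tokenizer_config.json.bak before replacing it.
    if backup and dst.is_file():
        shutil.copy2(dst, dst.with_name(dst.name + ".bak"))

    shutil.copy2(src, dst)
    return dst


# Example call (paths are placeholders, adjust to your own layout):
# restore_tokenizer_config("models/base-model", "output/finetuned-model")
```

After the copy, loading the tokenizer from the fine-tuned directory should pick up the base configuration. Whether that resolves the error depends on how the fine-tuned config differed in the first place.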

Training shows the warning "Special tokens have been added in the vocabulary, make sure the associated word embeddings are fine-tuned or trained.", and inference fails with a tokenizer-not-found error

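A question for the author: does this count as a bug, and is there a solution? In earlier training runs the warning "Special tokens have been added in the vocabulary, make sure the associated word embeddings are fine-tuned or trained" never appeared, and no follow-on problems occurred.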

A question about CLS and MEAN_POOLING

In that case, does the bge-embedding fine-tuning training script use CLS by default?

How to define a custom model based on the transformers library?

> Really? Why would renaming cause an error? I have reproduced your code and there is no error. Check whether you have any other problems. Maybe it has something to...