Linly

Published 20 hours ago

Readme
Issues

Was a BPE tokenizer used when pretraining the model in the TencentPretrain framework? Is there a corresponding pretrained merge.txt?

Open yyqi17 opened this issue 1 year ago • 0 comments

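While fine-tuning the ChatFlow model with LoRA, I ran into garbled generation output. My guess is that this happens because I did not select the BPE tokenizer. However, the BPE tokenizer requires a pretrained merge.txt file, so how was the tokenizer configured during pretraining? Does Linly-OpenLLaMA also need the BPE tokenizer?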
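To illustrate the failure mode I suspect, here is a minimal toy sketch. Both vocabularies and the `encode`/`decode` helpers are hypothetical and written only for this example; they are not the Linly or TencentPretrain tokenizers. It shows that token ids produced with one vocabulary come out garbled when decoded with a different one.

```python
# Toy illustration only: the vocabularies and helpers below are hypothetical,
# not the actual Linly / TencentPretrain tokenizer or vocabulary files.

def encode(text, vocab):
    """Greedy longest-match encoding of text into token ids."""
    ids = []
    i = 0
    while i < len(text):
        # Try the longest remaining substring first, then shrink.
        for j in range(len(text), i, -1):
            piece = text[i:j]
            if piece in vocab:
                ids.append(vocab[piece])
                i = j
                break
        else:
            raise ValueError(f"no token covers {text[i]!r}")
    return ids


def decode(ids, vocab):
    """Map token ids back to their pieces and join them."""
    inverse = {token_id: piece for piece, token_id in vocab.items()}
    return "".join(inverse[token_id] for token_id in ids)


# Vocabulary assumed to be the one used during pretraining.
train_vocab = {"he": 0, "llo": 1, " ": 2, "wor": 3, "ld": 4}
# A different vocabulary with the same pieces but different ids.
other_vocab = {"ld": 0, " ": 1, "he": 2, "llo": 3, "wor": 4}

ids = encode("hello world", train_vocab)
matched = decode(ids, train_vocab)       # same vocab: text round-trips
mismatched = decode(ids, other_vocab)    # different vocab: garbled text

print(matched)
print(mismatched)
```

If the real cause is similar, the ids the model generates would be correct for the pretraining vocabulary but get mapped to the wrong pieces by whatever tokenizer I selected for fine-tuning and inference.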

May 29 '23 04:05 yyqi17