
Running preprocess.py shows that the tokenizer in use is bert. Is this correct?

Open baketbek opened this issue 1 year ago • 1 comment

When I run preprocess.py, I see that the tokenizer being used is bert. Is this correct?

baketbek avatar Apr 11 '23 05:04 baketbek

Judging from tencentpretrain/utils/tokenizers.py, this should be fine: when spm_model_path is set, the tokenizer is loaded with sentencepiece, which matches how LLaMA does it.
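To make the dispatch concrete, here is a minimal sketch of that kind of check. It is not the actual code in tokenizers.py: `pick_backend` and `load_sentencepiece` are hypothetical helper names, and the `vocab_path` fallback is an assumption about how the non-sentencepiece path is selected. Only `spm_model_path` comes from the file mentioned above.

```python
from argparse import Namespace


def pick_backend(args):
    """Hypothetical helper: decide which tokenizer backend the options imply."""
    # spm_model_path takes priority, regardless of the tokenizer name ("bert").
    if getattr(args, "spm_model_path", None):
        return "sentencepiece"
    # Assumed fallback: a plain vocabulary file for BERT-style WordPiece.
    if getattr(args, "vocab_path", None):
        return "vocab"
    raise ValueError("either spm_model_path or vocab_path must be set")


def load_sentencepiece(spm_model_path):
    """Hypothetical helper: load a sentencepiece model such as LLaMA's tokenizer.model."""
    # Imported lazily so the vocab-only path works without sentencepiece installed.
    import sentencepiece as spm

    return spm.SentencePieceProcessor(model_file=spm_model_path)


if __name__ == "__main__":
    llama_args = Namespace(spm_model_path="tokenizer.model", vocab_path=None)
    bert_args = Namespace(spm_model_path=None, vocab_path="vocab.txt")
    print(pick_backend(llama_args))
    print(pick_backend(bert_args))
```

So seeing "bert" as the tokenizer name does not by itself mean WordPiece is used; under the assumptions in this sketch, what matters is whether spm_model_path was passed to preprocess.py.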

15810856129 avatar Apr 11 '23 11:04 15810856129