Tokenizer error after running the script generate_chatllama.py

Open tianmala opened this issue 2 years ago • 10 comments

Traceback (most recent call last):
  File "scripts/generate_chatllama.py", line 82, in <module>
    args.tokenizer = str2tokenizer[args.tokenizer](args)
  File "/home/mo/llama/TencentPretrain/tencentpretrain/utils/tokenizers.py", line 255, in __init__
    super().__init__(args, is_src)
  File "/home/mo/llama/TencentPretrain/tencentpretrain/utils/tokenizers.py", line 30, in __init__
    self.sp_model.Load(spm_model_path)
  File "/home/mo/miniconda3/envs/llm_env/lib/python3.8/site-packages/sentencepiece/__init__.py", line 905, in Load
    return self.LoadFromFile(model_file)
  File "/home/mo/miniconda3/envs/llm_env/lib/python3.8/site-packages/sentencepiece/__init__.py", line 310, in LoadFromFile
    return _sentencepiece.SentencePieceProcessor_LoadFromFile(self, arg)
RuntimeError: Internal: src/sentencepiece_processor.cc(1101) [model_proto->ParseFromArray(serialized.data(), serialized.size())]

I get this error after running the script. Has anyone else run into this problem?

tianmala, Mar 31 '23 08:03

Same here, looking for help.

davikl, Apr 04 '23 17:04

Same error here.

rayguo01, Apr 06 '23 13:04

Subscribing to this issue, as I am hitting the same problem.

guanlinz, Apr 07 '23 08:04

Same problem. How can it be fixed?

lylcst, Apr 08 '23 07:04

spm_model_file = '../ChatLLaMA-zh-7B/tokenizer.model'
Could this tokenizer model file be corrupted?

2775919186, Apr 12 '23 02:04

Same error here.

Data2Me, Apr 12 '23 11:04

I tested this and did not hit the problem. Could you check your Sentencepiece version? Mine is 0.1.97.
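
One way to read the installed version without importing the package is through the standard library's importlib.metadata (Python 3.8+). This is only a minimal sketch; the helper name installed_version is made up for illustration and is not part of TencentPretrain or SentencePiece.

```python
from importlib import metadata


def installed_version(package):
    """Return the installed version string of `package`, or None if it is not installed."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


# Hypothetical usage: compare against the version mentioned in this thread.
version = installed_version("sentencepiece")
print("sentencepiece version:", version if version is not None else "not installed")
```

If this reports 0.1.97 and the error persists, the version is probably not the cause.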

ydli-ai, Apr 12 '23 12:04

My Sentencepiece version is also 0.1.97. I just tried again and still get the error:

  File "/opt/conda/lib/python3.10/site-packages/sentencepiece/__init__.py", line 310, in LoadFromFile
    return _sentencepiece.SentencePieceProcessor_LoadFromFile(self, arg)
RuntimeError: Internal: src/sentencepiece_processor.cc(1101) [model_proto->ParseFromArray(serialized.data(), serialized.size())]

Data2Me, Apr 12 '23 12:04

Solved: re-download the model weight files. git lfs must be installed before running git clone.
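
For anyone checking whether they are affected: when a repository is cloned without git lfs, large files are typically left on disk as small text pointer stubs, and SentencePiece would fail to parse such a stub as a model. The sketch below assumes the standard Git LFS pointer format, whose first line starts with "version https://git-lfs.github.com/spec/v1". The function names are hypothetical and not part of TencentPretrain.

```python
import os

# First bytes of a Git LFS pointer file, per the Git LFS pointer format.
LFS_POINTER_PREFIX = b"version https://git-lfs.github.com/spec/v1"


def is_git_lfs_pointer(path):
    """Return True if the file at `path` starts like a Git LFS pointer stub."""
    with open(path, "rb") as f:
        head = f.read(len(LFS_POINTER_PREFIX))
    return head == LFS_POINTER_PREFIX


def describe_model_file(path):
    """Return a short, human-readable diagnosis for a tokenizer model file."""
    if not os.path.exists(path):
        return f"{path}: file not found"
    size = os.path.getsize(path)
    if is_git_lfs_pointer(path):
        return f"{path}: Git LFS pointer stub ({size} bytes), real file was not downloaded"
    return f"{path}: not an LFS pointer ({size} bytes)"
```

Calling describe_model_file('../ChatLLaMA-zh-7B/tokenizer.model') should show whether the file is a stub. If it is, installing git lfs and re-downloading, as described above, is the likely fix.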

Data2Me, Apr 13 '23 02:04

After installing git lfs, downloading the model weight files is very slow. Is there a good way to speed it up?

YYForReal, May 04 '23 02:05