
RuntimeError: Internal: src/sentencepiece_processor.cc(1101) [model_proto->ParseFromArray(serialized.data(), serialized.size())]

Open yiyanxiyin opened this issue 1 year ago • 4 comments

Is there an existing issue for this?

  • [X] I have searched the existing issues

Current Behavior

Running python cli_demo.py fails with the error below.

root@4uot40mdrplpv-0:/yx/ChatGLM-6B# python mycli_demo.py
Explicitly passing a revision is encouraged when loading a model with custom code to ensure no malicious code has been contributed in a newer revision.
Traceback (most recent call last):
  File "/yx/ChatGLM-6B/mycli_demo.py", line 6, in <module>
    tokenizer = AutoTokenizer.from_pretrained("/yx/ChatGLM-6B/THUDM/chatglm-6b", trust_remote_code=True)
  File "/usr/local/lib/python3.11/site-packages/transformers/models/auto/tokenization_auto.py", line 679, in from_pretrained
    return tokenizer_class.from_pretrained(pretrained_model_name_or_path, *inputs, **kwargs)
  File "/usr/local/lib/python3.11/site-packages/transformers/tokenization_utils_base.py", line 1804, in from_pretrained
    return cls._from_pretrained(
  File "/usr/local/lib/python3.11/site-packages/transformers/tokenization_utils_base.py", line 1958, in _from_pretrained
    tokenizer = cls(*init_inputs, **init_kwargs)
  File "/root/.cache/huggingface/modules/transformers_modules/chatglm-6b/tokenization_chatglm.py", line 205, in __init__
    self.sp_tokenizer = SPTokenizer(vocab_file, num_image_tokens=num_image_tokens)
  File "/root/.cache/huggingface/modules/transformers_modules/chatglm-6b/tokenization_chatglm.py", line 61, in __init__
    self.text_tokenizer = TextTokenizer(vocab_file)
  File "/root/.cache/huggingface/modules/transformers_modules/chatglm-6b/tokenization_chatglm.py", line 22, in __init__
    self.sp.Load(model_path)
  File "/usr/local/lib/python3.11/site-packages/sentencepiece/__init__.py", line 905, in Load
    return self.LoadFromFile(model_file)
  File "/usr/local/lib/python3.11/site-packages/sentencepiece/__init__.py", line 310, in LoadFromFile
    return _sentencepiece.SentencePieceProcessor_LoadFromFile(self, arg)
RuntimeError: Internal: src/sentencepiece_processor.cc(1101) [model_proto->ParseFromArray(serialized.data(), serialized.size())]

I am running this inside Docker. Could you take a look at what is going wrong? Thanks.

Expected Behavior

No response

Steps To Reproduce

help

Environment

- OS: Red Hat 4.8.5-44
- Python: 3.11
- Transformers: 4.27.1
- PyTorch: 2.0
- CUDA Support (`python -c "import torch; print(torch.cuda.is_available())"`): False

Anything else?

No response

yiyanxiyin avatar Apr 23 '23 07:04 yiyanxiyin

Your ice_text.model file was not downloaded correctly. Compare it against https://huggingface.co/THUDM/chatglm-6b/blob/main/ice_text.model

duzx16 avatar Apr 23 '23 07:04 duzx16
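For anyone comparing files as suggested above: a quick way to check is to compute the local file's SHA-256 and compare it with the checksum shown on the Hugging Face file page. This is a minimal sketch; the local path below is an assumption, so point it at your own checkout.

```python
import hashlib
from pathlib import Path


def sha256_of(path):
    """Stream the file in chunks so large model files need not fit in memory."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(1 << 20), b""):
            h.update(chunk)
    return h.hexdigest()


# Hypothetical local path -- adjust to wherever you cloned the repo.
model_path = Path("THUDM/chatglm-6b/ice_text.model")
if model_path.exists():
    print(model_path.name, sha256_of(model_path))
```

If the printed digest differs from the one on the Hugging Face page, the file is corrupt or incomplete and should be re-downloaded.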

I am not running in Docker, but I hit the same problem. I compared my file against https://huggingface.co/THUDM/chatglm-6b/blob/main/ice_text.model and they are identical.

zyr-NULL avatar May 05 '23 07:05 zyr-NULL

I ran into the same problem. The model was downloaded from the main branch on Hugging Face, and the source code is also from the main branch. Launching web_demo2.py fails with "RuntimeError: Internal: src/sentencepiece_processor.cc(1101) [model_proto->ParseFromArray(serialized.data(), serialized.size())]".

lonly197 avatar May 05 '23 10:05 lonly197

I hit the same thing while fine-tuning: running train.sh fails with RuntimeError: Internal: src/sentencepiece_processor.cc(1101) [model_proto->ParseFromArray(serialized.data(), serialized.size())]. What is going on?

22zhangqian avatar May 06 '23 06:05 22zhangqian

Your ice_text.model file was not downloaded correctly. Compare it against https://huggingface.co/THUDM/chatglm-6b/blob/main/ice_text.model

I found that the SHA-256 of the ice_text.model downloaded from this path does not match the SHA-256 listed on Hugging Face. Could the wrong file have been uploaded?

MRuAyAN avatar May 09 '23 09:05 MRuAyAN

The SHA-256 mismatches reported above happen because git-lfs was not used when cloning:

# Make sure you have git-lfs installed (https://git-lfs.com)
git lfs install
git clone https://huggingface.co/THUDM/chatglm-6b

Vincent-Huang-2000 avatar May 21 '23 01:05 Vincent-Huang-2000
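To expand on the git-lfs point above: cloning without git-lfs leaves a small ASCII pointer stub where the real binary should be, which is exactly the kind of file SentencePiece fails to parse. You can tell the two apart by inspecting the first bytes; a real model file is a binary protobuf, while an LFS stub starts with a fixed version line. A minimal sketch (the size threshold is an assumption for illustration):

```python
from pathlib import Path

# git-lfs pointer files begin with this exact line.
LFS_MAGIC = b"version https://git-lfs.github.com/spec/v1"


def classify(path):
    """Rough check of whether a downloaded model file is real or a stub."""
    data = Path(path).read_bytes()
    if data.startswith(LFS_MAGIC):
        return "lfs-pointer"  # cloned without git-lfs; run `git lfs pull`
    if len(data) < 100_000:
        return "suspiciously-small"  # likely a truncated download
    return "binary"
```

If this reports "lfs-pointer", installing git-lfs and running `git lfs pull` inside the repo should fetch the real file.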

I used git lfs to download the model directory, but then took the model files from the Tsinghua cloud drive, which left an inconsistent ice_text.model. After replacing it with the ice_text.model from the Tsinghua cloud drive, it worked.

wangjiaqiys avatar May 23 '23 02:05 wangjiaqiys

I used git lfs to download the model directory, but then took the model files from the Tsinghua cloud drive, which left an inconsistent ice_text.model. After replacing it with the ice_text.model from the Tsinghua cloud drive, it worked.

Why would the ice_text.model downloaded via git lfs differ? A network problem? An incomplete download?

qq516249940 avatar Jun 13 '23 10:06 qq516249940

The model was never actually cloned. How do I fix this?

Hzzhang-nlp avatar Aug 24 '23 02:08 Hzzhang-nlp

The cloned .bin file is only about 100 KB.

Hzzhang-nlp avatar Aug 24 '23 03:08 Hzzhang-nlp

[screenshot of the cloned model directory]

Hzzhang-nlp avatar Aug 24 '23 03:08 Hzzhang-nlp
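A 100 KB .bin file is far too small to be a real weight shard; the chatglm-6b shards are on the order of gigabytes. A quick way to confirm this symptom across a whole checkout is to list every .bin file that is implausibly small. This is a sketch; the directory path and size threshold are assumptions.

```python
from pathlib import Path


def find_stub_weights(model_dir, min_bytes=1_000_000):
    """Return the names of .bin files too small to be real weight shards."""
    return sorted(
        p.name
        for p in Path(model_dir).glob("*.bin")
        if p.stat().st_size < min_bytes
    )
```

Any files this returns are almost certainly un-fetched git-lfs stubs or truncated downloads and should be re-fetched with `git lfs pull` or downloaded manually.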

The model was never actually cloned. How do I fix this?

You can download it manually from the Tsinghua University cloud drive and replace the file with that copy.

JimmyJIA-02 avatar Oct 12 '23 06:10 JimmyJIA-02

I used git lfs to download the model directory, but then took the model files from the Tsinghua cloud drive, which left an inconsistent ice_text.model. After replacing it with the ice_text.model from the Tsinghua cloud drive, it worked.

It still fails after replacing the file. Any ideas what might be going on?

JimmyJIA-02 avatar Oct 12 '23 06:10 JimmyJIA-02