
RuntimeError: Internal: src/sentencepiece_processor.cc(1101) [model_proto->ParseFromArray(serialized.data(), serialized.size())]

Open yiyanxiyin opened this issue 1 year ago • 4 comments

Is there an existing issue for this?

  • [X] I have searched the existing issues

Current Behavior

Running python cli_demo.py fails with the error below.

root@4uot40mdrplpv-0:/yx/ChatGLM-6B# python mycli_demo.py
Explicitly passing a revision is encouraged when loading a model with custom code to ensure no malicious code has been contributed in a newer revision.
Traceback (most recent call last):
  File "/yx/ChatGLM-6B/mycli_demo.py", line 6, in <module>
    tokenizer = AutoTokenizer.from_pretrained("/yx/ChatGLM-6B/THUDM/chatglm-6b", trust_remote_code=True)
  File "/usr/local/lib/python3.11/site-packages/transformers/models/auto/tokenization_auto.py", line 679, in from_pretrained
    return tokenizer_class.from_pretrained(pretrained_model_name_or_path, *inputs, **kwargs)
  File "/usr/local/lib/python3.11/site-packages/transformers/tokenization_utils_base.py", line 1804, in from_pretrained
    return cls._from_pretrained(
  File "/usr/local/lib/python3.11/site-packages/transformers/tokenization_utils_base.py", line 1958, in _from_pretrained
    tokenizer = cls(*init_inputs, **init_kwargs)
  File "/root/.cache/huggingface/modules/transformers_modules/chatglm-6b/tokenization_chatglm.py", line 205, in __init__
    self.sp_tokenizer = SPTokenizer(vocab_file, num_image_tokens=num_image_tokens)
  File "/root/.cache/huggingface/modules/transformers_modules/chatglm-6b/tokenization_chatglm.py", line 61, in __init__
    self.text_tokenizer = TextTokenizer(vocab_file)
  File "/root/.cache/huggingface/modules/transformers_modules/chatglm-6b/tokenization_chatglm.py", line 22, in __init__
    self.sp.Load(model_path)
  File "/usr/local/lib/python3.11/site-packages/sentencepiece/__init__.py", line 905, in Load
    return self.LoadFromFile(model_file)
  File "/usr/local/lib/python3.11/site-packages/sentencepiece/__init__.py", line 310, in LoadFromFile
    return _sentencepiece.SentencePieceProcessor_LoadFromFile(self, arg)
RuntimeError: Internal: src/sentencepiece_processor.cc(1101) [model_proto->ParseFromArray(serialized.data(), serialized.size())]

I am running this inside Docker. Could you take a look at what is going wrong? Thanks.

Expected Behavior

No response

Steps To Reproduce

help

Environment

- OS: Red Hat 4.8.5-44
- Python: 3.11
- Transformers: 4.27.1
- PyTorch: 2.0
- CUDA Support (`python -c "import torch; print(torch.cuda.is_available())"`): False

Anything else?

No response

yiyanxiyin avatar Apr 23 '23 07:04 yiyanxiyin

Your ice_text.model file was not downloaded correctly. Compare it against https://huggingface.co/THUDM/chatglm-6b/blob/main/ice_text.model

duzx16 avatar Apr 23 '23 07:04 duzx16
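For anyone comparing files as suggested above: a quick way to check is to compute the local file's SHA-256 and compare it with the checksum shown on the Hugging Face file page. This is a minimal sketch; the local path below is an assumption, so point it at your own checkout.

```python
import hashlib
from pathlib import Path


def sha256_of(path):
    """Stream the file in chunks so large model files need not fit in memory."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(1 << 20), b""):
            h.update(chunk)
    return h.hexdigest()


# Hypothetical local path -- adjust to wherever you cloned the repo.
model_path = Path("THUDM/chatglm-6b/ice_text.model")
if model_path.exists():
    print(model_path.name, sha256_of(model_path))
```

If the printed digest differs from the one on the Hugging Face page, the file is corrupt or incomplete and should be re-downloaded.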

I am not running in Docker, but I hit the same problem. I compared my file against https://huggingface.co/THUDM/chatglm-6b/blob/main/ice_text.model and they are identical.

zyr-NULL avatar May 05 '23 07:05 zyr-NULL

I ran into the same problem. The model was downloaded from the main branch on Hugging Face, and the source code is also from the main branch. Launching web_demo2.py fails with "RuntimeError: Internal: src/sentencepiece_processor.cc(1101) [model_proto->ParseFromArray(serialized.data(), serialized.size())]".

lonly197 avatar May 05 '23 10:05 lonly197

I hit the same thing while fine-tuning: running train.sh fails with RuntimeError: Internal: src/sentencepiece_processor.cc(1101) [model_proto->ParseFromArray(serialized.data(), serialized.size())]. What is going on?

22zhangqian avatar May 06 '23 06:05 22zhangqian

Your ice_text.model file was not downloaded correctly. Compare it against https://huggingface.co/THUDM/chatglm-6b/blob/main/ice_text.model

I found that the SHA-256 of the ice_text.model downloaded from this path does not match the SHA-256 listed on Hugging Face. Could the wrong file have been uploaded?

MRuAyAN avatar May 09 '23 09:05 MRuAyAN

The SHA-256 mismatches reported above happen because git-lfs was not used when cloning:

# Make sure you have git-lfs installed (https://git-lfs.com)
git lfs install
git clone https://huggingface.co/THUDM/chatglm-6b

Vincent-Huang-2000 avatar May 21 '23 01:05 Vincent-Huang-2000
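To expand on the git-lfs point above: cloning without git-lfs leaves a small ASCII pointer stub where the real binary should be, which is exactly the kind of file SentencePiece fails to parse. You can tell the two apart by inspecting the first bytes; a real model file is a binary protobuf, while an LFS stub starts with a fixed version line. A minimal sketch (the size threshold is an assumption for illustration):

```python
from pathlib import Path

# git-lfs pointer files begin with this exact line.
LFS_MAGIC = b"version https://git-lfs.github.com/spec/v1"


def classify(path):
    """Rough check of whether a downloaded model file is real or a stub."""
    data = Path(path).read_bytes()
    if data.startswith(LFS_MAGIC):
        return "lfs-pointer"  # cloned without git-lfs; run `git lfs pull`
    if len(data) < 100_000:
        return "suspiciously-small"  # likely a truncated download
    return "binary"
```

If this reports "lfs-pointer", installing git-lfs and running `git lfs pull` inside the repo should fetch the real file.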

I used git lfs to download the model directory, but then took the model files from the Tsinghua cloud drive, which left an inconsistent ice_text.model. After replacing it with the ice_text.model from the Tsinghua cloud drive, it worked.

wangjiaqiys avatar May 23 '23 02:05 wangjiaqiys

I used git lfs to download the model directory, but then took the model files from the Tsinghua cloud drive, which left an inconsistent ice_text.model. After replacing it with the ice_text.model from the Tsinghua cloud drive, it worked.

Why would the ice_text.model downloaded via git lfs differ? A network problem? An incomplete download?

qq516249940 avatar Jun 13 '23 10:06 qq516249940

The model was never actually cloned. How do I fix this?

Hzzhang-nlp avatar Aug 24 '23 02:08 Hzzhang-nlp

The cloned .bin file is only about 100 KB.

Hzzhang-nlp avatar Aug 24 '23 03:08 Hzzhang-nlp

[screenshot of the cloned model directory]

Hzzhang-nlp avatar Aug 24 '23 03:08 Hzzhang-nlp
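A 100 KB .bin file is far too small to be a real weight shard; the chatglm-6b shards are on the order of gigabytes. A quick way to confirm this symptom across a whole checkout is to list every .bin file that is implausibly small. This is a sketch; the directory path and size threshold are assumptions.

```python
from pathlib import Path


def find_stub_weights(model_dir, min_bytes=1_000_000):
    """Return the names of .bin files too small to be real weight shards."""
    return sorted(
        p.name
        for p in Path(model_dir).glob("*.bin")
        if p.stat().st_size < min_bytes
    )
```

Any files this returns are almost certainly un-fetched git-lfs stubs or truncated downloads and should be re-fetched with `git lfs pull` or downloaded manually.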

The model was never actually cloned. How do I fix this?

You can download it manually from the Tsinghua University cloud drive and replace the file with that copy.

JimmyJIA-02 avatar Oct 12 '23 06:10 JimmyJIA-02

I used git lfs to download the model directory, but then took the model files from the Tsinghua cloud drive, which left an inconsistent ice_text.model. After replacing it with the ice_text.model from the Tsinghua cloud drive, it worked.

It still fails after replacing the file. Any ideas what might be going on?

JimmyJIA-02 avatar Oct 12 '23 06:10 JimmyJIA-02