ChatGLM-6B
[BUG/Help] `RuntimeError: Internal: [MASK] is already defined` when using the `int4` model
Is there an existing issue for this?
- [X] I have searched the existing issues
Current Behavior
Traceback (most recent call last):
File "c:\Users\tanba\Downloads\ChatGLM-6B\web_demo.py", line 5, in <module>
tokenizer = AutoTokenizer.from_pretrained("model_int4", trust_remote_code=True)
File "C:\ProgramData\Anaconda3\lib\site-packages\transformers\models\auto\tokenization_auto.py", line 679, in from_pretrained
return tokenizer_class.from_pretrained(pretrained_model_name_or_path, *inputs, **kwargs)
File "C:\ProgramData\Anaconda3\lib\site-packages\transformers\tokenization_utils_base.py", line 1804, in from_pretrained
return cls._from_pretrained(
File "C:\ProgramData\Anaconda3\lib\site-packages\transformers\tokenization_utils_base.py", line 1958, in _from_pretrained
tokenizer = cls(*init_inputs, **init_kwargs)
File "C:\Users\tanba/.cache\huggingface\modules\transformers_modules\model\tokenization_chatglm.py", line 215, in __init__
self.sp_tokenizer = SPTokenizer(vocab_file)
File "C:\Users\tanba/.cache\huggingface\modules\transformers_modules\model\tokenization_chatglm.py", line 35, in __init__
self.text_tokenizer = self._build_text_tokenizer(encode_special_tokens=False)
File "C:\Users\tanba/.cache\huggingface\modules\transformers_modules\model\tokenization_chatglm.py", line 68, in _build_text_tokenizer
self._configure_tokenizer(
File "C:\Users\tanba/.cache\huggingface\modules\transformers_modules\model\tokenization_chatglm.py", line 64, in _configure_tokenizer
text_tokenizer.refresh()
File "C:\ProgramData\Anaconda3\lib\site-packages\icetk\text_tokenizer.py", line 31, in refresh
self.sp.Load(model_proto=self.proto.SerializeToString())
File "C:\ProgramData\Anaconda3\lib\site-packages\sentencepiece\__init__.py", line 904, in Load
return self.LoadFromSerializedProto(model_proto)
File "C:\ProgramData\Anaconda3\lib\site-packages\sentencepiece\__init__.py", line 250, in LoadFromSerializedProto
return _sentencepiece.SentencePieceProcessor_LoadFromSerializedProto(self, serialized)
RuntimeError: Internal: [MASK] is already defined.
Expected Behavior
The tokenizer should load without raising this error.
Steps To Reproduce
Simply run `python cli_demo.py` or `python web_demo.py`.
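Since the traceback fails inside `AutoTokenizer.from_pretrained`, the problem can probably be narrowed down without loading the model weights at all. The following is a minimal sketch, not part of the repository: `try_load_tokenizer` is a hypothetical helper name, and the `model_int4` directory is taken from the traceback and assumed to sit in the current working directory.

```python
import os


def try_load_tokenizer(model_dir="model_int4"):
    """Return (ok, message) for loading only the tokenizer from a local directory."""
    # Assumption: the int4 model was downloaded to a local folder, as in the traceback.
    if not os.path.isdir(model_dir):
        return False, f"model directory not found: {model_dir}"
    try:
        # Imported lazily so the directory check above works without transformers.
        from transformers import AutoTokenizer

        AutoTokenizer.from_pretrained(model_dir, trust_remote_code=True)
    except Exception as exc:  # e.g. the RuntimeError raised by sentencepiece
        return False, f"{type(exc).__name__}: {exc}"
    return True, "tokenizer loaded"


ok, message = try_load_tokenizer()
print(ok, message)
```

If this reports the same `RuntimeError`, the failure is in the tokenizer setup alone and does not depend on the demo scripts or the quantized weights.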
Environment
- OS: Windows 11 22H2 22621.1555
- Python: Python 3.10.9
- Transformers: 4.27.1
- PyTorch: 2.0.0
- CUDA Support (`python -c "import torch; print(torch.cuda.is_available())"`): True
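The list above does not include the versions of the tokenizer packages that appear in the traceback (`icetk`, `sentencepiece`). A small sketch for collecting them with the standard library follows; the choice of packages is an assumption about what is relevant here, and `protobuf` is included only because the failing call serializes a proto.

```python
from importlib import metadata

# Assumed-relevant packages: the ones named in the traceback, plus protobuf.
PACKAGES = ("transformers", "torch", "sentencepiece", "icetk", "protobuf")


def installed_versions(packages=PACKAGES):
    """Map each package name to its installed version, or None if it is missing."""
    versions = {}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return versions


for name, version in installed_versions().items():
    print(f"- {name}: {version or 'not installed'}")
```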
Anything else?
No response