ChatGLM-6B

[BUG/Help] `RuntimeError: Internal: [MASK] is already defined` when using the `int4` model

Open kevintsq opened this issue 2 years ago • 0 comments

Is there an existing issue for this?

  • [X] I have searched the existing issues

Current Behavior

Traceback (most recent call last):
  File "c:\Users\tanba\Downloads\ChatGLM-6B\web_demo.py", line 5, in <module>
    tokenizer = AutoTokenizer.from_pretrained("model_int4", trust_remote_code=True)
  File "C:\ProgramData\Anaconda3\lib\site-packages\transformers\models\auto\tokenization_auto.py", line 679, in from_pretrained
    return tokenizer_class.from_pretrained(pretrained_model_name_or_path, *inputs, **kwargs)
  File "C:\ProgramData\Anaconda3\lib\site-packages\transformers\tokenization_utils_base.py", line 1804, in from_pretrained
    return cls._from_pretrained(
  File "C:\ProgramData\Anaconda3\lib\site-packages\transformers\tokenization_utils_base.py", line 1958, in _from_pretrained
    tokenizer = cls(*init_inputs, **init_kwargs)
  File "C:\Users\tanba/.cache\huggingface\modules\transformers_modules\model\tokenization_chatglm.py", line 215, in __init__
    self.sp_tokenizer = SPTokenizer(vocab_file)
  File "C:\Users\tanba/.cache\huggingface\modules\transformers_modules\model\tokenization_chatglm.py", line 35, in __init__
    self.text_tokenizer = self._build_text_tokenizer(encode_special_tokens=False)
  File "C:\Users\tanba/.cache\huggingface\modules\transformers_modules\model\tokenization_chatglm.py", line 68, in _build_text_tokenizer      
    self._configure_tokenizer(
  File "C:\Users\tanba/.cache\huggingface\modules\transformers_modules\model\tokenization_chatglm.py", line 64, in _configure_tokenizer       
    text_tokenizer.refresh()
  File "C:\ProgramData\Anaconda3\lib\site-packages\icetk\text_tokenizer.py", line 31, in refresh
    self.sp.Load(model_proto=self.proto.SerializeToString())
  File "C:\ProgramData\Anaconda3\lib\site-packages\sentencepiece\__init__.py", line 904, in Load
    return self.LoadFromSerializedProto(model_proto)
  File "C:\ProgramData\Anaconda3\lib\site-packages\sentencepiece\__init__.py", line 250, in LoadFromSerializedProto
    return _sentencepiece.SentencePieceProcessor_LoadFromSerializedProto(self, serialized)
RuntimeError: Internal: [MASK] is already defined.

Expected Behavior

The tokenizer should load without raising this error.

Steps To Reproduce

Run `python cli_demo.py` or `python web_demo.py` with the `int4` model loaded from the local `model_int4` directory.
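Since the traceback fails while the tokenizer is being constructed, before any model weights are loaded, the problem can probably be reproduced without the demo scripts. The following is a minimal sketch under that assumption. The helper name `load_tokenizer_only` is made up for illustration; the `from_pretrained` call and the `model_int4` directory name are taken from the traceback.

```python
import os


def load_tokenizer_only(model_dir):
    """Build only the tokenizer; return it, or None if model_dir is missing."""
    if not os.path.isdir(model_dir):
        print(f"model directory not found: {model_dir}")
        return None
    # Imported lazily so the directory check works without transformers installed.
    from transformers import AutoTokenizer
    # Same call as line 5 of web_demo.py in the traceback.
    return AutoTokenizer.from_pretrained(model_dir, trust_remote_code=True)


if __name__ == "__main__":
    # "model_int4" is the local directory name that appears in the traceback.
    tokenizer = load_tokenizer_only("model_int4")
    print("tokenizer loaded" if tokenizer is not None else "tokenizer not loaded")
```

If this script raises the same `RuntimeError`, the demo scripts and the quantized weights can be ruled out as the source.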

Environment

- OS: Windows 11 22H2 22621.1555
- Python: Python 3.10.9
- Transformers: 4.27.1
- PyTorch: 2.0.0
- CUDA Support (`python -c "import torch; print(torch.cuda.is_available())"`): True
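
The list above does not include the versions of `sentencepiece` and `icetk`, although both appear in the traceback. A small sketch for collecting them with the standard library follows. The package list is an assumption about what might be relevant; `protobuf` is included only because the failing call serializes a protobuf message, not because it is known to be the cause.

```python
from importlib import metadata

# Packages from the traceback and the environment list; protobuf is a guess.
PACKAGES = ["transformers", "torch", "sentencepiece", "icetk", "protobuf"]


def installed_versions(names):
    """Return {package name: version string, or None if not installed}."""
    versions = {}
    for name in names:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return versions


for name, version in installed_versions(PACKAGES).items():
    print(f"{name}: {version or 'not installed'}")
```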

Anything else?

No response

kevintsq · Apr 22 '23 17:04