ChatGLM2-6B
Unable to add special tokens to the tokenizer
Is there an existing issue for this?
- [X] I have searched the existing issues
Current Behavior
```python
from transformers import AutoTokenizer, AutoModel

tokenizer = AutoTokenizer.from_pretrained("/data/chatglm/chatglm2-6b", trust_remote_code=True)
encoded_input = tokenizer.encode('你好', add_special_tokens=False)
tokenizer.unk_token_id = 0
# The token strings below were truncated in the original report; "</s>" and "<s>"
# are placeholders for the presumably stripped angle-bracket strings.
tokenizer.add_special_tokens({
    "eos_token": "</s>",
    "bos_token": "<s>",
})
```
Expected Behavior
Why are the eos_token and bos_token not added when add_special_tokens is True? How can this be resolved?
Steps To Reproduce
The code is shown above.
Environment
- OS: Ubuntu 30.04
- Python: 3.10
- Transformers: 4.30.0
- PyTorch: 2.0.1
- CUDA Support (`python -c "import torch; print(torch.cuda.is_available())"`):
Anything else?
No response
The vocabulary size is only 64790, so why do token IDs 64790 and 64792 appear?
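A likely explanation, assuming the standard ChatGLM2-6B checkpoint: the reported vocabulary size only covers the SentencePiece vocabulary, while the custom tokenizer registers a few extra special tokens after it (such as [gMASK] and sop) and prepends them during encoding, which is where IDs at or beyond 64790 come from. One way to check:

```python
from transformers import AutoTokenizer

# Assumes the same local checkpoint path as in the report above.
tokenizer = AutoTokenizer.from_pretrained("/data/chatglm/chatglm2-6b", trust_remote_code=True)

# Map the unexpected IDs back to token strings; convert_ids_to_tokens is part of
# the standard Hugging Face tokenizer API, so it works with the remote-code tokenizer too.
print(tokenizer.convert_ids_to_tokens([64790, 64792]))
# Expected output (an assumption, based on ChatGLM2-6B's extra special tokens): ['[gMASK]', 'sop']
```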
Changing the automatically added special tokens is not supported, because they have to stay consistent with how the model was trained.