[BUG/Help] TypeError raised when model.chat() is called
Is there an existing issue for this?
- [X] I have searched the existing issues
Current Behavior
To run chatglm-6b-int4 on my local machine, I use the following example code:

```python
from transformers import AutoTokenizer, AutoModel
import torch

modelname = "D:\\models\\zhipu\\chatglm-6b-int4"
tokenizer = AutoTokenizer.from_pretrained(modelname, trust_remote_code=True)
model = AutoModel.from_pretrained(modelname, trust_remote_code=True).float()
model = model.quantize(bits=4, kernel_file="D:\\models\\zhipu\\chatglm-6b-int4\\quantization_kernels_parallel.so")

message = '你好'  # "Hello"
response, history = model.chat(tokenizer, message, history=[])
print(response)

# "What should I do if I can't sleep at night?"
response, history = model.chat(tokenizer, "晚上睡不着应该怎么办", history=history)
print(response)
```
However, when I run this code I get `TypeError: expected Tensor as element 0 in argument 0, but got tuple`. Please see the full error message in the attached file [Error Message.txt](url).
I traced the code and found the error is raised at line 254 in modeling_chatglm.py:

```python
if layer_past is not None:
    past_key, past_value = layer_past[0], layer_past[1]
    if (type(layer_past) != str):
        key_layer = torch.cat((past_key, key_layer), dim=0)
        value_layer = torch.cat((past_value, value_layer), dim=0)
```
For some unknown reason, `layer_past` is set to the string `'past_key_values'`, so `layer_past[0]` and `layer_past[1]` yield the characters `'p'` and `'a'`, which causes the argument type mismatch in `torch.cat`.
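The type mismatch is easy to reproduce in isolation. Below is a minimal sketch (my own standalone illustration, not the actual modeling_chatglm.py code path) of what happens when `torch.cat` receives characters sliced from that string:

```python
import torch

# Minimal sketch of the failure mode described above: layer_past
# unexpectedly holds a string, so indexing it yields characters.
layer_past = 'past_key_values'
past_key, past_value = layer_past[0], layer_past[1]  # 'p', 'a'

key_layer = torch.zeros(1, 1, 4)
try:
    torch.cat((past_key, key_layer), dim=0)  # 'p' is not a Tensor
except TypeError as e:
    # e.g. "expected Tensor as element 0 in argument 0, but got str"
    print(e)
```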
Additional information about the file changes I made:
- To make the code run on CPU, I commented out line 161 in quantization.py, `kernels = ctypes.cdll.LoadLibrary(kernel_file)`, and replaced it with `kernels = ctypes.CDLL(kernel_file, winmode=0)` (see the ctypes sketch after this list).
- To avoid an `sp_tokenizer is not defined` error, I moved the line `self.sp_tokenizer = SPTokenizer(vocab_file, num_image_tokens=num_image_tokens)` from line 205 to line 182 in tokenization_chatglm.py (see the init-order sketch below).
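For reference, here is a self-contained sketch of the kernel-loading change from the first bullet (the path is the one from my setup; on Python 3.8+ under Windows, `winmode=0` selects the legacy DLL search behavior, which is what made loading succeed for me):

```python
import ctypes

# Sketch of the quantization.py change: ctypes.CDLL with winmode=0
# uses the legacy Windows DLL search path, which let the compiled
# kernel library load where ctypes.cdll.LoadLibrary failed.
kernel_file = "D:\\models\\zhipu\\chatglm-6b-int4\\quantization_kernels_parallel.so"
try:
    kernels = ctypes.CDLL(kernel_file, winmode=0)
except OSError as exc:
    print(f"Failed to load kernels: {exc}")
```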
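And a minimal sketch (hypothetical classes, not the real ChatGLMTokenizer / PreTrainedTokenizer) of why moving the `sp_tokenizer` assignment earlier helps: the base-class `__init__` can call back into subclass methods that already expect `self.sp_tokenizer` to exist, so the attribute must be assigned before `super().__init__()` runs:

```python
# Hypothetical classes illustrating the initialization-order issue.
class Base:
    def __init__(self):
        # The base constructor calls back into the subclass...
        self.size = self.vocab_size()

class Tok(Base):
    def __init__(self):
        self.sp_tokenizer = ["<pad>", "<bos>"]  # must be set BEFORE super().__init__()
        super().__init__()

    def vocab_size(self):
        # ...which fails if sp_tokenizer has not been set yet.
        return len(self.sp_tokenizer)

print(Tok().size)  # 2
```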
Expected Behavior
The `model.chat()` call should complete without errors.
Steps To Reproduce
Please see the Current Behavior section above.
Environment
- OS: Windows 11 Home (Chinese edition)
- Python: 3.12.7
- Transformers: 4.42.0
- PyTorch: 2.5.1
- CUDA Support (`python -c "import torch; print(torch.cuda.is_available())"`) : False
Anything else?
No response