ChatGLM-6B
Using the int4 quantized model raises the following error: AttributeError: 'NoneType' object has no attribute 'int4WeightExtractionFloat'
Is there an existing issue for this?
- [X] I have searched the existing issues
Current Behavior
Loading the int4 quantized model with
model = AutoModel.from_pretrained("THUDM/chatglm-6b-int4-qe", trust_remote_code=True).float()
raises the following error: AttributeError: 'NoneType' object has no attribute 'int4WeightExtractionFloat'
Expected Behavior
No response
Steps To Reproduce
None
Environment
- OS: Windows 11
- Python: 3.8
- Transformers: -
- PyTorch: 2.0
- CUDA Support (`python -c "import torch; print(torch.cuda.is_available())"`): False
Anything else?
No response
Check whether your Python is 64-bit while gcc is compiling a 32-bit .so.
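A quick way to check both from the same environment (just a sketch; it assumes gcc is already on PATH):

```python
# Sketch: compare the Python interpreter's bitness with gcc's default target.
# Assumes gcc is on PATH; adjust the command if you use a different compiler.
import platform
import struct
import subprocess

print("Python:", platform.architecture()[0], "/", struct.calcsize("P") * 8, "bit pointers")
print("gcc target:", subprocess.run(["gcc", "-dumpmachine"],
                                    capture_output=True, text=True).stdout.strip())
```

Both should report 64-bit (for example x86_64 or mingw64) for the compiled .so to load into a 64-bit Python.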
I'm hitting the same problem, and both are 64-bit.
Could it be that the CPU kernel failed to load? Could you provide the full output?
output = W8A16LinearCPU.apply(input, self.weight, self.weight_scale, self.weight_bit_width, self.quantization_cache)
File "C:\Users\cm/.cache\huggingface\modules\transformers_modules\chatglm-6b-int4\quantization.py", line 76, in forward weight = extract_weight_to_float(quant_w, scale_w, weight_bit_width, quantization_cache=quantization_cache)
File "C:\Users\cm/.cache\huggingface\modules\transformers_modules\chatglm-6b-int4\quantization.py", line 260, in extract_weight_to_float func = cpu_kernels.int4WeightExtractionFloat
AttributeError: 'NoneType' object has no attribute 'int4WeightExtractionFloat'
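The lookup fails because cpu_kernels is None, i.e. the compiled quantization kernels never loaded. A first check (just a sketch, not part of the repo) is whether the cpm_kernels package is importable at all:

```python
# Sketch: confirm the cpm_kernels package exists in the current environment.
try:
    import cpm_kernels  # noqa: F401
    print("cpm_kernels imported successfully")
except ImportError as exc:
    print("cpm_kernels is missing; try `pip install cpm_kernels`:", exc)
```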
My environment is different, but the error is the same. Environment:
Environment
- OS: macOS 13.0
- Python: 3.8
- Transformers: -
- MPS Support (`python -c "import torch; print(torch.backends.mps.is_available())"`): True
Loading the int4 quantized model on CPU works fine:
model = AutoModel.from_pretrained("local path", trust_remote_code=True).float()
Loading the int4 quantized model on MPS:
model = AutoModel.from_pretrained("local path", trust_remote_code=True).half().to('mps')
fails with the following error:
Traceback (most recent call last):
File "/PycharmProjects/ChatGLM-6B/cli_demo.py", line 58, in <module>
main()
File "/PycharmProjects/ChatGLM-6B/cli_demo.py", line 43, in main
for response, history in model.stream_chat(tokenizer, query, history=history):
File "/miniconda3/envs/ChatGLM-6B/lib/python3.8/site-packages/torch/utils/_contextlib.py", line 35, in generator_context
response = gen.send(None)
File "/.cache/huggingface/modules/transformers_modules/chatglm-6b-int4/modeling_chatglm.py", line 1312, in stream_chat
for outputs in self.stream_generate(**inputs, **gen_kwargs):
File "/miniconda3/envs/ChatGLM-6B/lib/python3.8/site-packages/torch/utils/_contextlib.py", line 35, in generator_context
response = gen.send(None)
File "/.cache/huggingface/modules/transformers_modules/chatglm-6b-int4/modeling_chatglm.py", line 1389, in stream_generate
outputs = self(
File "/miniconda3/envs/ChatGLM-6B/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
return forward_call(*args, **kwargs)
File "/.cache/huggingface/modules/transformers_modules/chatglm-6b-int4/modeling_chatglm.py", line 1191, in forward
transformer_outputs = self.transformer(
File "/miniconda3/envs/ChatGLM-6B/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
return forward_call(*args, **kwargs)
File "/.cache/huggingface/modules/transformers_modules/chatglm-6b-int4/modeling_chatglm.py", line 997, in forward
layer_ret = layer(
File "/miniconda3/envs/ChatGLM-6B/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
return forward_call(*args, **kwargs)
File "/.cache/huggingface/modules/transformers_modules/chatglm-6b-int4/modeling_chatglm.py", line 627, in forward
attention_outputs = self.attention(
File "/miniconda3/envs/ChatGLM-6B/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
return forward_call(*args, **kwargs)
File "/.cache/huggingface/modules/transformers_modules/chatglm-6b-int4/modeling_chatglm.py", line 445, in forward
mixed_raw_layer = self.query_key_value(hidden_states)
File "/miniconda3/envs/ChatGLM-6B/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
return forward_call(*args, **kwargs)
File "/.cache/huggingface/modules/transformers_modules/chatglm-6b-int4/quantization.py", line 375, in forward
output = W8A16Linear.apply(input, self.weight, self.weight_scale, self.weight_bit_width)
File "/miniconda3/envs/ChatGLM-6B/lib/python3.8/site-packages/torch/autograd/function.py", line 506, in apply
return super().apply(*args, **kwargs) # type: ignore[misc]
File "/.cache/huggingface/modules/transformers_modules/chatglm-6b-int4/quantization.py", line 53, in forward
weight = extract_weight_to_half(quant_w, scale_w, weight_bit_width)
File "/.cache/huggingface/modules/transformers_modules/chatglm-6b-int4/quantization.py", line 262, in extract_weight_to_half
func = kernels.int4WeightExtractionHalf
AttributeError: 'NoneType' object has no attribute 'int4WeightExtractionHalf'
model = AutoModel.from_pretrained("localpath", trust_remote_code=True).float().to('mps')
func = kernels.int4WeightExtractionHalf
AttributeError: 'NoneType' object has no attribute 'int4WeightExtractionHalf'
pip install cpm_kernels
pip install cpm_kernels
That worked. By the way, what's the meaning of cpm?
pip install cpm_kernels
Requirement already satisfied: cpm_kernels in d:\anaconda3\envs\chatglm\lib\site-packages (1.0.11)
That doesn't help; I still get the same error.
Hi, have you solved this?
I think the quantized model only supports CUDA; at least the first version of ChatGLM was like that.
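If that is the case, the workaround reported above is to keep the int4 model on CPU in float32. A minimal sketch (the model id is only a placeholder; substitute your own local path):

```python
# Sketch: CPU fallback when the CUDA/MPS int4 kernels are unavailable.
# "THUDM/chatglm-6b-int4" is a placeholder; use your own model id or local path.
from transformers import AutoModel, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("THUDM/chatglm-6b-int4", trust_remote_code=True)
model = AutoModel.from_pretrained("THUDM/chatglm-6b-int4", trust_remote_code=True).float()
model = model.eval()

response, history = model.chat(tokenizer, "你好", history=[])
print(response)
```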
Hi, I'm running into the same problem. Has it been solved?
Explicitly passing a `revision` is encouraged when loading a model with custom code to ensure no malicious code has been contributed in a newer revision.
Explicitly passing a `revision` is encouraged when loading a configuration with custom code to ensure no malicious code has been contributed in a newer revision.
Explicitly passing a `revision` is encouraged when loading a model with custom code to ensure no malicious code has been contributed in a newer revision.
Failed to load cpm_kernels: [WinError 267] The directory name is invalid.: 'D:\\software\\Graphviz\\bin\\dot.exe'
'gcc' is not recognized as an internal or external command,
operable program or batch file.
Compile parallel cpu kernel gcc -O3 -fPIC -pthread -fopenmp -std=c99 C:\Users\cc\.cache\huggingface\modules\transformers_modules\local\quantization_kernels_parallel.c -shared -o C:\Users\cc\.cache\huggingface\modules\transformers_modules\local\quantization_kernels_parallel.so failed.
'gcc' is not recognized as an internal or external command,
operable program or batch file.
Compile cpu kernel gcc -O3 -fPIC -std=c99 C:\Users\cc\.cache\huggingface\modules\transformers_modules\local\quantization_kernels.c -shared -o C:\Users\cc\.cache\huggingface\modules\transformers_modules\local\quantization_kernels.so failed.
Traceback (most recent call last):
File "E:\data\Projects\chatglm2-6b\main.py", line 11, in <module>
response, history = model.chat(tokenizer, "你好", history=[])
File "D:\software\anaconda3\envs\chatglm2-6b\lib\site-packages\torch\utils\_contextlib.py", line 115, in decorate_context
return func(*args, **kwargs)
File "C:\Users\cc/.cache\huggingface\modules\transformers_modules\local\modeling_chatglm.py", line 1028, in chat
outputs = self.generate(**inputs, **gen_kwargs)
File "D:\software\anaconda3\envs\chatglm2-6b\lib\site-packages\torch\utils\_contextlib.py", line 115, in decorate_context
return func(*args, **kwargs)
File "D:\software\anaconda3\envs\chatglm2-6b\lib\site-packages\transformers\generation\utils.py", line 1437, in generate
return self.sample(
File "D:\software\anaconda3\envs\chatglm2-6b\lib\site-packages\transformers\generation\utils.py", line 2443, in sample
outputs = self(
File "D:\software\anaconda3\envs\chatglm2-6b\lib\site-packages\torch\nn\modules\module.py", line 1501, in _call_impl
return forward_call(*args, **kwargs)
File "C:\Users\cc/.cache\huggingface\modules\transformers_modules\local\modeling_chatglm.py", line 932, in forward
transformer_outputs = self.transformer(
File "D:\software\anaconda3\envs\chatglm2-6b\lib\site-packages\torch\nn\modules\module.py", line 1501, in _call_impl
return forward_call(*args, **kwargs)
File "C:\Users\cc/.cache\huggingface\modules\transformers_modules\local\modeling_chatglm.py", line 828, in forward
hidden_states, presents, all_hidden_states, all_self_attentions = self.encoder(
File "D:\software\anaconda3\envs\chatglm2-6b\lib\site-packages\torch\nn\modules\module.py", line 1501, in _call_impl
return forward_call(*args, **kwargs)
File "C:\Users\cc/.cache\huggingface\modules\transformers_modules\local\modeling_chatglm.py", line 638, in forward
layer_ret = layer(
File "D:\software\anaconda3\envs\chatglm2-6b\lib\site-packages\torch\nn\modules\module.py", line 1501, in _call_impl
return forward_call(*args, **kwargs)
File "C:\Users\cc/.cache\huggingface\modules\transformers_modules\local\modeling_chatglm.py", line 542, in forward
attention_output, kv_cache = self.self_attention(
File "D:\software\anaconda3\envs\chatglm2-6b\lib\site-packages\torch\nn\modules\module.py", line 1501, in _call_impl
return forward_call(*args, **kwargs)
File "C:\Users\cc/.cache\huggingface\modules\transformers_modules\local\modeling_chatglm.py", line 374, in forward
mixed_x_layer = self.query_key_value(hidden_states)
File "D:\software\anaconda3\envs\chatglm2-6b\lib\site-packages\torch\nn\modules\module.py", line 1501, in _call_impl
return forward_call(*args, **kwargs)
File "C:\Users\cc/.cache\huggingface\modules\transformers_modules\local\quantization.py", line 502, in forward
output = W8A16Linear.apply(input, self.weight, self.weight_scale, self.weight_bit_width)
File "D:\software\anaconda3\envs\chatglm2-6b\lib\site-packages\torch\autograd\function.py", line 506, in apply
return super().apply(*args, **kwargs) # type: ignore[misc]
File "C:\Users\cc/.cache\huggingface\modules\transformers_modules\local\quantization.py", line 75, in forward
weight = extract_weight_to_half(quant_w, scale_w, weight_bit_width)
File "C:\Users\cc/.cache\huggingface\modules\transformers_modules\local\quantization.py", line 287, in extract_weight_to_half
func = kernels.int4WeightExtractionHalf
AttributeError: 'NoneType' object has no attribute 'int4WeightExtractionHalf'
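In this log there are two separate problems: gcc is not on PATH, so the CPU quantization kernels cannot be compiled, and cpm_kernels itself fails to load because a PATH entry points at Graphviz's dot.exe. A quick check for the gcc part (a sketch; on Windows this assumes you intend to install MinGW-w64 or a similar toolchain):

```python
# Sketch: check whether gcc is reachable from the current environment.
import shutil
import subprocess

gcc = shutil.which("gcc")
if gcc is None:
    print("gcc not found on PATH; install MinGW-w64 (or TDM-GCC) and add its bin directory to PATH")
else:
    print("Using", gcc)
    print(subprocess.run(["gcc", "--version"], capture_output=True, text=True).stdout.splitlines()[0])
```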
I'm hitting this too. I've already run pip install cpm_kernels, and 'D:\software\Graphviz\bin\dot.exe' does exist at that path. How did you solve it?
Same problem here with chatglm2-6b int4: https://github.com/chatchat-space/Langchain-Chatchat/issues/1995
I solved this. I was using chatglm-6b-int4. The root cause is that cpm_kernels has a function that walks every entry on PATH and treats each entry as a directory; when an entry is actually a file, the os.listdir call inside raises an exception that is never handled, the exception propagates up, and the kernel load fails.
Solution: change the lookup_dll function in cpm_kernels/library/base.py to the following:
def lookup_dll(prefix):
    paths = os.environ.get("PATH", "").split(os.pathsep)
    for path in paths:
        if not os.path.exists(path):
            continue
        try:
            for name in os.listdir(path):
                if name.startswith(prefix) and name.lower().endswith(".dll"):
                    return os.path.join(path, name)
        except Exception as e:
            print(e)
    return None
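To see why the original version blows up, note that a PATH entry can be a file rather than a directory (here it was Graphviz's dot.exe). A tiny standalone sketch of that failure mode (the temporary file only stands in for such an entry):

```python
# Sketch: os.listdir raises an OSError (NotADirectoryError / WinError 267) when
# the "directory" is actually a file, which is what happens when a PATH entry
# points at an .exe instead of a folder.
import os
import tempfile

with tempfile.NamedTemporaryFile(suffix=".exe", delete=False) as tmp:
    fake_path_entry = tmp.name  # a file, not a directory

try:
    os.listdir(fake_path_entry)
except OSError as exc:
    print("listing failed as expected:", exc)
finally:
    os.remove(fake_path_entry)
```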
I modified that file as described, but it still didn't solve the problem for me.