ChatGLM-6B
Using the int4 quantized model raises the following error: AttributeError: 'NoneType' object has no attribute 'int4WeightExtractionFloat'
Is there an existing issue for this?
- [X] I have searched the existing issues
Current Behavior
Loading the int4 quantized model with
model = AutoModel.from_pretrained("THUDM/chatglm-6b-int4-qe", trust_remote_code=True).float()
raises the following error: AttributeError: 'NoneType' object has no attribute 'int4WeightExtractionFloat'
Expected Behavior
No response
Steps To Reproduce
None
Environment
- OS: Windows 11
- Python: 3.8
- Transformers: -
- PyTorch: 2.0
- CUDA Support (`python -c "import torch; print(torch.cuda.is_available())"`): False
Anything else?
No response
Check whether your Python is 64-bit while gcc is compiling a 32-bit .so.
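A quick way to check both from the same environment (just a sketch; it assumes gcc is already on PATH):

```python
# Sketch: compare the Python interpreter's bitness with gcc's default target.
# Assumes gcc is on PATH; adjust the command if you use a different compiler.
import platform
import struct
import subprocess

print("Python:", platform.architecture()[0], "/", struct.calcsize("P") * 8, "bit pointers")
print("gcc target:", subprocess.run(["gcc", "-dumpmachine"],
                                    capture_output=True, text=True).stdout.strip())
```

Both should report 64-bit (for example x86_64 or mingw64) for the compiled .so to load into a 64-bit Python.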
I'm hitting the same problem, and both are 64-bit.
Could it be that the CPU kernel failed to load? Could you provide the full output?
output = W8A16LinearCPU.apply(input, self.weight, self.weight_scale, self.weight_bit_width, self.quantization_cache)
File "C:\Users\cm/.cache\huggingface\modules\transformers_modules\chatglm-6b-int4\quantization.py", line 76, in forward weight = extract_weight_to_float(quant_w, scale_w, weight_bit_width, quantization_cache=quantization_cache)
File "C:\Users\cm/.cache\huggingface\modules\transformers_modules\chatglm-6b-int4\quantization.py", line 260, in extract_weight_to_float func = cpu_kernels.int4WeightExtractionFloat
AttributeError: 'NoneType' object has no attribute 'int4WeightExtractionFloat'
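The lookup fails because cpu_kernels is None, i.e. the compiled quantization kernels never loaded. A first check (just a sketch, not part of the repo) is whether the cpm_kernels package is importable at all:

```python
# Sketch: confirm the cpm_kernels package exists in the current environment.
try:
    import cpm_kernels  # noqa: F401
    print("cpm_kernels imported successfully")
except ImportError as exc:
    print("cpm_kernels is missing; try `pip install cpm_kernels`:", exc)
```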
My environment is different, but the error is the same. Environment:
Environment
- OS: macOS 13.0
- Python: 3.8
- Transformers: -
- MPS Support (`python -c "import torch; print(torch.backends.mps.is_available())"`): True
Loading the int4 quantized model on CPU works fine:
model = AutoModel.from_pretrained("local path", trust_remote_code=True).float()
Loading the int4 quantized model on MPS:
model = AutoModel.from_pretrained("local path", trust_remote_code=True).half().to('mps')
fails with the following error:
Traceback (most recent call last):
File "/PycharmProjects/ChatGLM-6B/cli_demo.py", line 58, in <module>
main()
File "/PycharmProjects/ChatGLM-6B/cli_demo.py", line 43, in main
for response, history in model.stream_chat(tokenizer, query, history=history):
File "/miniconda3/envs/ChatGLM-6B/lib/python3.8/site-packages/torch/utils/_contextlib.py", line 35, in generator_context
response = gen.send(None)
File "/.cache/huggingface/modules/transformers_modules/chatglm-6b-int4/modeling_chatglm.py", line 1312, in stream_chat
for outputs in self.stream_generate(**inputs, **gen_kwargs):
File "/miniconda3/envs/ChatGLM-6B/lib/python3.8/site-packages/torch/utils/_contextlib.py", line 35, in generator_context
response = gen.send(None)
File "/.cache/huggingface/modules/transformers_modules/chatglm-6b-int4/modeling_chatglm.py", line 1389, in stream_generate
outputs = self(
File "/miniconda3/envs/ChatGLM-6B/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
return forward_call(*args, **kwargs)
File "/.cache/huggingface/modules/transformers_modules/chatglm-6b-int4/modeling_chatglm.py", line 1191, in forward
transformer_outputs = self.transformer(
File "/miniconda3/envs/ChatGLM-6B/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
return forward_call(*args, **kwargs)
File "/.cache/huggingface/modules/transformers_modules/chatglm-6b-int4/modeling_chatglm.py", line 997, in forward
layer_ret = layer(
File "/miniconda3/envs/ChatGLM-6B/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
return forward_call(*args, **kwargs)
File "/.cache/huggingface/modules/transformers_modules/chatglm-6b-int4/modeling_chatglm.py", line 627, in forward
attention_outputs = self.attention(
File "/miniconda3/envs/ChatGLM-6B/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
return forward_call(*args, **kwargs)
File "/.cache/huggingface/modules/transformers_modules/chatglm-6b-int4/modeling_chatglm.py", line 445, in forward
mixed_raw_layer = self.query_key_value(hidden_states)
File "/miniconda3/envs/ChatGLM-6B/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
return forward_call(*args, **kwargs)
File "/.cache/huggingface/modules/transformers_modules/chatglm-6b-int4/quantization.py", line 375, in forward
output = W8A16Linear.apply(input, self.weight, self.weight_scale, self.weight_bit_width)
File "/miniconda3/envs/ChatGLM-6B/lib/python3.8/site-packages/torch/autograd/function.py", line 506, in apply
return super().apply(*args, **kwargs) # type: ignore[misc]
File "/.cache/huggingface/modules/transformers_modules/chatglm-6b-int4/quantization.py", line 53, in forward
weight = extract_weight_to_half(quant_w, scale_w, weight_bit_width)
File "/.cache/huggingface/modules/transformers_modules/chatglm-6b-int4/quantization.py", line 262, in extract_weight_to_half
func = kernels.int4WeightExtractionHalf
AttributeError: 'NoneType' object has no attribute 'int4WeightExtractionHalf'
model = AutoModel.from_pretrained("localpath", trust_remote_code=True).float().to('mps')
func = kernels.int4WeightExtractionHalf
AttributeError: 'NoneType' object has no attribute 'int4WeightExtractionHalf'
pip install cpm_kernels
pip install cpm_kernels
That worked. By the way, what's the meaning of cpm?
pip install cpm_kernels
Requirement already satisfied: cpm_kernels in d:\anaconda3\envs\chatglm\lib\site-packages (1.0.11)
That doesn't help; I still get the same error.
Hi, have you solved this?
I think the quantized model only supports CUDA; at least the first version of ChatGLM was like that.
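If that is the case, the workaround reported above is to keep the int4 model on CPU in float32. A minimal sketch (the model id is only a placeholder; substitute your own local path):

```python
# Sketch: CPU fallback when the CUDA/MPS int4 kernels are unavailable.
# "THUDM/chatglm-6b-int4" is a placeholder; use your own model id or local path.
from transformers import AutoModel, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("THUDM/chatglm-6b-int4", trust_remote_code=True)
model = AutoModel.from_pretrained("THUDM/chatglm-6b-int4", trust_remote_code=True).float()
model = model.eval()

response, history = model.chat(tokenizer, "你好", history=[])
print(response)
```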
Hi, I'm running into the same problem. Has it been solved?
Explicitly passing a `revision` is encouraged when loading a model with custom code to ensure no malicious code has been contributed in a newer revision.
Explicitly passing a `revision` is encouraged when loading a configuration with custom code to ensure no malicious code has been contributed in a newer revision.
Explicitly passing a `revision` is encouraged when loading a model with custom code to ensure no malicious code has been contributed in a newer revision.
Failed to load cpm_kernels: [WinError 267] The directory name is invalid.: 'D:\\software\\Graphviz\\bin\\dot.exe'
'gcc' is not recognized as an internal or external command,
operable program or batch file.
Compile parallel cpu kernel gcc -O3 -fPIC -pthread -fopenmp -std=c99 C:\Users\cc\.cache\huggingface\modules\transformers_modules\local\quantization_kernels_parallel.c -shared -o C:\Users\cc\.cache\huggingface\modules\transformers_modules\local\quantization_kernels_parallel.so failed.
'gcc' is not recognized as an internal or external command,
operable program or batch file.
Compile cpu kernel gcc -O3 -fPIC -std=c99 C:\Users\cc\.cache\huggingface\modules\transformers_modules\local\quantization_kernels.c -shared -o C:\Users\cc\.cache\huggingface\modules\transformers_modules\local\quantization_kernels.so failed.
Traceback (most recent call last):
File "E:\data\Projects\chatglm2-6b\main.py", line 11, in <module>
response, history = model.chat(tokenizer, "你好", history=[])
File "D:\software\anaconda3\envs\chatglm2-6b\lib\site-packages\torch\utils\_contextlib.py", line 115, in decorate_context
return func(*args, **kwargs)
File "C:\Users\cc/.cache\huggingface\modules\transformers_modules\local\modeling_chatglm.py", line 1028, in chat
outputs = self.generate(**inputs, **gen_kwargs)
File "D:\software\anaconda3\envs\chatglm2-6b\lib\site-packages\torch\utils\_contextlib.py", line 115, in decorate_context
return func(*args, **kwargs)
File "D:\software\anaconda3\envs\chatglm2-6b\lib\site-packages\transformers\generation\utils.py", line 1437, in generate
return self.sample(
File "D:\software\anaconda3\envs\chatglm2-6b\lib\site-packages\transformers\generation\utils.py", line 2443, in sample
outputs = self(
File "D:\software\anaconda3\envs\chatglm2-6b\lib\site-packages\torch\nn\modules\module.py", line 1501, in _call_impl
return forward_call(*args, **kwargs)
File "C:\Users\cc/.cache\huggingface\modules\transformers_modules\local\modeling_chatglm.py", line 932, in forward
transformer_outputs = self.transformer(
File "D:\software\anaconda3\envs\chatglm2-6b\lib\site-packages\torch\nn\modules\module.py", line 1501, in _call_impl
return forward_call(*args, **kwargs)
File "C:\Users\cc/.cache\huggingface\modules\transformers_modules\local\modeling_chatglm.py", line 828, in forward
hidden_states, presents, all_hidden_states, all_self_attentions = self.encoder(
File "D:\software\anaconda3\envs\chatglm2-6b\lib\site-packages\torch\nn\modules\module.py", line 1501, in _call_impl
return forward_call(*args, **kwargs)
File "C:\Users\cc/.cache\huggingface\modules\transformers_modules\local\modeling_chatglm.py", line 638, in forward
layer_ret = layer(
File "D:\software\anaconda3\envs\chatglm2-6b\lib\site-packages\torch\nn\modules\module.py", line 1501, in _call_impl
return forward_call(*args, **kwargs)
File "C:\Users\cc/.cache\huggingface\modules\transformers_modules\local\modeling_chatglm.py", line 542, in forward
attention_output, kv_cache = self.self_attention(
File "D:\software\anaconda3\envs\chatglm2-6b\lib\site-packages\torch\nn\modules\module.py", line 1501, in _call_impl
return forward_call(*args, **kwargs)
File "C:\Users\cc/.cache\huggingface\modules\transformers_modules\local\modeling_chatglm.py", line 374, in forward
mixed_x_layer = self.query_key_value(hidden_states)
File "D:\software\anaconda3\envs\chatglm2-6b\lib\site-packages\torch\nn\modules\module.py", line 1501, in _call_impl
return forward_call(*args, **kwargs)
File "C:\Users\cc/.cache\huggingface\modules\transformers_modules\local\quantization.py", line 502, in forward
output = W8A16Linear.apply(input, self.weight, self.weight_scale, self.weight_bit_width)
File "D:\software\anaconda3\envs\chatglm2-6b\lib\site-packages\torch\autograd\function.py", line 506, in apply
return super().apply(*args, **kwargs) # type: ignore[misc]
File "C:\Users\cc/.cache\huggingface\modules\transformers_modules\local\quantization.py", line 75, in forward
weight = extract_weight_to_half(quant_w, scale_w, weight_bit_width)
File "C:\Users\cc/.cache\huggingface\modules\transformers_modules\local\quantization.py", line 287, in extract_weight_to_half
func = kernels.int4WeightExtractionHalf
AttributeError: 'NoneType' object has no attribute 'int4WeightExtractionHalf'
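In this log there are two separate problems: gcc is not on PATH, so the CPU quantization kernels cannot be compiled, and cpm_kernels itself fails to load because a PATH entry points at Graphviz's dot.exe. A quick check for the gcc part (a sketch; on Windows this assumes you intend to install MinGW-w64 or a similar toolchain):

```python
# Sketch: check whether gcc is reachable from the current environment.
import shutil
import subprocess

gcc = shutil.which("gcc")
if gcc is None:
    print("gcc not found on PATH; install MinGW-w64 (or TDM-GCC) and add its bin directory to PATH")
else:
    print("Using", gcc)
    print(subprocess.run(["gcc", "--version"], capture_output=True, text=True).stdout.splitlines()[0])
```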
I'm hitting this too. I've already run pip install cpm_kernels, and 'D:\software\Graphviz\bin\dot.exe' does exist at that path. How did you solve it?
Same problem here with chatglm2-6b int4: https://github.com/chatchat-space/Langchain-Chatchat/issues/1995
I solved this. I was using chatglm-6b-int4. The root cause is that cpm_kernels has a function that walks every entry on PATH and treats each entry as a directory; when an entry is actually a file, the os.listdir call inside raises an exception that is never handled, the exception propagates up, and the kernel load fails.
Solution: change the lookup_dll function in cpm_kernels/library/base.py to the following:
def lookup_dll(prefix):
    paths = os.environ.get("PATH", "").split(os.pathsep)
    for path in paths:
        if not os.path.exists(path):
            continue
        try:
            for name in os.listdir(path):
                if name.startswith(prefix) and name.lower().endswith(".dll"):
                    return os.path.join(path, name)
        except Exception as e:
            print(e)
    return None
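To see why the original version blows up, note that a PATH entry can be a file rather than a directory (here it was Graphviz's dot.exe). A tiny standalone sketch of that failure mode (the temporary file only stands in for such an entry):

```python
# Sketch: os.listdir raises an OSError (NotADirectoryError / WinError 267) when
# the "directory" is actually a file, which is what happens when a PATH entry
# points at an .exe instead of a folder.
import os
import tempfile

with tempfile.NamedTemporaryFile(suffix=".exe", delete=False) as tmp:
    fake_path_entry = tmp.name  # a file, not a directory

try:
    os.listdir(fake_path_entry)
except OSError as exc:
    print("listing failed as expected:", exc)
finally:
    os.remove(fake_path_entry)
```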
I modified that file as described, but it still didn't solve the problem for me.