ChatGLM-6B
[BUG/Help] Compile default cpu kernel failed when installing the CPU (int4) version on Linux
Is there an existing issue for this?
- [X] I have searched the existing issues
Current Behavior
On CentOS, running cli_demo.py fails with a "Compile default cpu kernel failed" error. I have verified that gcc and OpenMP are installed, so how can I fix this?
Checking gcc:
gcc -v
Using built-in specs.
COLLECT_GCC=gcc
COLLECT_LTO_WRAPPER=/usr/libexec/gcc/x86_64-redhat-linux/4.8.5/lto-wrapper
Target: x86_64-redhat-linux
Configured with: ../configure --prefix=/usr --mandir=/usr/share/man --infodir=/usr/share/info --with-bugurl=http://bugzilla.redhat.com/bugzilla --enable-bootstrap --enable-shared --enable-threads=posix --enable-checking=release --with-system-zlib --enable-__cxa_atexit --disable-libunwind-exceptions --enable-gnu-unique-object --enable-linker-build-id --with-linker-hash-style=gnu --enable-languages=c,c++,objc,obj-c++,java,fortran,ada,go,lto --enable-plugin --enable-initfini-array --disable-libgcj --with-isl=/builddir/build/BUILD/gcc-4.8.5-20150702/obj-x86_64-redhat-linux/isl-install --with-cloog=/builddir/build/BUILD/gcc-4.8.5-20150702/obj-x86_64-redhat-linux/cloog-install --enable-gnu-indirect-function --with-tune=generic --with-arch_32=x86-64 --build=x86_64-redhat-linux
Thread model: posix
gcc version 4.8.5 20150623 (Red Hat 4.8.5-44) (GCC)
Checking OpenMP:
rpm -qa | grep libgomp
libgomp-4.8.5-44.el7.x86_64
The error log is as follows:
Explicitly passing a `revision` is encouraged when loading a model with custom code to ensure no malicious code has been contributed in a newer revision.
Explicitly passing a `revision` is encouraged when loading a configuration with custom code to ensure no malicious code has been contributed in a newer revision.
Explicitly passing a `revision` is encouraged when loading a model with custom code to ensure no malicious code has been contributed in a newer revision.
No compiled kernel found.
Compiling kernels : /root/.cache/huggingface/modules/transformers_modules/model/quantization_kernels_parallel.c
Compiling gcc -O3 -fPIC -pthread -fopenmp -std=c99 /root/.cache/huggingface/modules/transformers_modules/model/quantization_kernels_parallel.c -shared -o /root/.cache/huggingface/modules/transformers_modules/model/quantization_kernels_parallel.so
Compile default cpu kernel failed, using default cpu kernel code.
Compiling gcc -O3 -fPIC -std=c99 /root/.cache/huggingface/modules/transformers_modules/model/quantization_kernels.c -shared -o /root/.cache/huggingface/modules/transformers_modules/model/quantization_kernels.so
Compile default cpu kernel failed.
Failed to load kernel.
Cannot load cpu kernel, don't use quantized model on cpu.
Using quantization cache
Applying quantization to glm layers
Welcome to the ChatGLM-6B model. Enter text to chat, type 'clear' to clear the conversation history, 'stop' to exit the program.
User: nihao
Traceback (most recent call last):
File "/home/manager/ChatGLM-6B/cli_demo.py", line 58, in <module>
main()
File "/home/manager/ChatGLM-6B/cli_demo.py", line 43, in main
for response, history in model.stream_chat(tokenizer, query, history=history):
File "/root/miniconda3/envs/chatglm/lib/python3.9/site-packages/torch/utils/_contextlib.py", line 35, in generator_context
response = gen.send(None)
File "/root/.cache/huggingface/modules/transformers_modules/model/modeling_chatglm.py", line 1311, in stream_chat
for outputs in self.stream_generate(**inputs, **gen_kwargs):
File "/root/miniconda3/envs/chatglm/lib/python3.9/site-packages/torch/utils/_contextlib.py", line 35, in generator_context
response = gen.send(None)
File "/root/.cache/huggingface/modules/transformers_modules/model/modeling_chatglm.py", line 1388, in stream_generate
outputs = self(
File "/root/miniconda3/envs/chatglm/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
return forward_call(*args, **kwargs)
File "/root/.cache/huggingface/modules/transformers_modules/model/modeling_chatglm.py", line 1190, in forward
transformer_outputs = self.transformer(
File "/root/miniconda3/envs/chatglm/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
return forward_call(*args, **kwargs)
File "/root/.cache/huggingface/modules/transformers_modules/model/modeling_chatglm.py", line 996, in forward
layer_ret = layer(
File "/root/miniconda3/envs/chatglm/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
return forward_call(*args, **kwargs)
File "/root/.cache/huggingface/modules/transformers_modules/model/modeling_chatglm.py", line 627, in forward
attention_outputs = self.attention(
File "/root/miniconda3/envs/chatglm/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
return forward_call(*args, **kwargs)
File "/root/.cache/huggingface/modules/transformers_modules/model/modeling_chatglm.py", line 445, in forward
mixed_raw_layer = self.query_key_value(hidden_states)
File "/root/miniconda3/envs/chatglm/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
return forward_call(*args, **kwargs)
File "/root/.cache/huggingface/modules/transformers_modules/model/quantization.py", line 388, in forward
output = W8A16LinearCPU.apply(input, self.weight, self.weight_scale, self.weight_bit_width,
File "/root/miniconda3/envs/chatglm/lib/python3.9/site-packages/torch/autograd/function.py", line 506, in apply
return super().apply(*args, **kwargs) # type: ignore[misc]
File "/root/.cache/huggingface/modules/transformers_modules/model/quantization.py", line 80, in forward
weight = extract_weight_to_float(quant_w, scale_w, weight_bit_width, quantization_cache=quantization_cache)
File "/root/.cache/huggingface/modules/transformers_modules/model/quantization.py", line 316, in extract_weight_to_float
func(
TypeError: 'NoneType' object is not callable
I saw a post suggesting to check whether the quantization_kernels_parallel.so file exists. I checked, and the file indeed does not exist, so is the problem that this file fails to be compiled automatically? How can I fix that?
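For reference, this is the minimal check I used, with the cache path copied from the compile log above (the path will differ on other machines):

import os

# Path copied from the "Compiling gcc ..." line in the log above; adjust to your own HF cache location.
so_path = "/root/.cache/huggingface/modules/transformers_modules/model/quantization_kernels_parallel.so"
print(so_path, "exists:", os.path.exists(so_path))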
Expected Behavior
The demo runs normally.
Steps To Reproduce
- First, create a model folder in the project root, download the model files from the Hugging Face page (https://huggingface.co/THUDM/chatglm-6b-int4/tree/main), and save all of them into the model folder.
- Then create a conda environment and install the dependencies:
conda create -n chatglm python=3.9
conda activate chatglm
pip install -r requirements.txt
- Modify cli_demo.py to use the local model path ("model") and CPU deployment (.float()); see the sketch after these steps:
tokenizer = AutoTokenizer.from_pretrained("model", trust_remote_code=True)
model = AutoModel.from_pretrained("model", trust_remote_code=True).float()
- Run cli_demo.py, which then fails with the error above:
python cli_demo.py
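For reference, after these modifications the model-loading part of cli_demo.py looks roughly like this (only the two load lines are changed; the rest of the demo is left as shipped):

from transformers import AutoTokenizer, AutoModel

# "model" is the local folder containing the chatglm-6b-int4 files downloaded above
tokenizer = AutoTokenizer.from_pretrained("model", trust_remote_code=True)
model = AutoModel.from_pretrained("model", trust_remote_code=True).float()  # .float() for CPU deployment
model = model.eval()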
Environment
- OS: Linux version 3.10.0-957.el7.x86_64 ([email protected]) (gcc version 4.8.5 20150623 (Red Hat 4.8.5-36) (GCC) ) #1 SMP Thu Nov 8 23:39:32 UTC 2018
- Python: 3.9.16
- Transformers: 4.27.1
- PyTorch: 2.0.1
- CUDA Support: False
Anything else?
Could it be that the gcc and OpenMP versions are wrong?
I ran into the same problem. How can it be solved? Can anyone give some advice?
I have solved this problem. The cause is the gcc compilation failing; compiling the two kernel files manually fixes it.
- Manually compile the two files quantization_kernels.c and quantization_kernels_parallel.c:
gcc -fPIC -pthread -fopenmp -std=c99 quantization_kernels.c -shared -o quantization_kernels.so
gcc -fPIC -pthread -fopenmp -std=c99 quantization_kernels_parallel.c -shared -o quantization_kernels_parallel.so
- Then add one more line to the code:
model = model.quantize(bits=4, kernel_file="model/quantization_kernels.so")
With that, it runs successfully.
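Putting it together, the load sequence in cli_demo.py then looks roughly like this (this assumes the manually compiled quantization_kernels.so was placed in the local model folder; adjust kernel_file if you compiled it elsewhere):

from transformers import AutoTokenizer, AutoModel

tokenizer = AutoTokenizer.from_pretrained("model", trust_remote_code=True)
model = AutoModel.from_pretrained("model", trust_remote_code=True).float()
# Point the quantizer at the manually compiled kernel instead of relying on the failed auto-compile
model = model.quantize(bits=4, kernel_file="model/quantization_kernels.so")
model = model.eval()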