
[BUG/Help] The chatglm-6b-int4 model fails to load on GPU under Linux

Open pollymars opened this issue 1 year ago • 5 comments

Is there an existing issue for this?

  • [X] I have searched the existing issues

Current Behavior

Loading the chatglm-6b-int4 model on GPU fails during kernel compilation:

>>> from transformers import AutoTokenizer, AutoModel
>>> model = AutoModel.from_pretrained("./chatglm-6b-int4", trust_remote_code=True).half().cuda()
Explicitly passing a `revision` is encouraged when loading a configuration with custom code to ensure no malicious code has been contributed in a newer revision.
Explicitly passing a `revision` is encouraged when loading a model with custom code to ensure no malicious code has been contributed in a newer revision.
No compiled kernel found.
Compiling kernels : /home/pollymars/.cache/huggingface/modules/transformers_modules/local/quantization_kernels_parallel.c
Compiling gcc -O3 -pthread -fopenmp -std=c99 /home/pollymars/.cache/huggingface/modules/transformers_modules/local/quantization_kernels_parallel.c -shared -o /home/pollymars/.cache/huggingface/modules/transformers_modules/local/quantization_kernels_parallel.so
/usr/bin/ld: /tmp/ccjNdJf6.o: relocation R_X86_64_32 against `.text' can not be used when making a shared object; recompile with -fPIC
/tmp/ccjNdJf6.o: error adding symbols: Bad value
collect2: error: ld returned 1 exit status
Compile failed, using default cpu kernel code.
Compiling gcc -O3 -fPIC -std=c99 /home/pollymars/.cache/huggingface/modules/transformers_modules/local/quantization_kernels.c -shared -o /home/pollymars/.cache/huggingface/modules/transformers_modules/local/quantization_kernels.so
Kernels compiled : /home/pollymars/.cache/huggingface/modules/transformers_modules/local/quantization_kernels.so
Cannot load cpu kernel, don't use quantized model on cpu.
Using quantization cache
Applying quantization to glm layers

Calling the model then returns: RuntimeError: CUDA Error: no kernel image is available for execution on the device

I tried compiling the kernel manually and passing it in, but that did not help: loading goes through the same compile flow again and then raises the same CUDA Error:

gcc -fPIC -pthread -fopenmp -std=c99 quantization_kernels_parallel.c -shared -o quantization_kernels_parallel.so

model = AutoModel.from_pretrained("./chatglm-6b-int4", trust_remote_code=True).half().cuda()
model = model.quantize(bits=4, kernel_file="/home/pollymars/ChatGLM-6B-main/chatglm-6b-int4/quantization_kernels_parallel.so")

Loading chatglm-6b-int4 on CPU with the manually compiled kernel specified does run successfully, but inference is slow.

Expected Behavior

No response

Steps To Reproduce

On Linux, loading the chatglm-6b-int4 model fails to compile the GPU kernel; compiling the kernel manually and specifying it does not resolve the problem either.

Environment

- OS: Ubuntu 5.4.0-6ubuntu1~16.04.9
- Python: 3.8.5
- Transformers: 4.26.1
- PyTorch: 1.13.1
- CUDA Support (`python -c "import torch; print(torch.cuda.is_available())"`) : True
- gcc: 5.4.0 20160609
- openmp: 201307

Anything else?

No response

pollymars commented Mar 22 '23 10:03

Is the code in "./chatglm-6b-int4" the latest version? The CUDA Error is unrelated to the CPU kernel; it may be a cpm_kernels problem. With the latest code, if loading cpm_kernels fails you would see output like: Failed to load cpm_kernels: ... It could also be a CUDA problem. A quick check:

>>> test = torch.Tensor([1, 2, 3, 4])
>>> test = test.cuda()
>>> test

If that fails, what GPU are you using?

songxxzp commented Mar 22 '23 11:03

"./chatglm-6b-int4" contains exactly the code and model files from Hugging Face. I don't see any error about cpm_kernels. The GPU is an A100; I'll run the test. Could this be a gcc or openmp version problem? Thanks!

pollymars commented Mar 22 '23 12:03

I don't think this is a gcc or openmp problem.

RuntimeError: CUDA Error: no kernel image is available for execution on the device

This error suggests CUDA is not actually usable (even though torch.cuda.is_available() == True, a mismatched torch/CUDA version pair can still fail at runtime).

>>> test = torch.Tensor([1, 2, 3, 4])
>>> test = test.cuda()
>>> test

Does this snippet run successfully? It tests whether CUDA works at all.

songxxzp commented Mar 22 '23 13:03

Yes, that runs successfully. I updated the code in "./chatglm-6b-int4" and the kernel now compiles, but inference still raises "RuntimeError: CUDA Error: no kernel image is available for execution on the device". The traceback points to line 235 of quantization.py, i.e. the call to kernels.int4WeightExtractionHalf, which ends up in cpm_kernels/library/cuda.py, failing at the line checkCUStatus(cuda.cuModuleLoadData(ctypes.byref(module), data)).

This may be the same problem as issue #119.

pollymars commented Mar 23 '23 02:03

According to this comment, compute capability 6.1 or above is required. What a pitfall: https://github.com/OpenBMB/BMInf/issues/29#issuecomment-1000951808
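If the floor really is compute capability 6.1 as the linked BMInf comment suggests, checking a device against it is a simple tuple comparison on the values returned by torch.cuda.get_device_capability(). A minimal sketch; the threshold constant and helper name below are assumptions for illustration, not part of this repo:

```python
# Minimum compute capability suggested in the linked BMInf comment.
MIN_CAPABILITY = (6, 1)

def supports_quantization_kernels(major: int, minor: int) -> bool:
    """True if a GPU of the given compute capability meets the minimum."""
    return (major, minor) >= MIN_CAPABILITY

# On a CUDA machine, the capability comes from:
#   major, minor = torch.cuda.get_device_capability(0)
print(supports_quantization_kernels(8, 0))  # A100 is sm_80 -> True
print(supports_quantization_kernels(5, 2))  # a Maxwell-era card -> False
```

Note that an A100 (8.0) passes this check, so pollymars's "no kernel image" error more plausibly means the prebuilt kernel binaries simply were not compiled for that device architecture, rather than a capability-floor violation.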

mozhuanzuojing commented May 27 '23 11:05

Duplicate of #119

zhangch9 commented Aug 16 '23 05:08