ChatGLM-6B [BUG/Help] Error when loading the INT4 model on GPU under Windows

Is there an existing issue for this?

[X] I have searched the existing issues

Current Behavior

No compiled kernel found.
Compiling kernels : C:\Users\admin\.cache\huggingface\modules\transformers_modules\chatglm\quantization_kernels_parallel.c
Compiling gcc -O3 -fPIC -pthread -fopenmp -std=c99 C:\Users\admin\.cache\huggingface\modules\transformers_modules\chatglm\quantization_kernels_parallel.c -shared -o C:\Users\admin\.cache\huggingface\modules\transformers_modules\chatglm\quantization_kernels_parallel.so
e:/mingw/bin/../lib/gcc/mingw32/6.3.0/../../../../mingw32/bin/ld.exe: cannot find -lpthread
collect2.exe: error: ld returned 1 exit status
Compile default cpu kernel failed, using default cpu kernel code.
Compiling gcc -O3 -fPIC -std=c99 C:\Users\admin\.cache\huggingface\modules\transformers_modules\chatglm\quantization_kernels.c -shared -o C:\Users\admin\.cache\huggingface\modules\transformers_modules\chatglm\quantization_kernels.so
Load default cpu kernel failed:
Traceback (most recent call last):
  File "C:\Users\admin/.cache\huggingface\modules\transformers_modules\chatglm\quantization.py", line 167, in __init__
    kernels = ctypes.cdll.LoadLibrary(kernel_file)
  File "C:\Users\admin\AppData\Local\Programs\Python\Python310\lib\ctypes\__init__.py", line 452, in LoadLibrary
    return self._dlltype(name)
  File "C:\Users\admin\AppData\Local\Programs\Python\Python310\lib\ctypes\__init__.py", line 374, in __init__
    self._handle = _dlopen(self._name, mode)
OSError: [WinError 193] %1 is not a valid Win32 application.

Failed to load kernel.
Cannot load cpu kernel, don't use quantized model on cpu.
Using quantization cache
Applying quantization to glm layers
No compiled kernel found.
Compiling kernels : C:\Users\admin\.cache\huggingface\modules\transformers_modules\chatglm\quantization_kernels_parallel.c
Compiling gcc -O3 -fPIC -pthread -fopenmp -std=c99 C:\Users\admin\.cache\huggingface\modules\transformers_modules\chatglm\quantization_kernels_parallel.c -shared -o C:\Users\admin\.cache\huggingface\modules\transformers_modules\chatglm\quantization_kernels_parallel.so
e:/mingw/bin/../lib/gcc/mingw32/6.3.0/../../../../mingw32/bin/ld.exe: cannot find -lpthread
collect2.exe: error: ld returned 1 exit status
Compile default cpu kernel failed, using default cpu kernel code.
Compiling gcc -O3 -fPIC -std=c99 C:\Users\admin\.cache\huggingface\modules\transformers_modules\chatglm\quantization_kernels.c -shared -o C:\Users\admin\.cache\huggingface\modules\transformers_modules\chatglm\quantization_kernels.so
Load default cpu kernel failed:
Traceback (most recent call last):
  File "C:\Users\admin/.cache\huggingface\modules\transformers_modules\chatglm\quantization.py", line 167, in __init__
    kernels = ctypes.cdll.LoadLibrary(kernel_file)
  File "C:\Users\admin\AppData\Local\Programs\Python\Python310\lib\ctypes\__init__.py", line 452, in LoadLibrary
    return self._dlltype(name)
  File "C:\Users\admin\AppData\Local\Programs\Python\Python310\lib\ctypes\__init__.py", line 374, in __init__
    self._handle = _dlopen(self._name, mode)
OSError: [WinError 193] %1 is not a valid Win32 application.

Failed to load kernel.
Traceback (most recent call last):
  File "", line 1, in
  File "C:\Users\admin/.cache\huggingface\modules\transformers_modules\chatglm\modeling_chatglm.py", line 1430, in quantize
    load_cpu_kernel(**kwargs)
  File "C:\Users\admin/.cache\huggingface\modules\transformers_modules\chatglm\quantization.py", line 430, in load_cpu_kernel
    assert cpu_kernels.load
AssertionError
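
WinError 193 from `ctypes.cdll.LoadLibrary` commonly means that the library and the Python process have different architectures, for example a 32-bit DLL loaded into a 64-bit interpreter. Whether that is the cause here is not confirmed in this thread, but it is cheap to check. The sketch below is a standalone diagnostic, not part of ChatGLM-6B, and its function names are illustrative. It reads the `Machine` field from the PE header of the compiled kernel and prints it next to the bitness of the running interpreter.

```python
import struct
import sys

# "Machine" field values defined by the PE/COFF format.
PE_MACHINES = {0x014C: "x86 (32-bit)", 0x8664: "x64 (64-bit)", 0xAA64: "ARM64"}


def pe_machine_from_bytes(data):
    """Return the raw Machine value from the bytes of a PE file (DLL/EXE)."""
    if data[:2] != b"MZ":
        raise ValueError("not a PE file (missing MZ signature)")
    # Offset 0x3C of the DOS header stores the file offset of the PE signature.
    (pe_offset,) = struct.unpack_from("<I", data, 0x3C)
    if data[pe_offset:pe_offset + 4] != b"PE\0\0":
        raise ValueError("PE signature not found")
    # The 2-byte Machine field follows the 4-byte signature.
    (machine,) = struct.unpack_from("<H", data, pe_offset + 4)
    return machine


def python_bits():
    """Pointer size of the running interpreter, in bits (32 or 64)."""
    return struct.calcsize("P") * 8


if __name__ == "__main__":
    # Pass the path of the compiled kernel, e.g. the quantization_kernels.so
    # file named in the log above.
    with open(sys.argv[1], "rb") as f:
        machine = pe_machine_from_bytes(f.read())
    print("library:", PE_MACHINES.get(machine, hex(machine)))
    print("python :", f"{python_bits()}-bit")
```

If the library reports x86 (32-bit) while Python reports 64-bit, the compiler is producing binaries that this interpreter cannot load, regardless of whether the `-lpthread` link error is fixed.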

Expected Behavior

No response

Steps To Reproduce

Load the INT4 model on GPU under Windows. The earlier steps go smoothly, but with the gcc installed via MinGW, quantization_kernels_parallel fails to compile.

Environment

- OS:Windows10
- Python:3.10.10
- Transformers:4.27.1
- PyTorch:2.0.0+cu118
- CUDA Support (`python -c "import torch; print(torch.cuda.is_available())"`) :True

Anything else?

No response

Apr 18 '23 08:04 OathK1per

Are you running the 32-bit edition of Windows 10?

Apr 18 '23 10:04 YIZXIY

You could switch to a different deployment approach, WSL: Windows deployment guide

Apr 19 '23 09:04 ZhangErling

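I ran into the same problem and have been stuck on it for a long time. Did the original poster solve it by installing lpthread? I downloaded lpthread and updated the system environment variables, but it still fails. I would be very grateful for any pointers.

Environment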

- OS:Windows11
- Python:3.9.0
- Transformers:4.26.1
- PyTorch:2.0.1+cu118
- CUDA Support (`python -c "import torch; print(torch.cuda.is_available())"`) :True
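
Before adding pthread libraries, it may help to confirm which toolchain `gcc` on PATH actually is. The linker path in the original log (`lib/gcc/mingw32/6.3.0`) suggests a compiler whose target is `mingw32`, which builds 32-bit binaries; a MinGW-w64 toolchain for 64-bit Windows typically reports `x86_64-w64-mingw32`. The sketch below is a hedged diagnostic that queries the target with `gcc -dumpmachine`; the helper names are illustrative and the 64-bit check is only a heuristic on the reported string.

```python
import shutil
import subprocess


def gcc_target(gcc="gcc"):
    """Return the output of `<gcc> -dumpmachine`, or None if it is not on PATH."""
    exe = shutil.which(gcc)
    if exe is None:
        return None
    result = subprocess.run(
        [exe, "-dumpmachine"], capture_output=True, text=True, check=True
    )
    return result.stdout.strip()


def targets_64bit_windows(triplet):
    """Heuristic: 64-bit MinGW-w64 toolchains report an x86_64 ... mingw triplet."""
    return triplet.startswith("x86_64") and "mingw" in triplet


if __name__ == "__main__":
    triplet = gcc_target()
    if triplet is None:
        print("gcc not found on PATH")
    else:
        print("gcc target:", triplet)
        print("64-bit Windows target:", targets_64bit_windows(triplet))
```

If the reported target is not a 64-bit one while Python is 64-bit, the compiled kernel would still fail to load even after the link step succeeds.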

Jun 16 '23 18:06 AllenXiao95
