ChatGLM-6B
[BUG/Help] Quantization failed?
Is there an existing issue for this?
- [X] I have searched the existing issues
Current Behavior
The web UI opens normally on startup, but it looks like quantizing the model failed?
C:\Users\oo\langchain-ChatGLM>python webui.py
Explicitly passing a revision is encouraged when loading a model with custom code to ensure no malicious code has been contributed in a newer revision.
Explicitly passing a revision is encouraged when loading a configuration with custom code to ensure no malicious code has been contributed in a newer revision.
Explicitly passing a revision is encouraged when loading a model with custom code to ensure no malicious code has been contributed in a newer revision.
No compiled kernel found.
Compiling kernels : C:\Users\oo\.cache\huggingface\modules\transformers_modules\chatglm-6b-int4\quantization_kernels_parallel.c
Compiling gcc -O3 -fPIC -pthread -fopenmp -std=c99 C:\Users\oo\.cache\huggingface\modules\transformers_modules\chatglm-6b-int4\quantization_kernels_parallel.c -shared -o C:\Users\oo\.cache\huggingface\modules\transformers_modules\chatglm-6b-int4\quantization_kernels_parallel.so
c:/mingw/bin/../lib/gcc/mingw32/6.3.0/../../../../mingw32/bin/ld.exe: cannot find -lpthread
collect2.exe: error: ld returned 1 exit status
Compile failed, using default cpu kernel code.
Compiling gcc -O3 -fPIC -std=c99 C:\Users\oo\.cache\huggingface\modules\transformers_modules\chatglm-6b-int4\quantization_kernels.c -shared -o C:\Users\oo\.cache\huggingface\modules\transformers_modules\chatglm-6b-int4\quantization_kernels.so
Kernels compiled : C:\Users\oo\.cache\huggingface\modules\transformers_modules\chatglm-6b-int4\quantization_kernels.so
Cannot load cpu kernel, don't use quantized model on cpu.
Using quantization cache
Applying quantization to glm layers
No sentence-transformers model found with name M:\model\text2vec-large-chinese. Creating a new one with MEAN pooling.
No sentence-transformers model found with name M:\model\text2vec-large-chinese. Creating a new one with MEAN pooling.
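From the log, the root cause looks like the MinGW toolchain rather than the kernel sources: ld.exe cannot find -lpthread, so the parallel kernel never builds, and the serial fallback .so then fails to load. Below is a minimal sketch (not part of the project, just a diagnostic assumption) that checks whether this gcc can link against pthread at all; it assumes gcc is on PATH, and the scratch file names are hypothetical.

```python
import os
import subprocess
import tempfile

# Sanity check: can this MinGW gcc link anything at all with -pthread/-fopenmp?
# The scratch file names below are hypothetical and unrelated to ChatGLM.
with tempfile.TemporaryDirectory() as tmp:
    src = os.path.join(tmp, "pthread_check.c")
    exe = os.path.join(tmp, "pthread_check.exe")
    with open(src, "w") as f:
        f.write("int main(void) { return 0; }\n")
    result = subprocess.run(
        ["gcc", "-pthread", "-fopenmp", "-std=c99", src, "-o", exe],
        capture_output=True, text=True,
    )
    # "cannot find -lpthread" here would mean the toolchain lacks the pthread
    # library, independent of the quantization kernel sources.
    print(result.returncode)
    print(result.stderr)
```

If this minimal program also fails to link, the gcc installation itself is missing the pthread library, which would match the mingw32 6.3.0 path shown in the log.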
Expected Behavior
No response
Steps To Reproduce
gcc fails to compile quantization_kernels_parallel (see the sketch below for re-running the compile step by hand).
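To reproduce the failure outside webui.py, the same compile command can be re-run manually. A minimal sketch, assuming gcc (MinGW) is on PATH and using the cache paths from the log above as placeholders:

```python
import subprocess

# Paths copied from the log above as placeholders; adjust to your own cache directory.
src = r"C:\Users\oo\.cache\huggingface\modules\transformers_modules\chatglm-6b-int4\quantization_kernels_parallel.c"
out = r"C:\Users\oo\.cache\huggingface\modules\transformers_modules\chatglm-6b-int4\quantization_kernels_parallel.so"

# Same command the quantization loader prints; on this MinGW install the
# -pthread flag is what triggers "cannot find -lpthread" at link time.
cmd = ["gcc", "-O3", "-fPIC", "-pthread", "-fopenmp", "-std=c99", src, "-shared", "-o", out]
result = subprocess.run(cmd, capture_output=True, text=True)
print(result.returncode)
print(result.stdout)
print(result.stderr)
```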
Environment
- OS: Windows 10 64-bit
- Python: 3.10.9
- Transformers:
- PyTorch:
- CUDA Support (`python -c "import torch; print(torch.cuda.is_available())"`) :
Anything else?
No response
Could you manually run ctypes.cdll.LoadLibrary("C:\Users\oo\.cache\huggingface\modules\transformers_modules\chatglm-6b-int4\quantization_kernels.so") and see what error it reports?
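A minimal sketch of that check, using the path from the log above as a placeholder (adjust the user/cache directory to yours); the raw string keeps the backslashes from being read as escape sequences:

```python
import ctypes

# Path copied from the log above; adjust to your own user/cache directory.
so_path = r"C:\Users\oo\.cache\huggingface\modules\transformers_modules\chatglm-6b-int4\quantization_kernels.so"

try:
    kernels = ctypes.cdll.LoadLibrary(so_path)
    print("Loaded OK:", kernels)
except OSError as exc:
    # LoadLibrary raises OSError carrying the Windows loader's error message.
    print("Failed to load kernel:", exc)
```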
For me, inference actually got slower after quantization. Why would that be?
