
Loading chatglm-6b-int4-qe fails with an error

ystyle opened this issue on Mar 26, 2023 • 2 comments

The directory layout is as follows:

lxy52@YSTYLE-PC MINGW64 /d/Code/Python/ChatGLM-6B (main)
$ tree -d -L 2
.
|-- ChatGLM-webui
|   |-- modules
|   |-- outputs
|   `-- scripts
|-- THUDM
|   |-- chatglm-6b
|   |-- chatglm-6b-int4
|   |-- chatglm-6b-int4-qe
|   `-- chatglm-6b-main
|-- examples
|-- limitations
|-- outputs
|   |-- markdown
|   `-- save
`-- resources

15 directories

Running python .\webui.py --model-path ..\THUDM\chatglm-6b-int4-qe\ from /d/Code/Python/ChatGLM-6B/ChatGLM-webui fails with the error below, and loading chatglm-6b-int4 fails the same way. Is it that the model directory cannot be loaded through a relative path that walks up the tree (..\), or is something else the cause? A version from last week worked; after updating it no longer does. (A minimal standalone repro of just the failing call is included after the traceback.)

(ChatGLM) PS D:\Code\Python\ChatGLM-6B\ChatGLM-webui> python .\webui.py --model-path ..\THUDM\chatglm-6b-int4-qe\
Explicitly passing a `revision` is encouraged when loading a model with custom code to ensure no malicious code has been contributed in a newer revision.
Explicitly passing a `revision` is encouraged when loading a configuration with custom code to ensure no malicious code has been contributed in a newer revision.
Explicitly passing a `revision` is encouraged when loading a model with custom code to ensure no malicious code has been contributed in a newer revision.
No compiled kernel found.
Compiling kernels : C:\Users\lxy52\.cache\huggingface\modules\transformers_modules\quantization_kernels_parallel.c
Compiling gcc -O3 -fPIC -pthread -fopenmp -std=c99 C:\Users\lxy52\.cache\huggingface\modules\transformers_modules\quantization_kernels_parallel.c -shared -o C:\Users\lxy52\.cache\huggingface\modules\transformers_modules\quantization_kernels_parallel.so
Kernels compiled : C:\Users\lxy52\.cache\huggingface\modules\transformers_modules\quantization_kernels_parallel.so
Cannot load cpu kernel, don't use quantized model on cpu.
Using quantization cache
Applying quantization to glm layers
GPU memory: 8.59 GB
No compiled kernel found.
Compiling kernels : C:\Users\lxy52\.cache\huggingface\modules\transformers_modules\quantization_kernels_parallel.c
Compiling gcc -O3 -fPIC -pthread -fopenmp -std=c99 C:\Users\lxy52\.cache\huggingface\modules\transformers_modules\quantization_kernels_parallel.c -shared -o C:\Users\lxy52\.cache\huggingface\modules\transformers_modules\quantization_kernels_parallel.so
Kernels compiled : C:\Users\lxy52\.cache\huggingface\modules\transformers_modules\quantization_kernels_parallel.so
Traceback (most recent call last):
  File "D:\Code\Python\ChatGLM-6B\ChatGLM-webui\webui.py", line 52, in <module>
    init()
  File "D:\Code\Python\ChatGLM-6B\ChatGLM-webui\webui.py", line 24, in init
    load_model()
  File "D:\Code\Python\ChatGLM-6B\ChatGLM-webui\modules\model.py", line 61, in load_model
    prepare_model()
  File "D:\Code\Python\ChatGLM-6B\ChatGLM-webui\modules\model.py", line 42, in prepare_model
    model = model.half().quantize(4).cuda()
  File "C:\Users\lxy52/.cache\huggingface\modules\transformers_modules\modeling_chatglm.py", line 1281, in quantize
    load_cpu_kernel(**kwargs)
  File "C:\Users\lxy52/.cache\huggingface\modules\transformers_modules\quantization.py", line 390, in load_cpu_kernel
    cpu_kernels = CPUKernel(**kwargs)
  File "C:\Users\lxy52/.cache\huggingface\modules\transformers_modules\quantization.py", line 157, in __init__
    kernels = ctypes.cdll.LoadLibrary(kernel_file)
  File "D:\Application\Miniconda3\envs\ChatGLM\lib\ctypes\__init__.py", line 452, in LoadLibrary
    return self._dlltype(name)
    self._handle = _dlopen(self._name, mode)
FileNotFoundError: Could not find module 'C:\Users\lxy52\.cache\huggingface\modules\transformers_modules\quantization_kernels_parallel.so' (or one of its dependencies). Try using the full path with constructor syntax.
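
To narrow this down, the failing call can be tried on its own: the kernel does get compiled (see the compile log above), so the question is whether ctypes can actually load the resulting .so. A minimal standalone sketch, using the same ctypes.cdll.LoadLibrary call as quantization.py and the kernel path printed in the log (adjust the path if the cached filename differs on your machine):

import ctypes
import os

# Path of the compiled kernel, as printed in the "Kernels compiled" line above.
kernel_file = os.path.join(
    os.path.expanduser("~"), ".cache", "huggingface", "modules",
    "transformers_modules", "quantization_kernels_parallel.so",
)
print("kernel file exists:", os.path.exists(kernel_file))

# On Windows, ctypes raises FileNotFoundError here if the .so itself, or one of
# the DLLs it depends on (e.g. the MinGW gcc/OpenMP runtime), cannot be found.
kernels = ctypes.cdll.LoadLibrary(kernel_file)
print("kernel loaded:", kernels)

If the file exists but LoadLibrary still fails, the missing piece is likely a dependency of the compiled .so (the gcc/pthread/OpenMP runtime DLLs) rather than the --model-path argument itself.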
