chatglm.cpp
n_gpu_layers parameter?
Is there no support for an n_gpu_layers parameter to control how many layers are loaded onto the GPU? In a multi-instance setup where inference speed is not critical, loading even 4-5 fewer layers per instance would free up a lot of GPU memory.
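For context, here is a minimal sketch of the kind of control being asked for, similar in spirit to the n_gpu_layers option in llama.cpp: only the first N transformer layers are placed on the GPU and the remaining layers stay in host memory. All names below (ModelConfig, load_layer_to_gpu, load_layer_to_cpu) are hypothetical and are not part of the current chatglm.cpp API.

```cpp
// Illustrative sketch only; not the actual chatglm.cpp implementation.
#include <cstdio>

struct ModelConfig {
    int num_hidden_layers = 28; // ChatGLM2-6B has 28 transformer layers
    int n_gpu_layers      = 28; // hypothetical knob: how many layers to keep on the GPU
};

// Placeholder loaders: a real implementation would allocate the layer weights
// in VRAM or in host RAM respectively.
static void load_layer_to_gpu(int i) { std::printf("layer %2d -> GPU\n", i); }
static void load_layer_to_cpu(int i) { std::printf("layer %2d -> CPU\n", i); }

// Offload only the first n_gpu_layers transformer layers; the rest stay in
// host memory and run on the CPU, trading some speed for VRAM.
static void load_model(const ModelConfig &cfg) {
    for (int i = 0; i < cfg.num_hidden_layers; i++) {
        if (i < cfg.n_gpu_layers) {
            load_layer_to_gpu(i);
        } else {
            load_layer_to_cpu(i);
        }
    }
}

int main() {
    ModelConfig cfg;
    cfg.n_gpu_layers = cfg.num_hidden_layers - 4; // e.g. keep 4 layers on the CPU per instance
    load_model(cfg);
    return 0;
}
```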
In my case the option was spelled n-gpu-layers rather than n_gpu_layers, which is what let me start https://github.com/oobabooga/text-generation-webui; maybe this helps. I'm running the 70B 4-bit quantization.
@Tokix Thanks, but I need this in C++ 😄
I have this need as well: my laptop's 3060 is just slightly short on VRAM and cannot run chatglm2-6B at q4_0.
Strong demand for this feature.