ValueError: Found modules on cpu/disk. Using Exllama backend requires all the modules to be on GPU.You can deactivate exllama backend by setting `disable_exllama=True` in the quantization config objec

Open dinchu opened this issue 2 years ago • 10 comments

When trying to load quantized models, I always get:

ValueError: Found modules on cpu/disk. Using Exllama backend requires all the modules to be on GPU.You can deactivate exllama backend by setting `disable_exllama=True` in the quantization config objec

dinchu • Sep 21 '23 18:09
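
The error says some modules ended up on CPU or disk, so a first step could be to find out which ones. Below is a rough sketch of that check, not the library's own code. It assumes the model was loaded with a device map (for example, the `hf_device_map` attribute that a model loaded with `device_map="auto"` exposes), and `find_offloaded_modules` is a hypothetical helper name. The example map is made up for illustration.

```python
from typing import Dict, List, Union

# Device labels that mean a module was placed off the GPU.
OFFLOAD_TARGETS = {"cpu", "disk"}


def find_offloaded_modules(device_map: Dict[str, Union[int, str]]) -> List[str]:
    """Return the sorted names of modules mapped to CPU or disk.

    Hypothetical helper: device_map maps module names to a GPU index
    (int) or a device string such as "cpu", "disk", or "cuda:0".
    """
    return sorted(
        name
        for name, device in device_map.items()
        if str(device) in OFFLOAD_TARGETS
    )


# Illustrative device map (invented values, not taken from a real model).
example_map = {
    "model.embed_tokens": 0,
    "model.layers.0": 0,
    "model.layers.1": "cpu",
    "lm_head": "disk",
}

offloaded = find_offloaded_modules(example_map)
print(offloaded)
```

If the list is non-empty, the model probably did not fit in GPU memory, and the two options the error message points to are either getting every module onto the GPU or setting `disable_exllama=True` in the quantization config.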