CLBlast: q5_0, q5_1, q8_0 dequant kernels

Open 0cc4m opened this issue 2 years ago • 0 comments

There is an issue with q5_0 that I still can't figure out. On Nvidia, trying to transfer the quantized weights to the device leads to a CL_OUT_OF_RESOURCES error. On AMD and on POCL it leads to a segfault. The problem seems to be tied to 22-byte structs, while 20-byte and 24-byte structs work fine. I am not sure why this is the case.
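To make the sizes concrete, here is a minimal sketch of how a 22-byte block can arise on the host, and how a slightly different declaration of the same fields ends up at 24 bytes. The type names `host_block_q5_0` and `padded_block_q5_0` are hypothetical, and the field layout (a 2-byte FP16 scale, 4 bytes of high bits, 16 bytes of packed nibbles) is an assumption made for illustration. This is not a diagnosis of the error above, only a way to check what size and alignment each side actually sees.

```c
#include <assert.h>   /* static_assert (C11) */
#include <stddef.h>   /* offsetof */
#include <stdint.h>

#define QK5_0 32      /* assumed number of weights per block */

/* Hypothetical host-side layout: every member has alignment <= 2,
 * so the compiler adds no padding and the struct is 2 + 4 + 16 = 22 bytes. */
typedef struct {
    uint16_t d;             /* scale, stored as raw FP16 bits */
    uint8_t  qh[4];         /* fifth bit of each of the 32 quants */
    uint8_t  qs[QK5_0 / 2]; /* low four bits, two quants per byte */
} host_block_q5_0;

/* Same fields, but with the high bits declared as one 32-bit integer.
 * That member needs 4-byte alignment, so 2 bytes of padding appear
 * after d and the struct grows to 24 bytes. */
typedef struct {
    uint16_t d;
    uint32_t qh;
    uint8_t  qs[QK5_0 / 2];
} padded_block_q5_0;

static_assert(sizeof(host_block_q5_0) == 22, "expected a 22-byte block");
static_assert(sizeof(padded_block_q5_0) == 24, "expected a 24-byte block");
```

If the host and the kernel declare the block differently, `sizeof` and `offsetof` on both sides are a quick way to confirm whether they agree on the stride between blocks.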

As a workaround, I copy the weights into a new struct and do the FP16 to FP32 conversion on the CPU. This works and seems to add little overhead, but it should not be necessary. If anyone knows what is going on here, please let me know.
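The workaround can be sketched roughly as follows. This is an illustration under assumptions, not the code from this change: the names `host_block_q5_0`, `device_block_q5_0`, `fp16_bits_to_fp32` and `repack_q5_0` are hypothetical, and the field layout is assumed. The idea is to widen the FP16 scale to a 32-bit float on the CPU, which also brings the struct to 24 bytes.

```c
#include <assert.h>   /* static_assert (C11) */
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define QK5_0 32      /* assumed number of weights per block */

/* Hypothetical 22-byte host block with an FP16 scale. */
typedef struct {
    uint16_t d;
    uint8_t  qh[4];
    uint8_t  qs[QK5_0 / 2];
} host_block_q5_0;

/* Hypothetical 24-byte block for the device with an FP32 scale. */
typedef struct {
    float    d;
    uint32_t qh;
    uint8_t  qs[QK5_0 / 2];
} device_block_q5_0;

static_assert(sizeof(device_block_q5_0) == 24, "expected a 24-byte block");

/* Convert raw IEEE 754 half-precision bits to a float. */
static float fp16_bits_to_fp32(uint16_t h) {
    uint32_t sign = (uint32_t)(h & 0x8000u) << 16;
    uint32_t exp  = (h >> 10) & 0x1Fu;
    uint32_t mant = h & 0x3FFu;
    uint32_t bits;

    if (exp == 0) {
        if (mant == 0) {
            bits = sign;                         /* signed zero */
        } else {
            /* Subnormal half: shift until the implicit bit appears. */
            int shift = 0;
            while ((mant & 0x400u) == 0) { mant <<= 1; shift++; }
            mant &= 0x3FFu;
            bits = sign | ((uint32_t)(113 - shift) << 23) | (mant << 13);
        }
    } else if (exp == 0x1Fu) {
        bits = sign | 0x7F800000u | (mant << 13); /* infinity or NaN */
    } else {
        bits = sign | ((exp + 112u) << 23) | (mant << 13); /* rebias 15 -> 127 */
    }

    float f;
    memcpy(&f, &bits, sizeof f);
    return f;
}

/* Copy n blocks into the 24-byte layout, converting the scale on the CPU. */
static void repack_q5_0(const host_block_q5_0 *src, device_block_q5_0 *dst, size_t n) {
    for (size_t i = 0; i < n; i++) {
        dst[i].d = fp16_bits_to_fp32(src[i].d);
        memcpy(&dst[i].qh, src[i].qh, sizeof dst[i].qh); /* keep byte order as stored */
        memcpy(dst[i].qs, src[i].qs, sizeof dst[i].qs);
    }
}
```

The repacked array is what would then be uploaded to the device. The cost is one extra pass over the weights plus 2 additional bytes per block.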

I also moved the contents of the .cl file into opencl.c, as requested.

0cc4m, Apr 29 '23 08:04