llama.cpp

metal: Copy kernels for quant to F32 conversions (#10976).

Open. gcp opened this issue 1 day ago • 1 comment

Modeled after the CUDA implementations.

Because the existing dequantize functions are written in terms of type4x4, I could not see how to reuse them here, so they are repeated in float form.
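To illustrate what a quant to F32 copy computes, here is a minimal CPU-side sketch in C. It is not the Metal code from this change. The struct and function names (`sketch_block_q8`, `sketch_block_q4`, `sketch_cpy_q8_f32`, `sketch_cpy_q4_f32`) are hypothetical, and the scale is stored as a `float` to keep the example standalone, whereas ggml's quantized blocks store it as fp16. The 4-bit layout (low nibbles for the first half of the block, high nibbles for the second half, offset by 8) follows the Q4_0 convention as I understand it and should be treated as an assumption.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define SKETCH_QK 32 /* values per block (assumed, mirrors the usual 32-element block) */

/* Hypothetical 8-bit block: one scale plus 32 signed quants.
 * Assumption: real blocks keep the scale as fp16, not float. */
typedef struct {
    float  d;
    int8_t qs[SKETCH_QK];
} sketch_block_q8;

/* Hypothetical 4-bit block: one scale plus 16 bytes, two quants per byte. */
typedef struct {
    float   d;
    uint8_t qs[SKETCH_QK / 2];
} sketch_block_q4;

/* Copy 8-bit quantized blocks to F32: value = scale * quant. */
static void sketch_cpy_q8_f32(const sketch_block_q8 *src, float *dst, size_t nblocks) {
    assert(src != NULL && dst != NULL);
    for (size_t b = 0; b < nblocks; ++b) {
        for (int i = 0; i < SKETCH_QK; ++i) {
            dst[b * SKETCH_QK + i] = src[b].d * (float) src[b].qs[i];
        }
    }
}

/* Copy 4-bit quantized blocks to F32.
 * Low nibble of byte i  -> element i,
 * high nibble of byte i -> element i + 16,
 * both stored with an offset of 8. */
static void sketch_cpy_q4_f32(const sketch_block_q4 *src, float *dst, size_t nblocks) {
    assert(src != NULL && dst != NULL);
    for (size_t b = 0; b < nblocks; ++b) {
        for (int i = 0; i < SKETCH_QK / 2; ++i) {
            const int lo = (src[b].qs[i] & 0x0F) - 8;
            const int hi = (src[b].qs[i] >> 4) - 8;
            dst[b * SKETCH_QK + i]                 = src[b].d * (float) lo;
            dst[b * SKETCH_QK + i + SKETCH_QK / 2] = src[b].d * (float) hi;
        }
    }
}
```

A GPU copy kernel would typically assign one block (or a slice of one) to each thread instead of looping, but the per-element arithmetic is the same.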

Fixes issue #10976.

gcp, Feb 22 '25 00:02