llama.cpp
metal: Copy kernels for quant to F32 conversions (#10976).
Modeled after the CUDA implementations. Because of the use of type4x4, I could not find a way to reuse the existing dequantize functions, so they are repeated here in float form.
Fixes issue #10976.