llama.cpp
metal: Copy kernels for quant to F32 conversions (#10976).
Modeled after the CUDA implementations. Because of the use of type4x4, I could not find a way to reuse the existing dequantize functions, so they are repeated here in float form.
Fixes issue #10976.