llama.cpp
cuda: Add Q5_1, Q5_0, Q4_1 and Q4_0 to F32 conversion support. (#10976)
Use templates and reuse the dequant_qX_Y functions.
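
The template-based approach can be illustrated with a minimal host-side C++ sketch. This is not the CUDA code from the commit: the struct and function names (`block_q4_0_sketch`, `dequant_q4_0_sketch`, `convert_block_to_f32`) are hypothetical, the scale is stored as a plain `float` rather than a 16-bit half, and only the two 4-bit formats are shown. It assumes the usual GGML layout of 32 values per block, with the low nibble of byte `j` holding element `j` and the high nibble holding element `j + 16`.

```cpp
#include <cstdint>

// Hypothetical, simplified block layouts (assumption: the real blocks store
// the scale and minimum as 16-bit halves, not floats).
constexpr int QK = 32;  // values per quantized block

struct block_q4_0_sketch {
    float   d;           // scale
    uint8_t qs[QK / 2];  // two 4-bit quants per byte
};

struct block_q4_1_sketch {
    float   d;           // scale
    float   m;           // minimum (offset)
    uint8_t qs[QK / 2];  // two 4-bit quants per byte
};

// Dequantize the two values packed into byte `iqs` of a Q4_0 block.
// Quants are unsigned nibbles re-centered by subtracting 8.
inline void dequant_q4_0_sketch(const block_q4_0_sketch & b, int iqs, float & v0, float & v1) {
    v0 = ((b.qs[iqs] & 0x0F) - 8) * b.d;
    v1 = ((b.qs[iqs] >> 4)   - 8) * b.d;
}

// Dequantize the two values packed into byte `iqs` of a Q4_1 block.
// Quants are unsigned nibbles scaled by d and shifted by the minimum m.
inline void dequant_q4_1_sketch(const block_q4_1_sketch & b, int iqs, float & v0, float & v1) {
    v0 = (b.qs[iqs] & 0x0F) * b.d + b.m;
    v1 = (b.qs[iqs] >> 4)   * b.d + b.m;
}

// One conversion routine shared by all formats: the block type and its
// dequantize function are template parameters, so no per-format copy is needed.
template <typename Block, void (*dequant)(const Block &, int, float &, float &)>
void convert_block_to_f32(const Block & src, float * dst) {
    for (int j = 0; j < QK / 2; ++j) {
        float v0, v1;
        dequant(src, j, v0, v1);
        dst[j]          = v0;  // low nibble  -> first half of the block
        dst[j + QK / 2] = v1;  // high nibble -> second half of the block
    }
}
```

Under these assumptions, a call such as `convert_block_to_f32<block_q4_0_sketch, dequant_q4_0_sketch>(blk, out)` fills `out[0..31]` with the dequantized values. Supporting another format then only requires a new block struct and a matching dequantize function, which is the kind of reuse the commit message describes.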