[WIP]: Fix #286
Based on the fix for #292, here is an attempt to fix #286 (with the caveat that it has been tested only on CPU). The patch adds llama.cpp tests for quantization.
One question I'm pondering is whether we should get rid of the *_reference functions or add some more?
Thanks! Let's rebase on latest master so that the diff takes into account the new macros from #292
Btw, there has recently been an update to https://github.com/ggerganov/llama.cpp/pull/1237. We should synchronize with @sw.
> One question I'm pondering is whether we should get rid of the *_reference functions or add some more?
We need them to have a deterministic way of generating models. Add more in what sense?
> Thanks! Let's rebase on latest master so that the diff takes into account the new macros from #292
Should be done.
> Btw, there has recently been an update to ggerganov/llama.cpp#1237. We should synchronize with @sw.
Will look into this.
> One question I'm pondering is whether we should get rid of the *_reference functions or add some more?

> We need them to have a deterministic way of generating models. Add more in what sense?
We have something like

```c
.quantize_row_q           = quantize_row_q4_0,
.quantize_row_q_reference = quantize_row_q4_0_reference,
```
but the original patch included

```c
.quantize_row_q           = ggml_fp32_to_fp16_row,
.quantize_row_q_reference = ggml_fp32_to_fp16_row,
```
We could define a `ggml_fp32_to_fp16_row_reference`, too. But I also imagine that dropping the `.quantize_row_q_reference` member from the struct could work?
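To make the first option concrete, here is a minimal sketch of the wiring (my own, not from the patch: the struct, typedef, and the truncating fp32-to-fp16 stand-in are all hypothetical stand-ins for the real ggml routines). When there is no separately optimized path, both members can simply point at the same function:

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

typedef uint16_t ggml_fp16_t;

/* Stand-in for ggml_fp32_to_fp16_row: a truncating (round-toward-zero)
 * fp32 -> fp16 conversion that ignores inf/NaN/subnormals. The real ggml
 * routine is more careful; this only illustrates the table wiring. */
static void fp32_to_fp16_row(const float * x, ggml_fp16_t * y, size_t n) {
    for (size_t i = 0; i < n; i++) {
        uint32_t bits;
        memcpy(&bits, &x[i], sizeof(bits));
        uint16_t sign = (uint16_t)((bits >> 16) & 0x8000u);
        int32_t  exp  = (int32_t)((bits >> 23) & 0xFFu) - 127 + 15;
        uint16_t mant = (uint16_t)((bits >> 13) & 0x3FFu);
        y[i] = (exp <= 0 || exp >= 31) ? sign
                                       : (uint16_t)(sign | (exp << 10) | mant);
    }
}

/* Hypothetical function table mirroring the PR snippet: with no
 * optimized path, both members reference the same routine. */
typedef void (*to_fp16_row_fn)(const float *, ggml_fp16_t *, size_t);

struct quantize_fns {
    to_fp16_row_fn quantize_row_q;
    to_fp16_row_fn quantize_row_q_reference;
};

static const struct quantize_fns fp16_fns = {
    .quantize_row_q           = fp32_to_fp16_row,
    .quantize_row_q_reference = fp32_to_fp16_row,
};
```

The trade-off versus dropping the member: keeping `.quantize_row_q_reference` lets the test harness treat every type uniformly, at the cost of a redundant entry for types without an optimized path.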