llama.cpp
Refactor quantized processing functions
To avoid code duplication when implementing additional quantization formats (#456), refactor the forward_mul_mat and forward_get_rows functions to use a table of function pointers, indexed by ggml_type.
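For context, here is a minimal sketch of the dispatch-table idea, not the actual ggml code: the names (example_type, type_traits, dequantize_row_fn and the stub kernels) are hypothetical stand-ins for the real ggml_type-indexed table that forward_mul_mat and forward_get_rows would dispatch through.

```c
/*
 * Sketch only: per-type function pointers collected in a table indexed by the
 * type enum, so callers dispatch without per-type switch/if chains.
 * All names below are hypothetical, not the real ggml API.
 */
#include <stdio.h>

typedef enum {
    TYPE_F32 = 0,
    TYPE_Q4_0,
    TYPE_Q4_1,
    TYPE_COUNT,
} example_type;

/* Per-type operation needed by the forward functions. */
typedef void (*dequantize_row_fn)(const void *src, float *dst, int k);

typedef struct {
    dequantize_row_fn dequantize_row;
} type_traits;

/* Stub kernels standing in for the real per-format implementations. */
static void dequantize_row_q4_0(const void *src, float *dst, int k) {
    (void) src;
    for (int i = 0; i < k; ++i) dst[i] = 0.0f; /* real code would unpack 4-bit blocks */
}

static void dequantize_row_q4_1(const void *src, float *dst, int k) {
    (void) src;
    for (int i = 0; i < k; ++i) dst[i] = 0.0f;
}

/* The table: adding a new quantization format is just a new entry here,
 * instead of another branch in every forward_* function. */
static const type_traits type_table[TYPE_COUNT] = {
    [TYPE_Q4_0] = { dequantize_row_q4_0 },
    [TYPE_Q4_1] = { dequantize_row_q4_1 },
};

/* Caller dispatches through the table rather than switching on the type. */
static void get_row(example_type type, const void *src, float *dst, int k) {
    type_table[type].dequantize_row(src, dst, k);
}

int main(void) {
    float row[8];
    get_row(TYPE_Q4_0, NULL, row, 8);
    printf("%f\n", row[0]);
    return 0;
}
```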
This makes some functions non-inlined, but I didn't see a performance regression on my machine.
I tried to fix the "unused variable" warnings without complicating things too much; some of the variables are used in asserts.
I think this is great. Considering the row sizes (4096 in the smallest model), it shouldn't be an issue if these functions can no longer be inlined.
@ggerganov did you want to look at this again, or can we merge it?
Please don't merge yet - it's top priority for merging, but I need some time to take a closer look.