
Refactor quantized processing functions

Open · sw opened this issue 1 year ago

To avoid code duplication when implementing additional quantization formats (#456), refactor the forward_mul_mat and forward_get_rows functions to use a table of function pointers, indexed by ggml_type.
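For illustration, here is a minimal, self-contained sketch of the function-pointer-table idea. The enum, kernel names, and signatures are illustrative stand-ins, not the PR's actual code (the real types and kernels live in ggml.h / ggml.c):

```c
#include <assert.h>
#include <stddef.h>

/* Stand-in for ggml_type (the real enum lives in ggml.h). */
enum example_type {
    EXAMPLE_TYPE_Q4_0,
    EXAMPLE_TYPE_Q4_1,
    EXAMPLE_TYPE_F32,
    EXAMPLE_TYPE_COUNT,
};

/* One shared signature for every per-format row kernel. */
typedef void (*dequantize_row_fn)(const void * src, float * dst, int k);

/* Illustrative per-format kernels (in ggml these would be dequantize_row_q4_0, etc.). */
static void example_dequantize_row_q4_0(const void * src, float * dst, int k) {
    (void)src; (void)dst; (void)k; /* a real kernel would unpack k quantized values here */
}
static void example_dequantize_row_q4_1(const void * src, float * dst, int k) {
    (void)src; (void)dst; (void)k;
}

/* The table: indexed by type, with NULL for types that are not quantized. */
static const dequantize_row_fn dequantize_row_fns[EXAMPLE_TYPE_COUNT] = {
    [EXAMPLE_TYPE_Q4_0] = example_dequantize_row_q4_0,
    [EXAMPLE_TYPE_Q4_1] = example_dequantize_row_q4_1,
    [EXAMPLE_TYPE_F32]  = NULL,
};

/* Call-site sketch: instead of switching on the type inside forward_get_rows /
 * forward_mul_mat, look the kernel up once and call it through the pointer. */
static void example_get_row(enum example_type type, const void * src_row, float * dst_row, int ne0) {
    dequantize_row_fn dq = dequantize_row_fns[type];
    assert(dq != NULL);
    dq(src_row, dst_row, ne0);
}
```

Adding a new quantization format then means writing its kernels and adding one table entry, rather than extending every switch statement.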

This makes some functions non-inlined, but I didn't see a performance regression on my machine.

I tried to fix the "unused variable" warnings without complicating things too much; some of the variables are only used in asserts.
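(For reference, one common way to keep a value that only feeds an assert without tripping -Wunused-variable in release builds is an explicit cast to void; the macro name below is illustrative and not necessarily what this PR uses.)

```c
#include <assert.h>

#define EXAMPLE_UNUSED(x) (void)(x)  /* illustrative helper; ggml has its own convention */

static void check_row_size(int ne0, int block_size) {
    const int nb = ne0 / block_size;  /* only needed by the assert below */
    assert(nb * block_size == ne0);   /* row length must be a whole number of blocks */
    EXAMPLE_UNUSED(nb);               /* avoids the warning when NDEBUG removes the assert */
}
```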

sw · Mar 25 '23, 20:03

I think this is great. Considering the row sizes (4096 in the smallest model), it shouldn't be an issue if these functions can no longer be inlined.

slaren · Mar 25 '23, 21:03

@ggerganov did you want to look at this again or can we merge it?

sw · Mar 27 '23, 16:03

Please don't merge yet - it's top priority for merging, but I need some time to take a closer look.

ggerganov · Mar 27 '23, 16:03