llama.cpp
Refactor quantized processing functions
To avoid code duplication when implementing additional quantization formats (#456), refactor the forward_mul_mat and forward_get_rows functions to use a table of function pointers, indexed by ggml_type.
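For context, here is a minimal sketch of the dispatch-table idea, not the actual ggml code: the names (example_type, type_traits, dequantize_row_fn and the stub kernels) are hypothetical stand-ins for the real ggml_type-indexed table that forward_mul_mat and forward_get_rows would dispatch through.

```c
/*
 * Sketch only: per-type function pointers collected in a table indexed by the
 * type enum, so callers dispatch without per-type switch/if chains.
 * All names below are hypothetical, not the real ggml API.
 */
#include <stdio.h>

typedef enum {
    TYPE_F32 = 0,
    TYPE_Q4_0,
    TYPE_Q4_1,
    TYPE_COUNT,
} example_type;

/* Per-type operation needed by the forward functions. */
typedef void (*dequantize_row_fn)(const void *src, float *dst, int k);

typedef struct {
    dequantize_row_fn dequantize_row;
} type_traits;

/* Stub kernels standing in for the real per-format implementations. */
static void dequantize_row_q4_0(const void *src, float *dst, int k) {
    (void) src;
    for (int i = 0; i < k; ++i) dst[i] = 0.0f; /* real code would unpack 4-bit blocks */
}

static void dequantize_row_q4_1(const void *src, float *dst, int k) {
    (void) src;
    for (int i = 0; i < k; ++i) dst[i] = 0.0f;
}

/* The table: adding a new quantization format is just a new entry here,
 * instead of another branch in every forward_* function. */
static const type_traits type_table[TYPE_COUNT] = {
    [TYPE_Q4_0] = { dequantize_row_q4_0 },
    [TYPE_Q4_1] = { dequantize_row_q4_1 },
};

/* Caller dispatches through the table rather than switching on the type. */
static void get_row(example_type type, const void *src, float *dst, int k) {
    type_table[type].dequantize_row(src, dst, k);
}

int main(void) {
    float row[8];
    get_row(TYPE_Q4_0, NULL, row, 8);
    printf("%f\n", row[0]);
    return 0;
}
```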
This makes some functions non-inlined, but I didn't see a performance regression on my machine.
I tried to fix the "unused variable" warnings without complicating things too much; some of the variables are used in asserts.
I think this is great. Considering the row sizes (4096 in the smallest model), it shouldn't be an issue if these functions can no longer be inlined.
@ggerganov did you want to look at this again, or can we merge it?
Please don't merge yet - it's top priority for merging, but I need some time to take a closer look.