llama.cpp Introduce structs for the q4 data blocks

This came up with the AVX512 support in #320.

The code handling the q4 data blocks has quite a lot of pointer wrangling and sizeofs. I understand that this may have been out of a concern for optimizability by the compiler. But I think the code could be made more readable by introducing structs for the q4_0 and q4_1 data blocks, without changing the overall structure too much.

I have tested my changes only on Linux and with the AVX2 and scalar implementations, and it's easy to misplace an x for a y... so beware ;-)

I have also updated utils.cpp to use the quantize_row_q4_0/1 functions, but this isn't really clean at the moment. It also will cause a difference in the generated q4_0 model file for people using one of the SIMD-optimized variants. The model should work fine but it may be confusing for people who want to check the file's hash, so I'm open to reverting that. q4_1 is not affected as that has no processor-specific optimizations in the quantization function.

I'm open to suggestions, also on naming and code style, which I tried to keep locally consistent. Should the block structs be made public in ggml.h?

Mar 21 '23 15:03 sw

With the latest changes #370 the quantize methods from quantize.cpp are now moved into ggml.c. I think it would be a good first step to avoid the code duplication between:

quantize_row_q4_0() and ggml_quantize_q4_0()
quantize_row_q4_1() and ggml_quantize_q4_1()

I.e. make the latter call the former. This will reduce duplicated code and also drastically improve performance of quantize.cpp

After this change has been merged, then we can think about wrapping things into struct for better readability.

Mar 22 '23 05:03 ggerganov

Rebased and verified identical model files and prediction output

Mar 22 '23 20:03 sw

Ok, thanks. I know @blackhole89 is working on quantization improvements on a branch - would this change be too big of a conflict for you? We can postpone it for later if necessary.

I think we should also test it on ARM that something is not broken. Will try to find time in the following days

Mar 23 '23 17:03 ggerganov

I consider it ready now but I'm open to revising it or waiting for other changes. I agree that it should be tested on ARM and AVX512.

Mar 23 '23 18:03 sw

@ggerganov Thanks for paging me in! As far as I can see, there would be no conflict with anything I'm doing. I think this is a good change in terms of code readability - just need to make sure we don't run into weird platform-specific problems due to struct alignment or something.

Mar 23 '23 20:03 blackhole89

@sw I changed the variable names and fixed the ARM_NEON and WASM builds We haven't tested the WASM and POWER9 paths but hopefully it works

Give it one more try after my changes to make sure that AVX is fine. I'll merge this now

Mar 28 '23 15:03 ggerganov

llama.cpp llama.cpp copied to clipboard

Introduce structs for the q4 data blocks

llama.cpp
llama.cpp copied to clipboard