llama.cpp
llama.cpp copied to clipboard
Introduce structs for the q4 data blocks
This came up with the AVX512 support in #320.
The code handling the q4 data blocks has quite a lot of pointer wrangling and sizeof
s. I understand that this may have been out of a concern for optimizability by the compiler. But I think the code could be made more readable by introducing struct
s for the q4_0 and q4_1 data blocks, without changing the overall structure too much.
I have tested my changes only on Linux and with the AVX2 and scalar implementations, and it's easy to misplace an x for a y... so beware ;-)
I have also updated utils.cpp to use the quantize_row_q4_0
/1
functions, but this isn't really clean at the moment. It also will cause a difference in the generated q4_0 model file for people using one of the SIMD-optimized variants. The model should work fine but it may be confusing for people who want to check the file's hash, so I'm open to reverting that.
q4_1 is not affected as that has no processor-specific optimizations in the quantization function.
I'm open to suggestions, also on naming and code style, which I tried to keep locally consistent. Should the block structs be made public in ggml.h?
With the latest changes #370 the quantize methods from quantize.cpp
are now moved into ggml.c
.
I think it would be a good first step to avoid the code duplication between:
-
quantize_row_q4_0()
andggml_quantize_q4_0()
-
quantize_row_q4_1()
andggml_quantize_q4_1()
I.e. make the latter call the former.
This will reduce duplicated code and also drastically improve performance of quantize.cpp
After this change has been merged, then we can think about wrapping things into struct
for better readability.
Rebased and verified identical model files and prediction output
Ok, thanks. I know @blackhole89 is working on quantization improvements on a branch - would this change be too big of a conflict for you? We can postpone it for later if necessary.
I think we should also test it on ARM that something is not broken. Will try to find time in the following days
I consider it ready now but I'm open to revising it or waiting for other changes. I agree that it should be tested on ARM and AVX512.
@ggerganov Thanks for paging me in! As far as I can see, there would be no conflict with anything I'm doing. I think this is a good change in terms of code readability - just need to make sure we don't run into weird platform-specific problems due to struct alignment or something.
@sw I changed the variable names and fixed the ARM_NEON and WASM builds We haven't tested the WASM and POWER9 paths but hopefully it works
Give it one more try after my changes to make sure that AVX is fine. I'll merge this now