ggml
GGUF quantization meta-data format
Hello!
Are there some resources that explain how the quantized parameters are structured in a GGUF file? We are interested in porting HQQ-quantized models into GGUF format, but in order to do that, we need to know exactly how it is stored. We basically need to know:
- The bit-packing logic
- The axis along which quantization is done
- The group sizes associated with the different quant types
Thanks!
Hi, you had better have a look at llama.cpp:
https://github.com/ggerganov/llama.cpp/blob/f184dd920852d6d372b754f871ee06cfe6f977ad/llama.cpp#L13599
@mobicham here is the spec for GGUF for you to use: https://github.com/ggerganov/ggml/blob/master/docs/gguf.md
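To make the three questions concrete for one quant type, here is a minimal Python sketch of the Q4_0 block layout as it appears in ggml's reference C routines. Treat it as an assumption to check against the ggml source, not as an authoritative description: each block is taken to cover 32 consecutive weights along a row (the contiguous dimension), stored as one little-endian fp16 scale followed by 16 bytes of packed 4-bit values, with element `j` in the low nibble of byte `j` and element `j + 16` in the high nibble. The function names `pack_q4_0_block` and `unpack_q4_0_block` are hypothetical helpers, not ggml APIs.

```python
import struct

QK4_0 = 32                      # weights per block (assumed group size for Q4_0)
BLOCK_BYTES = 2 + QK4_0 // 2    # fp16 scale + 16 bytes of packed nibbles = 18


def pack_q4_0_block(values):
    """Quantize 32 floats into one Q4_0-style block (sketch, not ggml's code)."""
    assert len(values) == QK4_0
    # Pick the signed value with the largest magnitude; it maps to code 0 (-8).
    vmax = max(values, key=abs)
    d = vmax / -8.0 if vmax != 0 else 0.0
    inv = 1.0 / d if d != 0 else 0.0
    # Shift into the unsigned range 0..15 and round by truncating x + 8.5.
    q = [min(15, int(v * inv + 8.5)) for v in values]
    half = QK4_0 // 2
    # Byte j: low nibble = element j, high nibble = element j + 16.
    qs = bytes(q[j] | (q[j + half] << 4) for j in range(half))
    return struct.pack("<e", d) + qs


def unpack_q4_0_block(block):
    """Dequantize one 18-byte Q4_0-style block back into 32 floats."""
    assert len(block) == BLOCK_BYTES
    (d,) = struct.unpack("<e", block[:2])   # per-block fp16 scale
    qs = block[2:]
    half = QK4_0 // 2
    out = [0.0] * QK4_0
    for j in range(half):
        out[j] = ((qs[j] & 0x0F) - 8) * d          # low nibble
        out[j + half] = ((qs[j] >> 4) - 8) * d     # high nibble
    return out


# Usage: integers in -8..7 round-trip exactly because the scale comes out as 1.0.
values = [float((i % 16) - 8) for i in range(QK4_0)]
block = pack_q4_0_block(values)
restored = unpack_q4_0_block(block)
```

Under these assumptions a row of `n` weights occupies `n / 32` consecutive blocks, so the row length has to be a multiple of 32. Other quant types use different block sizes and extra per-block fields (for example a minimum or sub-block scales), so an HQQ-to-GGUF converter would need one such routine per target type, written against the block definitions in the ggml source.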