GGUF: the file quantization type is not the GGMLQuantizationType.
There are two kinds of quantization in llama.cpp; don't confuse them:
- GGMLQuantizationType (ggml_type): per tensor. https://github.com/ggerganov/llama.cpp/blob/7a221b672e49dfae459b1af27210ba3f2b5419b6/ggml/include/ggml.h#L354
- The GGUF file quantization type ("general.file_type"): per file. https://github.com/ggerganov/llama.cpp/blob/7a221b672e49dfae459b1af27210ba3f2b5419b6/include/llama.h#L131
If "general.file_type" is not configured, the following algorithm is used to guess the quantization type of the file:
https://github.com/ggerganov/llama.cpp/blob/7a221b672e49dfae459b1af27210ba3f2b5419b6/src/llama.cpp#L3751C1-L3802C65
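For intuition only, a fallback of this kind boils down to a majority vote over tensor types, mapped to the corresponding MOSTLY_* file type. This is a hypothetical sketch, not the exact llama.cpp logic (the linked source has more special cases), and the function and type names below are invented for illustration:

```typescript
// Hypothetical sketch of the "guess the file type" fallback; NOT the exact
// llama.cpp algorithm (see the linked source for the real special cases).
type TensorType = "F32" | "F16" | "Q4_0" | "Q8_0";

function guessFileType(tensorTypes: TensorType[]): string {
	// Tally how many tensors use each quantization type.
	const counts = new Map<TensorType, number>();
	for (const t of tensorTypes) {
		counts.set(t, (counts.get(t) ?? 0) + 1);
	}
	// Pick the most common type; the file is "mostly" that type.
	let best: TensorType = "F32";
	let bestCount = 0;
	for (const [t, n] of counts) {
		if (n > bestCount) {
			best = t;
			bestCount = n;
		}
	}
	return best === "F32" ? "ALL_F32" : `MOSTLY_${best}`;
}

// A quantized model typically keeps norm tensors in F32 while the bulk is Qk:
console.log(guessFileType(["F32", "F32", "Q4_0", "Q4_0", "Q4_0"])); // "MOSTLY_Q4_0"
```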
Yes. What is the precise issue in this repo?
@julien-c
- "general.file_type" is missing an enum (e.g. FileQuantizationType) analogous to GGMLQuantizationType.
- The file quantization type should be guessed from the tensor metadata when "general.file_type" is absent.
And some tests incorrectly use the GGMLQuantizationType as "general.file_type":
https://github.com/huggingface/huggingface.js/blame/8d6fe81cd25936f65975d65eade246064ad48f7b/packages/gguf/src/gguf.spec.ts#L137
https://github.com/huggingface/huggingface.js/blame/8d6fe81cd25936f65975d65eade246064ad48f7b/packages/gguf/src/gguf.spec.ts#L174
maybe cc @ngxson (not sure)
Yes, you're correct @snowyu. general.file_type is the quantization scheme (i.e. MOSTLY_*). I will push a fix later (sorry, I'm quite busy atm)
@julien-c FYI, it's because quantized models usually use mixed types. For example, norm layers can stay at f32 or f16 while other tensors are Qk. This improves model quality at a small cost in space. Hence the word "mostly" in the type names.
@snowyu thanks for your comment !
> The "general.file_type" missing enum FileQuantizationType like GGMLQuantizationType
Indeed, we should create an enum GGMLFileQuantizationType that looks similar to GGMLQuantizationType but is slightly different. The values for GGMLFileQuantizationType should come from llama_ftype (as you suggested in the description).
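As a sketch of why a separate enum is needed (values below are a subset copied from the linked ggml.h and llama.h headers; double-check against the current upstream source before committing), the two numbering schemes diverge, so reusing GGMLQuantizationType for the file type yields wrong values:

```typescript
// Tensor-level types, from ggml_type in ggml.h (subset; some values were removed upstream).
enum GGMLQuantizationType {
	F32 = 0,
	F16 = 1,
	Q4_0 = 2,
	Q4_1 = 3,
	Q5_0 = 6,
	Q5_1 = 7,
	Q8_0 = 8,
}

// File-level types, from llama_ftype in llama.h (subset).
enum GGMLFileQuantizationType {
	ALL_F32 = 0,
	MOSTLY_F16 = 1,
	MOSTLY_Q4_0 = 2,
	MOSTLY_Q4_1 = 3,
	MOSTLY_Q8_0 = 7,
	MOSTLY_Q5_0 = 8,
	MOSTLY_Q5_1 = 9,
}

// The numeric values drift apart: Q8_0 is 8 as a tensor type but 7 as a file type.
console.log(GGMLQuantizationType.Q8_0, GGMLFileQuantizationType.MOSTLY_Q8_0); // 8 7
```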
> Should guess the fileQuantType from metadata if no "general.file_type".
Since this package is for parsing metadata (not a framework like llama.cpp), we should not guess anything. If the field exists, it exists. Otherwise, it does not and we do not present anything that did not exist in the file itself.
> And some tests incorrectly use GGMLQuantizationType as "general.file_type": here and here
Yep, after we add GGMLFileQuantizationType, we can use it instead of GGMLQuantizationType in those tests.
The changes should be pretty straightforward. Please feel free to open a PR and tag me 🤗