GGUF: the file quantization type is not the GGMLQuantizationType.
There are two kinds of quantization in llama.cpp; don't confuse them:
- GGMLQuantizationType (ggml_type): per tensor. https://github.com/ggerganov/llama.cpp/blob/7a221b672e49dfae459b1af27210ba3f2b5419b6/ggml/include/ggml.h#L354
- The GGUF file quantization type ("general.file_type"): per file. https://github.com/ggerganov/llama.cpp/blob/7a221b672e49dfae459b1af27210ba3f2b5419b6/include/llama.h#L131
If "general.file_type" is not configured, the following algorithm is used to guess the quantization type of the file:
https://github.com/ggerganov/llama.cpp/blob/7a221b672e49dfae459b1af27210ba3f2b5419b6/src/llama.cpp#L3751C1-L3802C65
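For intuition only, a fallback of this kind boils down to a majority vote over tensor types, mapped to the corresponding MOSTLY_* file type. This is a hypothetical sketch, not the exact llama.cpp logic (the linked source has more special cases), and the function and type names below are invented for illustration:

```typescript
// Hypothetical sketch of the "guess the file type" fallback; NOT the exact
// llama.cpp algorithm (see the linked source for the real special cases).
type TensorType = "F32" | "F16" | "Q4_0" | "Q8_0";

function guessFileType(tensorTypes: TensorType[]): string {
	// Tally how many tensors use each quantization type.
	const counts = new Map<TensorType, number>();
	for (const t of tensorTypes) {
		counts.set(t, (counts.get(t) ?? 0) + 1);
	}
	// Pick the most common type; the file is "mostly" that type.
	let best: TensorType = "F32";
	let bestCount = 0;
	for (const [t, n] of counts) {
		if (n > bestCount) {
			best = t;
			bestCount = n;
		}
	}
	return best === "F32" ? "ALL_F32" : `MOSTLY_${best}`;
}

// A quantized model typically keeps norm tensors in F32 while the bulk is Qk:
console.log(guessFileType(["F32", "F32", "Q4_0", "Q4_0", "Q4_0"])); // "MOSTLY_Q4_0"
```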
Yes. What is the precise issue in this repo?
@julien-c
- "general.file_type" is missing an enum (e.g. FileQuantizationType) analogous to GGMLQuantizationType.
- The file quantization type should be guessed from the tensor metadata when "general.file_type" is absent.
And some tests incorrectly use the GGMLQuantizationType as "general.file_type":
https://github.com/huggingface/huggingface.js/blame/8d6fe81cd25936f65975d65eade246064ad48f7b/packages/gguf/src/gguf.spec.ts#L137
https://github.com/huggingface/huggingface.js/blame/8d6fe81cd25936f65975d65eade246064ad48f7b/packages/gguf/src/gguf.spec.ts#L174
maybe cc @ngxson (not sure)
Yes, you're correct @snowyu. general.file_type is the quantization scheme (i.e. MOSTLY_*). I will push a fix later (sorry, I'm quite busy atm)
@julien-c FYI, it's because quantized models usually use mixed types. For example, norm layers can stay at f32 or f16 while other tensors are Qk. This improves model quality at a small cost in space. Hence the word "mostly" in the type names.
@snowyu thanks for your comment !
> The "general.file_type" missing enum FileQuantizationType like GGMLQuantizationType
Indeed, we should create an enum GGMLFileQuantizationType that looks similar to GGMLQuantizationType but is slightly different. The values for GGMLFileQuantizationType should come from llama_ftype (as you suggested in the description).
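As a sketch of why a separate enum is needed (values below are a subset copied from the linked ggml.h and llama.h headers; double-check against the current upstream source before committing), the two numbering schemes diverge, so reusing GGMLQuantizationType for the file type yields wrong values:

```typescript
// Tensor-level types, from ggml_type in ggml.h (subset; some values were removed upstream).
enum GGMLQuantizationType {
	F32 = 0,
	F16 = 1,
	Q4_0 = 2,
	Q4_1 = 3,
	Q5_0 = 6,
	Q5_1 = 7,
	Q8_0 = 8,
}

// File-level types, from llama_ftype in llama.h (subset).
enum GGMLFileQuantizationType {
	ALL_F32 = 0,
	MOSTLY_F16 = 1,
	MOSTLY_Q4_0 = 2,
	MOSTLY_Q4_1 = 3,
	MOSTLY_Q8_0 = 7,
	MOSTLY_Q5_0 = 8,
	MOSTLY_Q5_1 = 9,
}

// The numeric values drift apart: Q8_0 is 8 as a tensor type but 7 as a file type.
console.log(GGMLQuantizationType.Q8_0, GGMLFileQuantizationType.MOSTLY_Q8_0); // 8 7
```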
> Should guess the fileQuantType from metadata if no "general.file_type".
Since this package is for parsing metadata (not a framework like llama.cpp), we should not guess anything. If the field exists, it exists. Otherwise, it does not and we do not present anything that did not exist in the file itself.
> And some tests incorrectly use GGMLQuantizationType as "general.file_type": here and here
Yep, after we add GGMLFileQuantizationType, we can use it instead of GGMLQuantizationType in those tests.
The changes should be pretty straightforward. Please feel free to open a PR and tag me 🤗