feat: add GGMLFileQuantizationType and apply to test
@mishig25 that's it for #794
cc @ngxson too
FYI, I added the MOSTLY_ prefix in the last commit, to better reflect the type name from ggml (see here)
The reason is that many operations in ggml only support F32 for 1D tensors. So a GGUF file is never "purely" quantized; it is a mix of the quantized type and F32.
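To make the "mostly" naming concrete, here is a minimal sketch that tallies tensor types in a file. Everything in it is an assumption for illustration: the `TensorInfo` shape, the `countByDtype` helper, and the tensor names and dtypes are hypothetical and not read from a real GGUF file.

```typescript
// Hypothetical tensor descriptor; not the actual type used by the GGUF parser.
interface TensorInfo {
  name: string;
  shape: number[];
  dtype: string;
}

// Illustrative data only: a tiny slice of what a Q4_0 file might contain.
// The 1D norm weights stay in F32, the 2D weights carry the quantized type.
const tensors: TensorInfo[] = [
  { name: "blk.0.attn_q.weight", shape: [4096, 4096], dtype: "Q4_0" },
  { name: "blk.0.attn_norm.weight", shape: [4096], dtype: "F32" },
  { name: "blk.0.ffn_up.weight", shape: [4096, 11008], dtype: "Q4_0" },
  { name: "blk.0.ffn_norm.weight", shape: [4096], dtype: "F32" },
];

// Count how many tensors use each dtype.
function countByDtype(infos: TensorInfo[]): Record<string, number> {
  const counts: Record<string, number> = {};
  for (const t of infos) {
    counts[t.dtype] = (counts[t.dtype] ?? 0) + 1;
  }
  return counts;
}

const dtypeCounts = countByDtype(tensors);
console.log(dtypeCounts);
```

With a file like this, a single "Q4_0" label would hide the F32 tensors, which is what the MOSTLY_ prefix is meant to signal.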
BTW, I also propose displaying the enum's key name in a tooltip inside the GGUF file viewer, like this:
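A minimal sketch of how the key name could be looked up for such a tooltip, relying on the reverse mapping that TypeScript generates for numeric enums. The enum below is an illustrative subset whose members and values are assumed to mirror `llama_ftype` in llama.h, and `tooltipLabel` is a hypothetical helper, not the viewer's actual code.

```typescript
// Illustrative subset only; member values are an assumption and should be
// checked against llama.h before use.
enum GGMLFileQuantizationType {
  MOSTLY_F16 = 1,
  MOSTLY_Q4_0 = 2,
  MOSTLY_Q8_0 = 7,
}

// Hypothetical helper: map the numeric file type stored in the GGUF metadata
// to the enum key name, falling back to a generic label for unknown values.
function tooltipLabel(fileType: number): string {
  // Numeric enums compile to an object that also maps value -> key name.
  const name: string | undefined = GGMLFileQuantizationType[fileType];
  return name ?? `UNKNOWN (${fileType})`;
}

console.log(tooltipLabel(2));
console.log(tooltipLabel(999));
```

The reverse mapping only exists for regular numeric enums, so this approach would not work if the enum were declared as `const enum`.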
I'll let you merge, @ngxson!
@ngxson be careful, the const is not in ggml.h, it's in llama.h.
Yeah, I linked to the wrong file, but the content is unchanged anyway because I only added MOSTLY_ on top of your commit. (So everything is still correct.)