[feature request] conversion to gguf in a more pure form.
Hello. Usually when quantizing, I first convert a Hugging Face model to an F16 gguf and then quantize that into my own quantizations. I have noticed that convert does not produce a "pure" f16. I think there should be a flag, as in the quantize program, to allow a pure f16 (all tensors) or pure bf16 conversion.
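For reference, this is a quick way to see the mixed tensor types in a converted file (a rough sketch, assuming the gguf Python package from llama.cpp's gguf-py is installed; the file name is just an example):

```python
# Sketch: list each tensor's type in a converted GGUF file.
# Assumes the `gguf` package from llama.cpp's gguf-py (pip install gguf).
from gguf import GGUFReader

reader = GGUFReader("model-f16.gguf")  # example path
for t in reader.tensors:
    # 1D tensors (norm weights, biases) typically show up as F32 here,
    # while the large 2D weight matrices are F16.
    print(t.name, t.tensor_type.name, list(t.shape))
```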
I have noticed that convert does not produce a "pure" f16.
Do you mean that some tensors are in F32 in the resulting gguf model? These are usually 1D tensors which are very small anyway. (BTW, even llama-quantize --pure ... keeps 1D tensors as F32)
Some of the ggml operators used on 1D tensors (currently) only work on F32 tensors (e.g. ggml_norm), so a pure f16 gguf model would not work without modifications in ggml.c.
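To illustrate, the conversion roughly has to make a per-tensor choice like this (just an illustrative sketch, not the actual logic in the convert script; the helper name is made up):

```python
import numpy as np

def choose_dtype(name: str, data: np.ndarray) -> np.dtype:
    # Hypothetical helper: 1D tensors (norm weights, biases) stay in F32,
    # since some ggml ops that consume them only support F32;
    # everything else is cast to the requested F16.
    if data.ndim == 1:
        return np.dtype(np.float32)
    return np.dtype(np.float16)

# Example: a 2D weight becomes float16, a 1D norm weight stays float32.
print(choose_dtype("blk.0.attn_q.weight", np.zeros((8, 8), dtype=np.float32)))
print(choose_dtype("blk.0.attn_norm.weight", np.zeros(8, dtype=np.float32)))
```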
Is there a particular reason why you'd like extremely "pure" conversions?
Is there a particular reason why you'd like extremely "pure" conversions?
Well, no. I mean I wanted to make comparisons between a "pure" f16 and my own quants (which are a mix of f16 and q5 or q6). They seem to be smaller at essentially no cost, with almost no degradation. You can find those quants on my Hugging Face profile page under models: https://huggingface.co/ZeroWw
This issue was closed because it has been inactive for 14 days since being marked as stale.