max
[Feature Request] Why can't we use Q8 quants?
What is your request?
I discovered that the only way to run a quantized model is to use q4 or q6 quants. Why not add q8 quants? It seems very strange that they are missing. Is there a chance to enable them?
What is your motivation for this change?
As a rule, a q8 quant is the best option when you don't want the model to lose quality.
Any other details?
No response