whisper.cpp
Quantizing (my way).
Hello @ggerganov! I wish to quantize openai/whisper-large-v3 in my "usual way". With llama.cpp I usually do:
llama-quantize --allow-requantize --output-tensor-type f16 --token-embedding-type f16 model.f16.gguf model.f16.q6.gguf q6_k
And I use `convert_hf_to_gguf.py` to convert the safetensors to f16.
How can I do the same with whisper-large-v3?
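For context, here is the workflow I follow with llama.cpp, as a sketch (paths are placeholders; `convert_hf_to_gguf.py` and `llama-quantize` ship with llama.cpp):

```shell
# Convert the HF safetensors checkpoint to an f16 GGUF
# (run from the llama.cpp repo; /path/to/model is a placeholder)
python convert_hf_to_gguf.py /path/to/model --outtype f16 --outfile model.f16.gguf

# Quantize to Q6_K while keeping the output and token-embedding
# tensors at f16
./llama-quantize --allow-requantize \
    --output-tensor-type f16 --token-embedding-type f16 \
    model.f16.gguf model.f16.q6.gguf q6_k
```

I'm aware whisper.cpp uses its own ggml model format rather than GGUF, and that its `quantize` tool and `models/convert-h5-to-ggml.py` script may not support the same type set (e.g. Q6_K) or per-tensor overrides, so I'm unsure how much of this carries over.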