whisper.cpp
Quantizing (my way).
Hello @ggerganov! I wish to quantize openai/whisper-large-v3 in my "usual way". With llama.cpp I usually do:
llama-quantize --allow-requantize --output-tensor-type f16 --token-embedding-type f16 model.f16.gguf model.f16.q6.gguf q6_k
And I use `convert_hf_to_gguf.py` to convert the safetensors to f16.
How can I do the same with whisper-large-v3?
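For context, here is the workflow I follow with llama.cpp, as a sketch (paths are placeholders; `convert_hf_to_gguf.py` and `llama-quantize` ship with llama.cpp):

```shell
# Convert the HF safetensors checkpoint to an f16 GGUF
# (run from the llama.cpp repo; /path/to/model is a placeholder)
python convert_hf_to_gguf.py /path/to/model --outtype f16 --outfile model.f16.gguf

# Quantize to Q6_K while keeping the output and token-embedding
# tensors at f16
./llama-quantize --allow-requantize \
    --output-tensor-type f16 --token-embedding-type f16 \
    model.f16.gguf model.f16.q6.gguf q6_k
```

I'm aware whisper.cpp uses its own ggml model format rather than GGUF, and that its `quantize` tool and `models/convert-h5-to-ggml.py` script may not support the same type set (e.g. Q6_K) or per-tensor overrides, so I'm unsure how much of this carries over.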