whispercpp.py
Quantization model support?
Thanks for writing this. Seems to work well.
However, is there a way to use quantized models with this? I see they aren't included in your MODELS dictionary.
If I hot-patch it like so:

```python
from whispercpp import Whisper, MODELS

MODELS['ggml-large-q5_0.bin'] = 'https://huggingface.co/ggerganov/whisper.cpp/resolve/main/ggml-large-q5_0.bin'
w = Whisper('large-q5_0')
```
this downloads the model, but loading it then fails with:
```
Downloading ggml-large-q5_0.bin...
whisper_init_from_file_no_state: loading model from '/home/chris/.ggml-models/ggml-large-q5_0.bin'
whisper_model_load: loading model
whisper_model_load: n_vocab       = 51865
whisper_model_load: n_audio_ctx   = 1500
whisper_model_load: n_audio_state = 1280
whisper_model_load: n_audio_head  = 20
whisper_model_load: n_audio_layer = 32
whisper_model_load: n_text_ctx    = 448
whisper_model_load: n_text_state  = 1280
whisper_model_load: n_text_head   = 20
whisper_model_load: n_text_layer  = 32
whisper_model_load: n_mels        = 80
whisper_model_load: f16           = 1008
whisper_model_load: type          = 5
whisper_model_load: mem required  = 3342.00 MB (+ 71.00 MB per decoder)
whisper_model_load: adding 1608 extra tokens
whisper_model_load: model ctx     = 2950.97 MB
whisper_model_load: unknown tensor '0y��q K.7��eţ�k�ؠ�� �͠[email protected]�D��^#��9]�|���N(f�fm����:�@lc��QwO�oezg{��-!�Ě���' in model file
whisper_init_no_state: failed to load model
```
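For what it's worth, my guess (unverified, based on how upstream whisper.cpp encodes quantization) is that the vendored whisper.cpp here predates quantized-model support: since quantization landed upstream, the `ftype` header field is stored as `qnt_version * 1000 + ftype`. Note `f16 = 1008` in the log above, which would decode as quantization version 1, ftype 8 (`GGML_FTYPE_MOSTLY_Q5_0`). An older loader that doesn't know about the factor misinterprets the tensor data that follows and reads garbage tensor names, which matches the error. A quick sketch to decode the header of a downloaded model, with the field layout assumed from upstream whisper.cpp's model format:

```python
import struct

# Assumed layout of a ggml whisper model file (from upstream whisper.cpp):
# a uint32 magic "ggml", then 11 int32 hparams, the last of which is ftype.
GGML_MAGIC = 0x67676D6C      # b"ggml" stored as a little-endian uint32
QNT_VERSION_FACTOR = 1000    # newer files store qnt_version * 1000 + ftype

def read_model_ftype(path):
    """Return (quantization_version, ftype) from a ggml whisper model file."""
    with open(path, "rb") as f:
        (magic,) = struct.unpack("<I", f.read(4))
        if magic != GGML_MAGIC:
            raise ValueError(f"not a ggml model file: magic={magic:#x}")
        hparams = struct.unpack("<11i", f.read(44))
        ftype = hparams[-1]  # n_vocab .. n_mels come first, ftype is last
        return ftype // QNT_VERSION_FACTOR, ftype % QNT_VERSION_FACTOR

# For ggml-large-q5_0.bin this should return (1, 8): quantization
# version 1, ftype 8 (q5_0) -- consistent with "f16 = 1008" in the log.
```

If that's right, fixing this would mean updating the bundled whisper.cpp to a version with quantization support rather than just extending the MODELS dictionary.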