whispercpp.py
Quantization model support?
Thanks for writing this. Seems to work well.
However, is there a way to use quantized models with this? I see they aren't included in your MODELS dictionary.
If I hot-patch it like so:

```python
from whispercpp import Whisper, MODELS

MODELS['ggml-large-q5_0.bin'] = 'https://huggingface.co/ggerganov/whisper.cpp/resolve/main/ggml-large-q5_0.bin'
w = Whisper('large-q5_0')
```
this downloads the model, but loading it then fails with:
```
Downloading ggml-large-q5_0.bin...
whisper_init_from_file_no_state: loading model from '/home/chris/.ggml-models/ggml-large-q5_0.bin'
whisper_model_load: loading model
whisper_model_load: n_vocab       = 51865
whisper_model_load: n_audio_ctx   = 1500
whisper_model_load: n_audio_state = 1280
whisper_model_load: n_audio_head  = 20
whisper_model_load: n_audio_layer = 32
whisper_model_load: n_text_ctx    = 448
whisper_model_load: n_text_state  = 1280
whisper_model_load: n_text_head   = 20
whisper_model_load: n_text_layer  = 32
whisper_model_load: n_mels        = 80
whisper_model_load: f16           = 1008
whisper_model_load: type          = 5
whisper_model_load: mem required  = 3342.00 MB (+ 71.00 MB per decoder)
whisper_model_load: adding 1608 extra tokens
whisper_model_load: model ctx     = 2950.97 MB
whisper_model_load: unknown tensor '0y��q K.7��eţ�k�ؠ�� �͠[email protected]�D��^#��9]�|���N(f�fm����:�@lc��QwO�oezg{��-!�Ě���' in model file
whisper_init_no_state: failed to load model
```
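For what it's worth, my guess (unverified, based on how upstream whisper.cpp encodes quantization) is that the vendored whisper.cpp here predates quantized-model support: since quantization landed upstream, the `ftype` header field is stored as `qnt_version * 1000 + ftype`. Note `f16 = 1008` in the log above, which would decode as quantization version 1, ftype 8 (`GGML_FTYPE_MOSTLY_Q5_0`). An older loader that doesn't know about the factor misinterprets the tensor data that follows and reads garbage tensor names, which matches the error. A quick sketch to decode the header of a downloaded model, with the field layout assumed from upstream whisper.cpp's model format:

```python
import struct

# Assumed layout of a ggml whisper model file (from upstream whisper.cpp):
# a uint32 magic "ggml", then 11 int32 hparams, the last of which is ftype.
GGML_MAGIC = 0x67676D6C      # b"ggml" stored as a little-endian uint32
QNT_VERSION_FACTOR = 1000    # newer files store qnt_version * 1000 + ftype

def read_model_ftype(path):
    """Return (quantization_version, ftype) from a ggml whisper model file."""
    with open(path, "rb") as f:
        (magic,) = struct.unpack("<I", f.read(4))
        if magic != GGML_MAGIC:
            raise ValueError(f"not a ggml model file: magic={magic:#x}")
        hparams = struct.unpack("<11i", f.read(44))
        ftype = hparams[-1]  # n_vocab .. n_mels come first, ftype is last
        return ftype // QNT_VERSION_FACTOR, ftype % QNT_VERSION_FACTOR

# For ggml-large-q5_0.bin this should return (1, 8): quantization
# version 1, ftype 8 (q5_0) -- consistent with "f16 = 1008" in the log.
```

If that's right, fixing this would mean updating the bundled whisper.cpp to a version with quantization support rather than just extending the MODELS dictionary.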