
How to use 4 bit quantized models with main.exe?

Open strfic opened this issue 2 years ago • 0 comments

How can I use the 4-bit models with the main example? My computer has limited RAM and has trouble loading the original large model, so I am trying to use ggml-model-whisper-large-q4_0.bin instead. I want to use it with main.exe, not on a webpage (WASM).

When I try the 4-bit model with the main example, I just get the following output and no transcription takes place:

```
whisper_init_from_file_no_state: loading model from 'ggml-model-whisper-large-q4_0.bin'
whisper_model_load: loading model
whisper_model_load: n_vocab       = 51865
whisper_model_load: n_audio_ctx   = 1500
whisper_model_load: n_audio_state = 1280
whisper_model_load: n_audio_head  = 20
whisper_model_load: n_audio_layer = 32
whisper_model_load: n_text_ctx    = 448
whisper_model_load: n_text_state  = 1280
whisper_model_load: n_text_head   = 20
whisper_model_load: n_text_layer  = 32
whisper_model_load: n_mels        = 80
whisper_model_load: f16           = 2
whisper_model_load: type          = 5
whisper_model_load: mem required  = 3336.00 MB (+ 71.00 MB per decoder)
whisper_model_load: adding 1608 extra tokens
whisper_model_load: model ctx     = 2950.97 MB
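```

For reference, the invocation is along the lines of the sketch below. The `-m`/`-f` flags are whisper.cpp's standard model and input options; the file paths are assumptions and should be adjusted to the local setup:

```shell
# Sketch: running whisper.cpp's main example with a 4-bit quantized model.
# MODEL and AUDIO paths are assumptions for illustration.
MODEL=ggml-model-whisper-large-q4_0.bin
AUDIO=samples/jfk.wav

if [ -x ./main ] && [ -f "$MODEL" ]; then
  # -m selects the model file, -f the 16 kHz WAV input
  ./main -m "$MODEL" -f "$AUDIO"
  MSG="ran"
else
  MSG="main binary or model file not found"
  echo "$MSG"
fi
```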

PS: Is there a link to benchmarks comparing the original models with the 4-bit ones?

strfic avatar Mar 26 '23 07:03 strfic