whisper.cpp [BUG] Model loading becomes abnormally slow when --no-gpu is removed (GPU enabled).

Open aoom opened this issue 6 months ago • 0 comments

~/whisper.cpp-master$ ./main -m qqml-tiny-q5_1.bin -f ./samples/jfk.wav -pc --no-gpu
whisper_init_from_file_with_params_no_state: loading model from 'qqml-tiny-q5_1.bin'
whisper_init_with_params_no_state: use gpu = 0
whisper_init_with_params_no_state: flash attn = 0
whisper_init_with_params_no_state: gpu_device = 0
whisper_init_with_params_no_state: dtw = 0
whisper_model_load: loading model
whisper_model_load: n_vocab = 51865
whisper_model_load: n_audio_ctx = 1500
whisper_model_load: n_audio_state = 384
whisper_model_load: n_audio_head = 6
whisper_model_load: n_audio_layer = 4
whisper_model_load: n_text_ctx = 448
whisper_model_load: n_text_state = 384
whisper_model_load: n_text_head = 6
whisper_model_load: n_text_layer = 4
whisper_model_load: n_mels = 80
whisper_model_load: ftype = 9
whisper_model_load: qntvr = 1
whisper_model_load: type = 1 (tiny)
whisper_model_load: adding 1608 extra tokens
whisper_model_load: n_langs = 99
whisper_model_load: CPU total size = 31.57 MB
whisper_model_load: model size = 31.57 MB
whisper_init_state: kv self size = 9.44 MB
whisper_init_state: kv cross size = 9.44 MB
whisper_init_state: kv pad size = 2.36 MB
whisper_init_state: compute buffer (conv) = 13.19 MB
whisper_init_state: compute buffer (encode) = 85.53 MB
whisper_init_state: compute buffer (cross) = 3.88 MB
whisper_init_state: compute buffer (decode) = 95.89 MB

main: processing './samples/jfk.wav' (176000 samples, 11.0 sec), 4 threads, 1 processors, 5 beams + best of 5, lang = en, task = transcribe, timestamps = 1 ...

[00:00:00.000 --> 00:00:10.560] And so, my fellow Americans, ask not what your country can do for you, ask what you can do for your country.

whisper_print_timings: load time = 110.93 ms
whisper_print_timings: fallbacks = 0 p / 0 h
whisper_print_timings: mel time = 47.28 ms
whisper_print_timings: sample time = 131.60 ms / 131 runs ( 1.00 ms per run)
whisper_print_timings: encode time = 2595.64 ms / 1 runs ( 2595.64 ms per run)
whisper_print_timings: decode time = 0.00 ms / 1 runs ( 0.00 ms per run)
whisper_print_timings: batchd time = 609.37 ms / 129 runs ( 4.72 ms per run)
whisper_print_timings: prompt time = 0.00 ms / 1 runs ( 0.00 ms per run)
whisper_print_timings: total time = 3531.42 ms

~/whisper.cpp-master$ ./main -m qqml-tiny-q5_1.bin -f ./samples/jfk.wav -pc
whisper_init_from_file_with_params_no_state: loading model from 'qqml-tiny-q5_1.bin'
whisper_init_with_params_no_state: use gpu = 1
whisper_init_with_params_no_state: flash attn = 0
whisper_init_with_params_no_state: gpu_device = 0
whisper_init_with_params_no_state: dtw = 0
whisper_model_load: loading model
whisper_model_load: n_vocab = 51865
whisper_model_load: n_audio_ctx = 1500
whisper_model_load: n_audio_state = 384
whisper_model_load: n_audio_head = 6
whisper_model_load: n_audio_layer = 4
whisper_model_load: n_text_ctx = 448
whisper_model_load: n_text_state = 384
whisper_model_load: n_text_head = 6
whisper_model_load: n_text_layer = 4
whisper_model_load: n_mels = 80
whisper_model_load: ftype = 9
whisper_model_load: qntvr = 1
whisper_model_load: type = 1 (tiny)
whisper_model_load: adding 1608 extra tokens
whisper_model_load: n_langs = 99
ggml_cuda_init: GGML_CUDA_FORCE_MMQ: no
ggml_cuda_init: GGML_CUDA_FORCE_CUBLAS: no
ggml_cuda_init: found 1 CUDA devices:
  Device 0: Xavier, compute capability 7.2, VMM: yes
whisper_model_load: CUDA0 total size = 31.58 MB
whisper_model_load: model size = 31.57 MB
whisper_backend_init_gpu: using CUDA backend
whisper_init_state: kv self size = 9.44 MB
whisper_init_state: kv cross size = 9.44 MB
whisper_init_state: kv pad size = 2.36 MB
whisper_init_state: compute buffer (conv) = 13.19 MB
whisper_init_state: compute buffer (encode) = 85.53 MB
whisper_init_state: compute buffer (cross) = 3.88 MB
whisper_init_state: compute buffer (decode) = 97.96 MB

main: processing './samples/jfk.wav' (176000 samples, 11.0 sec), 4 threads, 1 processors, 5 beams + best of 5, lang = en, task = transcribe, timestamps = 1 ...

[00:00:00.000 --> 00:00:10.580] And so, my fellow Americans, ask not what your country can do for you, ask what you can do for your country.

whisper_print_timings: load time = 25600.27 ms
whisper_print_timings: fallbacks = 0 p / 0 h
whisper_print_timings: mel time = 32.80 ms
whisper_print_timings: sample time = 128.63 ms / 131 runs ( 0.98 ms per run)
whisper_print_timings: encode time = 1718.51 ms / 1 runs ( 1718.51 ms per run)
whisper_print_timings: decode time = 0.00 ms / 1 runs ( 0.00 ms per run)
whisper_print_timings: batchd time = 258.43 ms / 129 runs ( 2.00 ms per run)
whisper_print_timings: prompt time = 0.00 ms / 1 runs ( 0.00 ms per run)
whisper_print_timings: total time = 27770.06 ms
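
The two runs differ mainly in load time: 110.93 ms with --no-gpu versus 25600.27 ms with the CUDA backend, while encode and batched decode are faster on the GPU. For anyone reproducing this, the timing lines can be compared side by side with a small script. The following is only a rough sketch in Python, assuming the `whisper_print_timings` format shown in the logs above; the `parse_timings` helper and its regular expression are illustrative and not part of whisper.cpp.

```python
import re

# Illustrative pattern (an assumption based on the log lines in this issue):
# matches e.g. "whisper_print_timings: load time = 110.93 ms"
TIMING_RE = re.compile(r"whisper_print_timings:\s+(\w+) time\s*=\s*([\d.]+) ms")


def parse_timings(log: str) -> dict[str, float]:
    """Return {stage: milliseconds} for every timing entry found in the log."""
    return {name: float(ms) for name, ms in TIMING_RE.findall(log)}


# Excerpts copied from the two runs reported in this issue.
cpu_log = """
whisper_print_timings: load time = 110.93 ms
whisper_print_timings: encode time = 2595.64 ms / 1 runs ( 2595.64 ms per run)
whisper_print_timings: total time = 3531.42 ms
"""
gpu_log = """
whisper_print_timings: load time = 25600.27 ms
whisper_print_timings: encode time = 1718.51 ms / 1 runs ( 1718.51 ms per run)
whisper_print_timings: total time = 27770.06 ms
"""

cpu = parse_timings(cpu_log)
gpu = parse_timings(gpu_log)

# Print each stage for both runs so the outlier stage is easy to spot.
for stage in cpu:
    print(f"{stage:>6}: --no-gpu {cpu[stage]:10.2f} ms   gpu {gpu[stage]:10.2f} ms")
```

Run against the full logs, this makes it easy to see that the extra time is spent almost entirely in the load stage rather than in inference.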

Aug 28 '24 14:08 aoom
