
Original whisper.cpp now as fast as Const-me version

emcodem opened this issue 1 year ago · 8 comments

Recently the "mother" project https://github.com/ggerganov/whisper.cpp introduced native NVIDIA support in their cuBLAS builds: https://github.com/ggerganov/whisper.cpp/releases (search for "cublas").

In my quick first tests, it is now exactly as fast as the Const-me version: using the large model on an A5000 card, both take 19 seconds for a 4-minute input WAV (we must use the -bs 1 option in the whisper.cpp version for a fair comparison).

This of course means that I will not invest any more effort into the Const-me version, because the other project is heavily maintained while the Const-me version is effectively dead. Anyway, thanks to @Const-me for all the effort; I am sure your code helped the whisper.cpp team a lot in implementing the native NVIDIA support!

emcodem avatar Nov 25 '23 23:11 emcodem

So this doesn't work on AMD/Intel GPUs? Well, that can't be good!

Edit: I see a ggml-opencl.cpp, so maybe it's OK.

tigros avatar Nov 26 '23 08:11 tigros

> Recently the "mother" project https://github.com/ggerganov/whisper.cpp introduced native NVIDIA support in their cuBLAS builds: https://github.com/ggerganov/whisper.cpp/releases (search for "cublas").
>
> In my quick first tests, it is now exactly as fast as the Const-me version: using the large model on an A5000 card, both take 19 seconds for a 4-minute input WAV (we must use the -bs 1 option in the whisper.cpp version for a fair comparison).
>
> This of course means that I will not invest any more effort into the Const-me version, because the other project is heavily maintained while the Const-me version is effectively dead. Anyway, thanks to @Const-me for all the effort; I am sure your code helped the whisper.cpp team a lot in implementing the native NVIDIA support!

Does it have the same problem with repetition, or has that been solved? If not, do you have any solution for that?

eagleftw023 avatar Nov 26 '23 17:11 eagleftw023

When I go to a page like https://github.com/ggerganov/whisper.cpp/releases/tag/v1.5.1, I have no idea what to download. I searched for "cublas" and there was no such thing on the page.

RickArcher108 avatar Nov 26 '23 18:11 RickArcher108

The original whisper.cpp does not work on my GTX 970. At the same time, the Const-me version works fine through DirectX without any problems. The only problem with the Const-me version is the missing support for the large-v3 model.

1001ruchka avatar Nov 27 '23 02:11 1001ruchka

@eagleftw023 As far as I know, all the projects I am aware of currently suffer from the repeat-forever issue. The solution is described here: https://github.com/Const-me/Whisper/issues/26#issuecomment-1664575228. The same concept (reset the prompt history if the output repeats or stays lowercase forever) applies to all other projects, but that also means you need to modify the source code of each project to apply it. I don't work with V3, as it repeats more than older versions.
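The idea from that linked comment can be sketched roughly as follows. This is a minimal illustration, not code from either project: `looks_stuck` and `maybe_reset_history` are hypothetical names, and `prompt_tokens` stands in for whatever token history the engine feeds back into the decoder between windows.

```cpp
#include <string>
#include <vector>

// Heuristic: the model is probably looping if the last `window`
// transcribed segments are all identical.
static bool looks_stuck(const std::vector<std::string> &segments,
                        size_t window = 3) {
    if (segments.size() < window) return false;
    const std::string &last = segments.back();
    for (size_t i = segments.size() - window; i < segments.size(); ++i) {
        if (segments[i] != last) return false;
    }
    return true;
}

// Workaround: when a loop is detected, drop the accumulated prompt
// history so the next decoding window starts from a clean context.
static void maybe_reset_history(const std::vector<std::string> &segments,
                                std::vector<int> &prompt_tokens) {
    if (looks_stuck(segments)) {
        prompt_tokens.clear(); // forget the context that caused the loop
    }
}
```

In practice you would call `maybe_reset_history` after each segment is finalized; a real implementation would also want the "forever lowercase" check mentioned above, which this sketch omits.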

emcodem avatar Nov 27 '23 19:11 emcodem

I won't be using the "mother" project because I can't use C++; I'd like them to package it into a piece of software like Const-me's.

peaceful610 avatar Dec 06 '23 16:12 peaceful610

> In my quick first tests, it is now exactly as fast as the Const-me version: using the large model on an A5000 card, both take 19 seconds for a 4-minute input WAV (we must use the -bs 1 option in the whisper.cpp version for a fair comparison).

I can't get such results. My results on an NVIDIA 1660, on the same 46-second audio file:

- Whisper.CPP (v1.5.4/X64): 3 m 21 s
- Whisper.CPP (v1.5.4/whisper-cublas-12.2.0-bin-x64.zip) without "-bs 1" option: 2 m 42 s
- Whisper.CPP (v1.5.4/whisper-cublas-12.2.0-bin-x64.zip) with "-bs 1" option: 2 m 00 s
- Const-me Whisper: 0 m 17 s

It seems like the Whisper.CPP version recognizes the GPU, but that doesn't affect the results much:

15:23:05     chcp 1251 & C:\Users\User\Desktop\WW\Engines\WhisperCPP\main.exe -f "C:\Users\User\Desktop\WW\file_427-16bit.wav" -m "C:\Users\User\Desktop\WW\Models\ggml-large-v2.bin" -mc 64 -l uz -osrt -bs 1
15:25:05     Active code page: 1251
whisper_init_from_file_with_params_no_state: loading model from 'C:\Users\User\Desktop\WW\Models\ggml-large-v2.bin'
whisper_model_load: loading model
whisper_model_load: n_vocab       = 51865
whisper_model_load: n_audio_ctx   = 1500
whisper_model_load: n_audio_state = 1280
whisper_model_load: n_audio_head  = 20
whisper_model_load: n_audio_layer = 32
whisper_model_load: n_text_ctx    = 448
whisper_model_load: n_text_state  = 1280
whisper_model_load: n_text_head   = 20
whisper_model_load: n_text_layer  = 32
whisper_model_load: n_mels        = 80
whisper_model_load: ftype         = 1
whisper_model_load: qntvr         = 0
whisper_model_load: type          = 5 (large)
whisper_model_load: adding 1608 extra tokens
whisper_model_load: n_langs       = 99
ggml_init_cublas: GGML_CUDA_FORCE_MMQ:   no
ggml_init_cublas: CUDA_USE_TENSOR_CORES: yes
ggml_init_cublas: found 1 CUDA devices:
  Device 0: NVIDIA GeForce GTX 1660 SUPER, compute capability 7.5, VMM: yes
whisper_backend_init: using CUDA backend
whisper_model_load:     CUDA buffer size =  3094.49 MB
whisper_model_load: model size    = 3093.99 MB
whisper_backend_init: using CUDA backend
whisper_init_state: kv self size  =  220.20 MB
whisper_init_state: kv cross size =  245.76 MB
whisper_init_state: compute buffer (conv)   =   30.98 MB
whisper_init_state: compute buffer (encode) =  212.42 MB
whisper_init_state: compute buffer (cross)  =    9.38 MB
whisper_init_state: compute buffer (decode) =   99.23 MB

system_info: n_threads = 4 / 12 | AVX = 1 | AVX2 = 1 | AVX512 = 0 | FMA = 1 | NEON = 0 | ARM_FMA = 0 | METAL = 0 | F16C = 1 | FP16_VA = 0 | WASM_SIMD = 0 | BLAS = 1 | SSE3 = 1 | SSSE3 = 0 | VSX = 0 | CUDA = 1 | COREML = 0 | OPENVINO = 0 | 

main: processing 'C:\Users\User\Desktop\WW\file_427-16bit.wav' (693120 samples, 43.3 sec), 4 threads, 1 processors, 1 beams + best of 5, lang = uz, task = transcribe, timestamps = 1 ...

whisper_print_timings:     load time =  2993.57 ms
whisper_print_timings:     fallbacks =   5 p /  11 h
whisper_print_timings:      mel time =    40.94 ms
whisper_print_timings:   sample time =  1589.29 ms /     1 runs ( 1589.29 ms per run)
whisper_print_timings:   encode time = 14452.77 ms /     2 runs ( 7226.39 ms per run)
whisper_print_timings:   decode time = 39084.01 ms /   317 runs (  123.29 ms per run)
whisper_print_timings:   batchd time = 58606.87 ms /  2279 runs (   25.72 ms per run)
whisper_print_timings:   prompt time =  2388.84 ms /   204 runs (   11.71 ms per run)
whisper_print_timings:    total time = 119170.83 ms


Express2000 avatar Feb 19 '24 11:02 Express2000

I'd like to say that the original whisper.cpp program is inferior to the Const-me version because it relies on a command-line interface instead of a GUI. Besides, when I tried to use the original program in the past on Windows 10, I had to install all kinds of additional programs, like Windows Subsystem for Linux and FFmpeg, and add them to the PATH variable. That was quite a tricky task for an ordinary translator.

This is why I am deeply grateful to @Const-me for making this program available to the general public. It is easy to use, the UI is understandable, and it can process all kinds of video and audio files without me having to convert them to WAV first.

The only thing I'd love to see improved is the hiccups and hallucinations on difficult files. But I think that is a problem all programs of this kind suffer from. It could be mitigated by adding an option to start transcribing from a specific time in the audio file. Adding support for ggml-large-v3.bin would also be great. But I can't complain about what I already have.

Yanngo avatar Mar 18 '24 09:03 Yanngo