
Hugging Face models converted using this pipeline are too slow

Open · spygaurad opened this issue 2 years ago · 5 comments

[screenshot of the transcription run and its timings]

Question 1: This is a Whisper medium model fine-tuned on Nepali. Inference on a 39-second audio clip takes forever (13 minutes). Are there any issues with the ggml conversion? @ggerganov The same audio takes 70 seconds with the medium.en model.

Question 2: The transcription output comes in fixed 30-second chunks. How can I make the segment lengths dynamic, as with the ggml-medium model?

spygaurad · Jan 24 '23
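
For context, a Hugging Face fine-tune is typically converted with the convert-h5-to-ggml.py script that ships in the whisper.cpp models directory. A minimal sketch, assuming hypothetical local paths (the script also expects a clone of the openai/whisper repo for the mel filters and tokenizer assets):

# Hypothetical paths: fine-tuned HF model dir, openai/whisper clone, output dir
python3 models/convert-h5-to-ggml.py \
    ~/models/whisper-medium-nepali \
    ~/src/whisper \
    ~/models/out

This should write a ggml-model.bin into the output directory, which the main example can then load with -m.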

Given these results, I believe the fine-tuned model does not output timestamp tokens for some reason. To confirm, can you provide the output of the same run after adding the -ps command-line argument, which makes the tool print the special tokens in the output?

ggerganov · Jan 25 '23
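
For anyone following along, such a run would look something like this (model and audio file names here are placeholders):

# -ps / --print-special prints special tokens (timestamps, task tokens, ...)
# alongside the transcript. Model and audio paths are hypothetical.
./main -m models/ggml-nepali-medium.bin -f nepali-39s.wav -l ne -ps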

[screenshot of the run with -ps]

@ggerganov This is the output with -ps. One thing I noticed is that the number of runs for my model is 5692, whereas for the medium.en model it is around 92.

I also tried inference without the timestamp option (-nt); it still takes too long.

spygaurad · Jan 25 '23

I see the transcribe token (50359) is being decoded many times for some reason. This is not supposed to happen. I just pushed a change to master to suppress the task tokens. Not sure if it will help, but you might want to give it another try.

ggerganov · Feb 04 '23
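
To pick up that change, the tool needs to be rebuilt from the latest master; roughly (build steps may vary by setup, and paths are placeholders):

# Fetch the fix, rebuild, then re-run with special-token printing.
git pull origin master
make clean && make -j
./main -m models/ggml-nepali-medium.bin -f nepali-39s.wav -l ne -ps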

[screenshot of the run after pulling master]

I pulled master; I haven't noticed any performance change.

spygaurad · Feb 06 '23

We still see the 50359 token, which is unexpected. I guess the best option is for you to provide instructions for downloading the model so I can test it locally.

ggerganov · Feb 11 '23

I have the same problem here.
After converting my fine-tuned model, decoding takes a long time:

whisper_print_timings:     load time =    73.56 ms
whisper_print_timings:     fallbacks =   2 p /   1 h
whisper_print_timings:      mel time =    33.36 ms
whisper_print_timings:   sample time =   928.93 ms /  1907 runs (    0.49 ms per run)
whisper_print_timings:   encode time =   129.43 ms /     1 runs (  129.43 ms per run)
whisper_print_timings:   decode time =  2592.87 ms /  1899 runs (    1.37 ms per run)
whisper_print_timings:    total time =  3770.88 ms

haozes · Aug 21 '23

Adding -nf (--no-fallback: do not use temperature fallback while decoding) fixed it; now it works:

whisper_print_timings:     load time =    62.46 ms
whisper_print_timings:     fallbacks =   0 p /   0 h
whisper_print_timings:      mel time =    34.00 ms
whisper_print_timings:   sample time =     3.18 ms /     8 runs (    0.40 ms per run)
whisper_print_timings:   encode time =   173.80 ms /     1 runs (  173.80 ms per run)
whisper_print_timings:   decode time =    11.11 ms /     8 runs (    1.39 ms per run)
whisper_print_timings:    total time =   296.14 ms

haozes · Aug 21 '23
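
The timing lines explain the speed-up: in the earlier run, the temperature fallback re-decoded segments that failed the decoder's quality thresholds, which is why it shows fallbacks = 2 p / 1 h and 1899 decode runs, versus 0 fallbacks and 8 runs here. A sketch of the faster invocation, with placeholder paths:

# -nf / --no-fallback disables the temperature-fallback retries, so each
# segment is decoded exactly once. Faster, but low-confidence segments are
# no longer retried at higher temperatures. Paths are hypothetical.
./main -m models/ggml-model.bin -f sample.wav -nf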