
Whisper-turbo model support

codeMonkey-shin opened this issue 1 year ago · 8 comments

https://github.com/openai/whisper/pull/2361/files

Will this model be supported in the future?

codeMonkey-shin avatar Oct 01 '24 06:10 codeMonkey-shin

We definitely need this.

stevevaius2015 avatar Oct 02 '24 06:10 stevevaius2015

I know large-v3 causes some weird stuff in transcription compared to large-v2; I wonder if they improved on that while also making it faster.

andyhoeung avatar Oct 02 '24 15:10 andyhoeung

+1 need turbo

wangfeng35 avatar Oct 04 '24 02:10 wangfeng35

+1 need turbo

mp3pintyo avatar Oct 04 '24 09:10 mp3pintyo

+1 need turbo

AlanHuang99 avatar Oct 05 '24 01:10 AlanHuang99

Downloading the files for turbo manually and replacing an existing model seems to work as a workaround.

https://huggingface.co/openai/whisper-large-v3-turbo/tree/main

LoggeL avatar Oct 05 '24 09:10 LoggeL

I didn't have to replace any model. I just downloaded a faster-whisper large-v3-turbo variant, for example https://huggingface.co/Infomaniak-AI/faster-whisper-large-v3-turbo, created a folder in _models called 'faster-whisper-large-v3-turbo', and used '--model=large-v3-turbo'.

andyhoeung avatar Oct 05 '24 13:10 andyhoeung
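The steps above can be sketched in Python. The folder-naming convention (a `faster-whisper-<name>` directory under `_models`, selected via `--model=<name>`) follows andyhoeung's description; the helper function is illustrative only and not part of faster-whisper-xxl, and `snapshot_download` from `huggingface_hub` is just one way to fetch the files:

```python
from pathlib import Path

# Hypothetical helper mirroring the convention described above:
# --model=<name> resolves to _models/faster-whisper-<name>.
def local_model_dir(models_root: str, model_name: str) -> Path:
    return Path(models_root) / f"faster-whisper-{model_name}"

target = local_model_dir("_models", "large-v3-turbo")
print(target)

# To populate the folder, one option is huggingface_hub
# (uncomment to actually download the model files):
# from huggingface_hub import snapshot_download
# snapshot_download("Infomaniak-AI/faster-whisper-large-v3-turbo",
#                   local_dir=str(target))
```

After the folder is in place, `faster-whisper-xxl.exe` can be invoked with `--model=large-v3-turbo` as described above.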

I didn't have to replace any model. I just downloaded a faster-whisper large-v3-turbo variant, for example https://huggingface.co/Infomaniak-AI/faster-whisper-large-v3-turbo, created a folder in _models called 'faster-whisper-large-v3-turbo', and used '--model=large-v3-turbo'.

--model=large-v3-turbo

```
Warning: 'large-v3' model may produce inferior results, try 'large-v2'!

Traceback (most recent call last):
  File "D:\whisper-fast_XXL\__main__.py", line 1668, in <module>
  File "D:\whisper-fast_XXL\__main__.py", line 1595, in cli
  File "faster_whisper\transcribe.py", line 1456, in restore_speech_timestamps
  File "faster_whisper\transcribe.py", line 798, in generate_segments
  File "faster_whisper\transcribe.py", line 1109, in encode
ValueError: Invalid input features shape: expected an input with shape (1, 128, 3000), but got an input with shape (1, 80, 3000) instead
[15868] Failed to execute script 'main' due to unhandled exception!
```

faster-whisper-xxl.exe

juntaosun avatar Oct 10 '24 04:10 juntaosun
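For context on the ValueError above: large-v3 and the turbo checkpoint compute 128-mel-bin input features, while large-v2 and earlier use 80 bins, so a build whose feature extractor still produces 80 bins feeds the wrong shape to a 128-bin encoder. A minimal sketch of the expected shapes (the dict and function are illustrative, not faster-whisper API):

```python
# Mel-bin counts per Whisper checkpoint family (large-v3 and turbo moved to 128).
N_MELS = {"large-v2": 80, "large-v3": 128, "large-v3-turbo": 128}
N_FRAMES = 3000  # 30-second window at 100 feature frames per second

def expected_features_shape(model: str, batch: int = 1) -> tuple:
    """Shape the encoder expects: (batch, mel bins, frames)."""
    return (batch, N_MELS[model], N_FRAMES)

# The traceback above: the encoder wanted (1, 128, 3000) but got (1, 80, 3000),
# i.e. turbo weights paired with an 80-bin feature extractor.
assert expected_features_shape("large-v3-turbo") == (1, 128, 3000)
assert expected_features_shape("large-v2") == (1, 80, 3000)
```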

ValueError: Invalid input features shape: expected an input with shape (1, 128, 3000), but got an input with shape (1, 80, 3000) instead

Same error here... any fix since then?

ZeVince avatar Nov 03 '24 15:11 ZeVince

Same error here... any fix since then?

Not yet, but it will be sooner rather than later; right now some other things have priority.

Purfview avatar Nov 03 '24 15:11 Purfview

Will this model be supported in the future?

It was always supported, like any other custom fine-tuned model.

Auto-download for it was added in v193.1.

Purfview avatar Nov 06 '24 19:11 Purfview

Is the turbo model supposed to do any translation at all? It produces untranslated German text with --task translate, whereas vanilla large-v3 appears to work fine.

nebehr avatar Nov 12 '24 13:11 nebehr

"Whisper turbo was fine-tuned for two more epochs over the same amount of multilingual transcription data used for training large-v3, i.e. excluding translation data, on which we don’t expect turbo to perform well."

Purfview avatar Nov 12 '24 14:11 Purfview
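Given that note, a practical workaround is to route translation jobs to a checkpoint that was trained on translation data and keep turbo for plain transcription. A tiny sketch (the helper function is illustrative, not part of faster-whisper-xxl):

```python
def pick_model(task: str) -> str:
    """Route by task: turbo's fine-tuning excluded translation data,
    so fall back to large-v3 for --task translate.
    (Illustrative helper, not part of faster-whisper-xxl.)"""
    return "large-v3" if task == "translate" else "large-v3-turbo"

assert pick_model("translate") == "large-v3"
assert pick_model("transcribe") == "large-v3-turbo"
```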

I see. From that description, though, I would expect bad translation, not no translation at all. Anyway, this is beyond the scope of this project.

nebehr avatar Nov 12 '24 15:11 nebehr