faster-whisper
Turbo-V3
I converted the new OpenAI model weights to be used with faster-whisper. Still playing around with it, but in terms of speed it's about the same as Distil-Whisper.
https://huggingface.co/freddierice/openwhisper-turbo-large-v3-ct2/blob/main/README.md
Could you convert Whisper Turbo with the multilingual tokenizer?
Thanks for the quick conversion! I'm getting a tokenizer error:
Traceback (most recent call last):
File "transcribe.py", line 660, in __init__
self.hf_tokenizer = tokenizers.Tokenizer.from_file(tokenizer_file)
Exception: data did not match any variant of untagged enum ModelWrapper at line 264861 column 3
Any support would be appreciated :) EDIT: The link below (https://huggingface.co/deepdml/faster-whisper-large-v3-turbo-ct2) works fine, so thank you!
Could you convert Whisper Turbo with the multilingual tokenizer?
It's done in: https://huggingface.co/deepdml/faster-whisper-large-v3-turbo-ct2
Tested https://huggingface.co/deepdml/faster-whisper-large-v3-turbo-ct2 Works very fast
Could you show how we can test it? On a Google Colab notebook I get an error that there is no model named "faster-whisper-large-v3-turbo-ct2" after pip install faster-whisper.
You have to download the model locally first:
from huggingface_hub import snapshot_download

# Download the CTranslate2 model files from the Hugging Face Hub into local_dir
repo_id = "deepdml/faster-whisper-large-v3-turbo-ct2"
local_dir = "faster-whisper-large-v3-turbo-ct2"
snapshot_download(repo_id=repo_id, local_dir=local_dir, repo_type="model")
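Once the snapshot is downloaded, you can point faster-whisper at the local folder. A minimal sketch, assuming the directory name from the snippet above and a placeholder "audio.mp3" file:

from faster_whisper import WhisperModel

# Load the downloaded CTranslate2 snapshot; use device="cpu", compute_type="int8" if you have no GPU
model = WhisperModel("faster-whisper-large-v3-turbo-ct2", device="cuda", compute_type="float16")

# "audio.mp3" is a placeholder for your own file
segments, info = model.transcribe("audio.mp3", beam_size=5)
print(f"Detected language: {info.language} (p={info.language_probability:.2f})")
for segment in segments:
    print(f"[{segment.start:.2f} -> {segment.end:.2f}] {segment.text}")

Passing the Hub repo id "deepdml/faster-whisper-large-v3-turbo-ct2" directly to WhisperModel should also work, as mentioned later in this thread, in which case faster-whisper handles the download itself.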
If you guys want to test the model as a real-time transcription tool, I have a simple demo with Gradio for this. Just updated the code to use "deepdml/faster-whisper-large-v3-turbo-ct2"
https://github.com/Nik-Kras/Live_ASR_Whisper_Gradio
Thanks!
Any idea how I can run it faster on Apple silicon? I have an M2 Pro machine.
Have you tried faster-whisper? It seems that it's faster than any other framework. https://medium.com/@GenerationAI/streaming-with-whisper-in-mlx-vs-faster-whisper-vs-insanely-fast-whisper-37cebcfc4d27
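As far as I know CTranslate2 has no Metal backend, so on Apple silicon faster-whisper runs on the CPU. A rough sketch of a CPU-friendly configuration (the thread count and audio path are just example values):

from faster_whisper import WhisperModel

# int8 quantization is usually the fastest CPU option; tune cpu_threads for your machine
model = WhisperModel(
    "deepdml/faster-whisper-large-v3-turbo-ct2",
    device="cpu",
    compute_type="int8",
    cpu_threads=8,
)
segments, _ = model.transcribe("audio.mp3", vad_filter=True)
for segment in segments:
    print(segment.text)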
You could try: https://github.com/mustafaaljadery/lightning-whisper-mlx
lmao the Cantonese model is not word-for-word in the large-v3-turbo one... so sad... :( I'll still use large-v3 💖 好嘅 -> 好的, 是否 -> 係咪, meanings are maintained.. but come on
I don't know that language, could you give more details on your observation? What's wrong, and how does the result differ from large-v3?
--language=yue
Used this model:
https://huggingface.co/deepdml/faster-whisper-large-v3-turbo-ct2
vs the normal large-v3 model
Faster-Whisper-XXL.exe C:/Users/hocky/Downloads/Video/Video.mp4 --model {MODEL} --device CUDA --output_dir C:/Users/hocky/Downloads/Video --output_format srt --task transcribe --beam_size 10 --best_of 5 --verbose true --vad_filter true --vad_alt_method silero_v4 --standard_asia --language yue
on https://www.youtube.com/watch?v=sgRfqRFJlAg
So Cantonese has two variants: written Cantonese, which most subtitles are based on, and spoken Cantonese, which is literally the spoken words written down.
藏哥係咪未讀過書? (spoken form)
床哥是否未讀過書? (written form)
是 => written Cantonese variant (read: si)
係 => spoken Cantonese variant (read: hai)
In general, if you want to learn spoken cantonese, you'll stick to the spoken version... The difference is about 10-20%
bruh, just realized it doesn't even transcribe the most basic terms properly: the famous "DLLM"
We now support the new whisper-large-v3-turbo on Sieve!
Use it via sieve/speech_transcriber: https://www.sievedata.com/functions/sieve/speech_transcriber
Use sieve/whisper directly: https://www.sievedata.com/functions/sieve/whisper
Just set speed_boost to True. The API guide is under the "Usage Guide" tab.
@trungkienbkhn Will SYSTRAN be adding this to the HF repo so it can be used out of the box?
I think that would be necessary for lots of downstream projects like faster-whisper-server
You may find this discussion helpful:
https://github.com/openai/whisper/discussions/2363#discussion-7264254
"Across languages, the turbo model performs similarly to large-v2, though it shows larger degradation on some languages like Thai and Cantonese."
What's your experience with using v3-turbo for short audio clips? Unfortunately, mine has been that it performs worse than other models; see https://github.com/SYSTRAN/faster-whisper/issues/1030#issuecomment-2393164779
I only planned to evaluate turbo, turbo-CTranslate2 and turbo-HQQ. I can only tell that it works 2-3x faster based on my logs with the Gradio demo.
How bad are the evaluation results?
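If anyone wants to reproduce the rough speed comparison, here is a minimal timing sketch; the model names and audio path are placeholders, and it only measures wall-clock time, not accuracy:

import time

from faster_whisper import WhisperModel

def time_transcription(model_name: str, audio: str = "sample.wav") -> float:
    """Wall-clock seconds to fully transcribe one file with the given model."""
    model = WhisperModel(model_name, device="cuda", compute_type="float16")
    start = time.perf_counter()
    segments, _info = model.transcribe(audio, beam_size=5)
    list(segments)  # segments is a generator, so consume it to force decoding
    return time.perf_counter() - start

for name in ("large-v3", "deepdml/faster-whisper-large-v3-turbo-ct2"):
    print(name, f"{time_transcription(name):.1f}s")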
How do you use HQQ in faster-whisper? Could you share some sample code? I only see how to use it with the Transformers library: https://github.com/mobiusml/hqq?tab=readme-ov-file#transformers-
Right, HQQ works with Transformers. But faster-whisper is just Whisper accelerated with CTranslate2, and there are turbo models converted to CT2 available on Hugging Face: deepdml/faster-whisper-large-v3-turbo-ct2
Also, HQQ is integrated into Transformers, so quantization should be as easy as passing an argument:
import torch
from transformers import AutoModelForSpeechSeq2Seq, HqqConfig

model_id = "deepdml/faster-whisper-large-v3-turbo-ct2"
torch_dtype = torch.float16 if torch.cuda.is_available() else torch.float32

# 4-bit HQQ quantization applied while loading the model
quant_config = HqqConfig(nbits=4, group_size=64)

model = AutoModelForSpeechSeq2Seq.from_pretrained(
    model_id,
    torch_dtype=torch_dtype,
    low_cpu_mem_usage=True,
    use_safetensors=True,
    quantization_config=quant_config,
)
https://huggingface.co/docs/transformers/main/quantization/hqq
I didn't try it yet, so I don't know if that is going to work. How about having a chat about it outside of the GitHub issue? Send me a message on LinkedIn; it's linked in my profile.
No, there is no HQQ support for CTranslate2 yet.
faster-whisper uses Whisper models in CTranslate2 format, which is different from the PyTorch models on HF. Of course the converted models are available on HF so that they are easy to use in packages such as faster-whisper, but one cannot directly load a CTranslate2 checkpoint with AutoModelForSpeechSeq2Seq.
I have created a feature request in the past to support HQQ (with static cache and torch compilation): https://github.com/OpenNMT/CTranslate2/issues/1717
The PR is still in progress and it has some performance issues that need to be fixed.
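For reference, this is roughly how the CT2 checkpoints linked above can be produced yourself: CTranslate2 ships a converter for Transformers checkpoints. A sketch, assuming the standard converter options and an arbitrary output directory name:

import ctranslate2

# Convert the PyTorch turbo checkpoint from the Hub into the CTranslate2 layout
# that faster-whisper expects; tokenizer and preprocessor files are copied along
converter = ctranslate2.converters.TransformersConverter(
    "openai/whisper-large-v3-turbo",
    copy_files=["tokenizer.json", "preprocessor_config.json"],
)
converter.convert("whisper-large-v3-turbo-ct2", quantization="float16")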
How bad are the evaluation results?
No standardized evaluation or anything; I'm just running it in my streaming application and seeing way worse results than medium (especially with it randomly just not transcribing part of the text). This is a CTranslate2 implementation, deepdml/faster-whisper-large-v3-turbo-ct2 to be exact. See my linked comment for the code.
Were you able to convert turbo to faster-whisper format?
The Mobiuslabs fork now supports turbo out of the box and has additional fixes.
Just to mention that I added support in https://github.com/Softcatala/whisper-ctranslate2 for anybody who wants to test the turbo model with the current faster-whisper version.
There seems to be a lot of confusion in this thread -- if you want to use turbo with the current faster-whisper, all you have to do is:
from faster_whisper import WhisperModel
model = WhisperModel("deepdml/faster-whisper-large-v3-turbo-ct2", device="cuda", compute_type="float16")
Closing this thread since there is no issue.
