
Turbo-V3

Open freddierice opened this issue 1 year ago • 17 comments

I converted the new OpenAI model weights to be used with faster-whisper. Still playing around with it, but in terms of speed it's about the same as distil-whisper.

https://huggingface.co/freddierice/openwhisper-turbo-large-v3-ct2/blob/main/README.md

freddierice avatar Oct 01 '24 02:10 freddierice

I converted the new OpenAI model weights to be used with faster-whisper. Still playing around with it, but in terms of speed it's about the same as distil-whisper.

https://huggingface.co/freddierice/openwhisper-turbo-large-v3-ct2/blob/main/README.md

Could you convert Whisper Turbo with the multilingual tokenizer?

anhnh2002 avatar Oct 01 '24 09:10 anhnh2002

Thanks for the quick conversion! I'm getting a tokenizer error:

Traceback (most recent call last):
  File "transcribe.py", line 660, in __init__
    self.hf_tokenizer = tokenizers.Tokenizer.from_file(tokenizer_file)
Exception: data did not match any variant of untagged enum ModelWrapper at line 264861 column 3

Any support would be appreciated :) EDIT: The link below (https://huggingface.co/deepdml/faster-whisper-large-v3-turbo-ct2) works fine, so thank you!

tjongsma avatar Oct 01 '24 09:10 tjongsma

I converted the new OpenAI model weights to be used with faster-whisper. Still playing around with it, but in terms of speed it's about the same as distil-whisper. https://huggingface.co/freddierice/openwhisper-turbo-large-v3-ct2/blob/main/README.md

Could you convert Whisper Turbo with the multilingual tokenizer?

It's done in: https://huggingface.co/deepdml/faster-whisper-large-v3-turbo-ct2

asr-lord avatar Oct 01 '24 10:10 asr-lord

Tested https://huggingface.co/deepdml/faster-whisper-large-v3-turbo-ct2. Works very fast.

Nik-Kras avatar Oct 01 '24 10:10 Nik-Kras

Could you show how we can test it? In a Google Colab notebook I got an error saying there is no model named "faster-whisper-large-v3-turbo-ct2" after pip install faster-whisper.

stevevaius2015 avatar Oct 01 '24 10:10 stevevaius2015

Could you show how we can test it? In a Google Colab notebook I got an error saying there is no model named "faster-whisper-large-v3-turbo-ct2" after pip install faster-whisper.

You have to download the model locally first:

from huggingface_hub import snapshot_download

repo_id = "deepdml/faster-whisper-large-v3-turbo-ct2"
local_dir = "faster-whisper-large-v3-turbo-ct2"
snapshot_download(repo_id=repo_id, local_dir=local_dir, repo_type="model")
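
After the download, pointing faster-whisper at that local folder should be enough. A minimal, untested sketch assuming a Colab GPU runtime ("sample.wav" is just a placeholder file name):

from faster_whisper import WhisperModel

# Load the directory created by snapshot_download above.
model = WhisperModel("faster-whisper-large-v3-turbo-ct2", device="cuda", compute_type="float16")

segments, info = model.transcribe("sample.wav")  # placeholder audio file
for segment in segments:
    print(segment.text)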

asr-lord avatar Oct 01 '24 10:10 asr-lord

If you guys want to test the model as a real-time transcription tool, I have a simple demo with Gradio for this. I just updated the code to use "deepdml/faster-whisper-large-v3-turbo-ct2":

https://github.com/Nik-Kras/Live_ASR_Whisper_Gradio

Nik-Kras avatar Oct 01 '24 10:10 Nik-Kras

Could you show how we can test it? In a Google Colab notebook I got an error saying there is no model named "faster-whisper-large-v3-turbo-ct2" after pip install faster-whisper.

You have to download the model locally first:

from huggingface_hub import snapshot_download

repo_id = "deepdml/faster-whisper-large-v3-turbo-ct2"
local_dir = "faster-whisper-large-v3-turbo-ct2"
snapshot_download(repo_id=repo_id, local_dir=local_dir, repo_type="model")

Thanks!

stevevaius2015 avatar Oct 01 '24 10:10 stevevaius2015

Any idea how I can run it faster on Apple silicon? I have an M2 Pro machine.

milsun avatar Oct 01 '24 11:10 milsun

Any idea how I can run it faster on Apple silicon? I have an M2 Pro machine.

Have you tried faster-whisper? It seems to be faster than any other framework: https://medium.com/@GenerationAI/streaming-with-whisper-in-mlx-vs-faster-whisper-vs-insanely-fast-whisper-37cebcfc4d27

You could try: https://github.com/mustafaaljadery/lightning-whisper-mlx
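
If you stick with faster-whisper, note that on Apple silicon CTranslate2 runs on the CPU (there is no Metal backend), so int8 compute is usually the first thing to try. A rough, unverified sketch; the cpu_threads value is just a guess for an M2 Pro and "audio.wav" is a placeholder:

from faster_whisper import WhisperModel

# Runs on the CPU via CTranslate2; int8 keeps memory use and latency down.
model = WhisperModel("deepdml/faster-whisper-large-v3-turbo-ct2",
                     device="cpu", compute_type="int8", cpu_threads=8)

segments, _ = model.transcribe("audio.wav")  # placeholder audio file
print(" ".join(segment.text for segment in segments))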

asr-lord avatar Oct 01 '24 11:10 asr-lord

lmao, the Cantonese transcription is not word-for-word in the large-v3-turbo one... so sad... :( I'll still use large-v3 💖. 好嘅 -> 好的, 是否 -> 係咪. Meanings are maintained... but come on

hockyy avatar Oct 01 '24 13:10 hockyy

but come on

I don't know that language; could you give more details on your observation? What's wrong, and how does the result differ from large-v3?

Nik-Kras avatar Oct 01 '24 13:10 Nik-Kras

but come on

I don't know that language; could you give more details on your observation? What's wrong, and how does the result differ from large-v3?

Used --language=yue with this model: https://huggingface.co/deepdml/faster-whisper-large-v3-turbo-ct2 vs. the normal large-v3 model.

Faster-Whisper-XXL.exe C:/Users/hocky/Downloads/Video/Video.mp4 --model {MODEL} --device CUDA --output_dir C:/Users/hocky/Downloads/Video --output_format srt --task transcribe --beam_size 10 --best_of 5 --verbose true --vad_filter true --vad_alt_method silero_v4 --standard_asia --language yue

on https://www.youtube.com/watch?v=sgRfqRFJlAg

So Cantonese has two variants: written Cantonese, which most subtitles are based on, and spoken Cantonese, which is literally the spoken characters written down.

藏哥係咪未讀過書? 床哥是否未讀過書?

是 => written cantonese variant (read: si) 係 => spoken cantonese variant (read: hai)

In general, if you want to learn spoken cantonese, you'll stick to the spoken version... The difference is about 10-20%

bruh, just realized it doesn't even transcribe the most basic terms properly: the famous "DLLM"

hockyy avatar Oct 01 '24 14:10 hockyy

We now support the new whisper-large-v3-turbo on Sieve!

Use it via sieve/speech_transcriber: https://www.sievedata.com/functions/sieve/speech_transcriber
Use sieve/whisper directly: https://www.sievedata.com/functions/sieve/whisper

Just set speed_boost to True. The API guide is under the "Usage Guide" tab.

mvoodarla avatar Oct 01 '24 17:10 mvoodarla

@trungkienbkhn Will SYSTRAN be adding this in the HF repo to be used out of the box?

Jiltseb avatar Oct 01 '24 19:10 Jiltseb

@trungkienbkhn Will SYSTRAN be adding this in the HF repo to be used out of the box?

I think that would be necessary for lots of downstream projects like faster-whisper-server

thiswillbeyourgithub avatar Oct 01 '24 19:10 thiswillbeyourgithub

but come on

I don't know that language; could you give more details on your observation? What's wrong, and how does the result differ from large-v3?

Used --language=yue with this model: https://huggingface.co/deepdml/faster-whisper-large-v3-turbo-ct2 vs. the normal large-v3 model.

Faster-Whisper-XXL.exe C:/Users/hocky/Downloads/Video/Video.mp4 --model {MODEL} --device CUDA --output_dir C:/Users/hocky/Downloads/Video --output_format srt --task transcribe --beam_size 10 --best_of 5 --verbose true --vad_filter true --vad_alt_method silero_v4 --standard_asia --language yue

on https://www.youtube.com/watch?v=sgRfqRFJlAg

So Cantonese has two variants: written Cantonese, which most subtitles are based on, and spoken Cantonese, which is literally the spoken characters written down.

藏哥係咪未讀過書? 床哥是否未讀過書?

是 => written cantonese variant (read: si) 係 => spoken cantonese variant (read: hai)

In general, if you want to learn spoken cantonese, you'll stick to the spoken version... The difference is about 10-20%

bruh, just realized it doesn't even transcribe the most basic terms properly: the famous "DLLM"

You may find this discussion helpful: https://github.com/openai/whisper/discussions/2363#discussion-7264254 "Across languages, the turbo model performs similarly to large-v2, though it shows larger degradation on some languages like Thai and Cantonese."

asr-lord avatar Oct 02 '24 07:10 asr-lord

If you guys want to test the model as a real-time transcription tool, I have a simple demo with Gradio for this. I just updated the code to use "deepdml/faster-whisper-large-v3-turbo-ct2":

https://github.com/Nik-Kras/Live_ASR_Whisper_Gradio

What's your experience with using v3-turbo for short audio clips? Mine has been that unfortunately it performs worse than other models, see https://github.com/SYSTRAN/faster-whisper/issues/1030#issuecomment-2393164779

tjongsma avatar Oct 04 '24 08:10 tjongsma

What's your experience with using v3-turbo for short audio clips? Mine has been that unfortunately it performs worse than other models, see #1030 (comment)

I only planned to evaluate turbo, turbo-CTranslate2, and turbo-HQQ. I can only tell that it runs about 2-3x faster based on my logs from the Gradio demo.

How bad are the evaluation results?

Nik-Kras avatar Oct 04 '24 08:10 Nik-Kras

What's your experience with using v3-turbo for short audio clips? Mine has been that unfortunately it performs worse than other models, see #1030 (comment)

I only planned to evaluate turbo, turbo-CTranslate2, and turbo-HQQ. I can only tell that it runs about 2-3x faster based on my logs from the Gradio demo.

How do you use HQQ in faster-whisper? Could you share some sample code? I only see how to use it with the Transformers library: https://github.com/mobiusml/hqq?tab=readme-ov-file#transformers-

asr-lord avatar Oct 04 '24 09:10 asr-lord

What's your experience with using v3-turbo for short audio clips? Mine has been that unfortunately it performs worse than other models, see #1030 (comment)

I only planned to evaluate turbo, turbo-CTranslate2, and turbo-HQQ. I can only tell that it runs about 2-3x faster based on my logs from the Gradio demo.

How do you use HQQ in faster-whisper? Could you share some sample code? I only see how to use it with the Transformers library: https://github.com/mobiusml/hqq?tab=readme-ov-file#transformers-

Right, HQQ works with Transformers. But faster-whisper is just Whisper accelerated with CTranslate2, and there are turbo models converted to CT2 available on Hugging Face: deepdml/faster-whisper-large-v3-turbo-ct2

Also, HQQ is integrated into Transformers, so quantization should be as easy as passing an argument:

model_id = "deepdml/faster-whisper-large-v3-turbo-ct2"
quant_config = HqqConfig(nbits=4, group_size=64)

model = AutoModelForSpeechSeq2Seq.from_pretrained(
    model_id,
    torch_dtype=torch_dtype,
    low_cpu_mem_usage=True,
    use_safetensors=True,
    quantization_config=quant_config
)

https://huggingface.co/docs/transformers/main/quantization/hqq

I didn't try it yet, so I don't know if it's going to work. How about having a chat about it outside of the GitHub issue? Send me a message on LinkedIn; the link is in my profile.

Nik-Kras avatar Oct 04 '24 09:10 Nik-Kras

No, there is no HQQ support for CTranslate2 yet.

faster-whisper uses Whisper models in CTranslate2 format, which is different from the PyTorch models on HF. Of course, the converted models are available on HF so that they are easy to use in packages such as faster-whisper, but one cannot directly load a CTranslate2 checkpoint with AutoModelForSpeechSeq2Seq.

I have created a feature request in the past to support HQQ (with static cache and torch compilation): https://github.com/OpenNMT/CTranslate2/issues/1717

The PR is still in progress, and it has some performance issues that need to be fixed.
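
For anyone wondering how such CT2 checkpoints are produced in the first place: the PyTorch weights are converted with CTranslate2's Transformers converter. A rough sketch, assuming the turbo checkpoint converts the same way as large-v3 (requires transformers and torch installed; not verified here):

from ctranslate2.converters import TransformersConverter

# Convert the PyTorch/Transformers checkpoint into the CTranslate2 layout faster-whisper expects.
converter = TransformersConverter(
    "openai/whisper-large-v3-turbo",
    copy_files=["tokenizer.json", "preprocessor_config.json"],
)
converter.convert("whisper-large-v3-turbo-ct2", quantization="float16")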

Jiltseb avatar Oct 04 '24 09:10 Jiltseb

What's your experience with using v3-turbo for short audio clips? Mine has been that unfortunately it performs worse than other models, see #1030 (comment)

I only planned to evaluate turbo, turbo-CTranslate2, and turbo-HQQ. I can only tell that it runs about 2-3x faster based on my logs from the Gradio demo.

How bad are the evaluation results?

No standardized evaluation or anything; I'm just running it in my streaming application and seeing way worse results than medium (especially with it randomly just not transcribing part of the text). This is a CTranslate2 implementation, deepdml/faster-whisper-large-v3-turbo-ct2 to be exact. See my linked comment for the code.

tjongsma avatar Oct 04 '24 12:10 tjongsma

Were you able to convert turbo to faster-whisper format?

silvacarl2 avatar Oct 07 '24 17:10 silvacarl2

The Mobiuslabs fork now supports turbo out of the box and has additional fixes.

Jiltseb avatar Oct 09 '24 15:10 Jiltseb

Just to mention that I added support in https://github.com/Softcatala/whisper-ctranslate2 for anybody who wants to test the turbo model with the current faster-whisper version.

jordimas avatar Oct 10 '24 13:10 jordimas

There seems to be a lot of confusion in this thread: if you want to use turbo with the current faster-whisper, all you have to do is

WhisperModel("deepdml/faster-whisper-large-v3-turbo-ct2", device="cuda", compute_type="float16")

Closing this thread since there is no issue.

freddierice avatar Oct 10 '24 14:10 freddierice