Realtime use possible?
There are some Whisper realtime libraries out there. Is there any way to make this library work in realtime?
Yes, it is possible. We are working on capturing audio from the mic and transcribing it in realtime.
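The approach described above (mic capture plus chunked transcription) can be sketched roughly as below. `transcribe_chunk` is a hypothetical stand-in for whatever engine you call (whisper.cpp, the tflite model, etc.); only the chunking loop is shown concretely:

```python
# Sketch: split a mono 16 kHz PCM stream into fixed-length chunks and
# transcribe each one as it "arrives". transcribe_chunk is a hypothetical
# stand-in for a real engine call (whisper.cpp, tflite, ...).
from typing import Callable, Iterator, List, Sequence

SAMPLE_RATE = 16_000   # whisper models expect 16 kHz mono audio
CHUNK_SECONDS = 3      # latency/accuracy trade-off

def chunks(samples: Sequence[float], chunk_len: int) -> Iterator[Sequence[float]]:
    """Yield consecutive fixed-size windows; the trailing partial window too."""
    for start in range(0, len(samples), chunk_len):
        yield samples[start:start + chunk_len]

def streaming_transcribe(samples: Sequence[float],
                         transcribe_chunk: Callable[[Sequence[float]], str]) -> List[str]:
    """Run the (hypothetical) engine on each chunk and collect partial texts."""
    chunk_len = SAMPLE_RATE * CHUNK_SECONDS
    return [transcribe_chunk(c) for c in chunks(samples, chunk_len) if len(c) > 0]
```

A real implementation would feed chunks from the mic callback into a queue rather than a list, but the windowing trade-off (shorter chunks = lower latency, worse context) is the same.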
Like WebGPU? It's awesome if I got it right.
Any updates?
We've already used the vosk model for real-time processing, but given its low accuracy we really need whisper's transcription quality in real time. However, inference is taking too long (about 3 seconds on average). Is there any solution we can use?
Any updates?
up
Use the whisper-large-v3-turbo model. It's 3x faster than the others. Also make sure you enable CUDA (or OpenCL on AMD).
Thanks, but I searched and only found large-v3-turbo; I didn't find a tiny version of turbo :( I need a tiny model to run inference on mobile devices.
any updates?
Well, is there any way to prevent/ignore other languages like Arabic, Hindi, etc., and load only a single-language model so we can optimize it further? There is an option on the struct, but that doesn't optimize the model.
It already does so. This repository already has two tflite models, English-only and multilingual, but I didn't notice much of a difference.
English only. It should be a language selection: if I want Hebrew, I should be able to choose it.
@salehsoleimani, did you try the 8-bit models? https://huggingface.co/ggerganov/whisper.cpp e.g. large-v3-turbo-q8_0
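For anyone wondering what a q8_0 file actually contains: below is a toy sketch of symmetric 8-bit weight quantization, the principle behind those models. The real ggml format quantizes in blocks with per-block scales; this only illustrates the idea and is not the actual ggml code.

```python
# Toy symmetric int8 quantization: one shared scale for the whole tensor.
# The ggml q8_0 type does this per 32-element block, but the idea is the same.

def quantize_int8(weights):
    """Map float weights to int8 values plus a shared scale factor."""
    scale = max(abs(w) for w in weights) / 127.0 or 1.0  # avoid zero scale
    return [round(w / scale) for w in weights], scale

def dequantize_int8(q, scale):
    """Recover approximate float weights from int8 values and the scale."""
    return [v * scale for v in q]
```

The storage win is 4x vs float32 (1 byte per weight plus a small scale); whether that translates into speed depends on the backend, which matches the "doesn't affect speed too much" experience reported later in this thread.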
I guess the currently used model is a quantized version of whisper, isn't it? @vilassn
No, the default model uses float32 or bfloat16 afaik. There are also some more quantized models: https://huggingface.co/ctranslate2-4you/distil-whisper-small.en-ct2-bfloat16
Oh thanks, I'm going to try the 8-bit/5-bit models. Do you know whether tflite versions of these models are available?
afaik you can convert them to ggml, and that works well too.
Converting fine-tuned whisper models to ggml:
python3 ./models/convert-h5-to-ggml.py whisper-large-v2-japanese-5k-steps whisper outputs
@salehsoleimani did you check the 8-bit models? Any news?
No, actually I've been told quantization doesn't affect speed much.
@salehsoleimani try https://huggingface.co/distil-whisper/distil-large-v3 distil-whisper is claimed to be 6x faster than regular large-v3. Can you try distil-whisper?
Also try here and report back please: https://github.com/huggingface/distil-whisper they claim the code runs 6x faster than whisper large-v3. Also, can someone tell me how to convert these to a whisper-compatible format? gguf? ggml? I don't know.
Sure, thanks, I'm going to try it out.
I don't think that's going to work out. whisper-tiny (used in this repository) has 39M parameters, while distil-whisper unfortunately starts from whisper-small at 166M params. Compared to whisper-small's 244M it's faster, but not compared to whisper-tiny.
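Putting the parameter counts quoted above side by side makes the objection concrete (numbers are the ones from this thread, in millions):

```python
# Parameter counts (in millions) as quoted in the thread above.
params = {"whisper-tiny": 39, "distil-small": 166, "whisper-small": 244}

# distil-small undercuts whisper-small, but is still ~4x whisper-tiny,
# so it is not a drop-in replacement for a tiny model on mobile.
ratio_vs_tiny = params["distil-small"] / params["whisper-tiny"]
ratio_vs_small = params["distil-small"] / params["whisper-small"]
```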
@vilassn https://github.com/vilassn/whisper_android/issues/1#issuecomment-1744233582 were you successful with this? I see that you implemented some VAD code in the repo (vad.cpp), but I don't see where it's being used. I cloned the repo too and didn't see any real-time transcription in the sample.
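For context on what a VAD contributes here: a voice-activity detector gates which audio frames are worth sending to the (slow) transcriber. The sketch below is a minimal energy-threshold VAD of my own, purely illustrative; it is not the algorithm in the repo's vad.cpp, and the 0.001 threshold is an arbitrary assumption:

```python
# Toy energy-based VAD: flag a frame as speech when its mean squared
# amplitude exceeds a threshold. Real VADs are more sophisticated,
# but the frame-in/boolean-out interface is similar.

def frame_energy(frame):
    """Mean squared amplitude of one frame of float samples."""
    return sum(s * s for s in frame) / len(frame)

def is_speech(frame, threshold=0.001):
    return frame_energy(frame) > threshold

def speech_frames(samples, frame_len=320, threshold=0.001):
    """Split into 20 ms frames (at 16 kHz) and flag each as speech/silence."""
    return [is_speech(samples[i:i + frame_len], threshold)
            for i in range(0, len(samples) - frame_len + 1, frame_len)]
```

Skipping the silent frames is what would make chunked whisper inference affordable in a live-caption setting.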
@salehsoleimani unfortunately I'm trying to use it for a free app for the Deaf: "BeAware Deaf Assistant", so I need to figure out how to do it the hard way. Do you have a video of it working well for you?
Sorry, I was kidding. I couldn't figure out how to run whisper in real time, but we managed with vosk/kaldi, which gives real-time captions with good speed but medium accuracy. Have you checked it out?
No worries, @salehsoleimani. vosk had low accuracy when I tried their android demo from here: https://github.com/alphacep/vosk-android-demo Moreover, unlike whisper and its models, it doesn't do punctuation marks (. , ! ? "") at all, nor correct sentence starts and ends.
Yeah, that's because of how whisper works: it processes sentence by sentence, each segment starting where the previous sentence ended and ending with a period or end of sentence, which makes the output clean. Unfortunately I couldn't get it to run in real time on mobile :( I'm sorry.
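The sentence-by-sentence behavior described above also suggests how a caption UI can decide when a line is "final". A minimal sketch (my own illustration, not whisper's actual segmentation logic): accumulate the running transcript and emit a caption whenever a sentence-final mark appears.

```python
# Sketch: split a running transcript into finalized captions at
# sentence-final punctuation, keeping the unfinished tail pending.
# Illustrative only; whisper's real segmenter works on audio + tokens.

SENTENCE_END = ".!?"

def emit_captions(stream_text: str):
    """Return (finalized sentences, pending tail) for a partial transcript."""
    captions, current = [], []
    for ch in stream_text:
        current.append(ch)
        if ch in SENTENCE_END:
            captions.append("".join(current).strip())
            current = []
    return captions, "".join(current).strip()
```

For a captioning app like the one mentioned above, the finalized part can be committed to the screen while the pending tail keeps updating as new chunks arrive.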
Try out https://github.com/eix128/WhisperJET — in my opinion this library is the best for Java, including Android.
Oh yeah! :) Thanks a lot, I'm going to try this out.