Saleh Soleimani
We've already used the Vosk model for real-time processing, but given its low accuracy we really need the quality of Whisper's audio transcription in real time. The inference time, however, is taking...
> Any updates? up
> Use the whisper-v3-turbo model. It's 3x faster than the others; also make sure you enable CUDA (or OpenCL on AMD). Thanks, but I searched and found large-v3-turbo, and I didn't find a...
> Yes, it is possible. We are working on capturing audio from the mic and transcribing it in real time. Any updates?
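For anyone following along, the mic-to-transcript pattern being described usually looks like a producer/consumer loop: one thread keeps recording fixed-size chunks while another transcribes the previous chunk. A minimal sketch below; `record_chunk` and `transcribe` are placeholders for your actual audio backend (e.g. sounddevice) and Whisper binding, not anything from this thread.

```python
import queue
import threading

def stream_transcribe(record_chunk, transcribe, n_chunks):
    """Record and transcribe concurrently: transcription of chunk N
    overlaps with the recording of chunk N+1."""
    chunks = queue.Queue()

    def producer():
        for _ in range(n_chunks):
            chunks.put(record_chunk())  # blocks until a chunk of audio is ready
        chunks.put(None)                # sentinel: no more audio

    threading.Thread(target=producer, daemon=True).start()

    texts = []
    while (chunk := chunks.get()) is not None:
        texts.append(transcribe(chunk))  # runs while the next chunk records
    return texts
```

The point of the queue is that a slow `transcribe` call never blocks the microphone; it only grows the backlog, which you can then bound or drop from.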
> Well, is there any way to prevent and ignore other languages like Arabic, Hindi, etc., and load only one language model, so we can optimize it...
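On the single-language question: Whisper implementations generally let you pin the language up front, which skips the language-detection pass entirely. A hedged sketch below; the `language`/`task` keyword names match the openai-whisper and faster-whisper Python APIs (whisper.cpp exposes the same idea via its `-l`/`--language` flag), and the `model` object is whatever binding you use.

```python
def transcribe_fixed_language(model, audio, language="en"):
    """Transcribe with a pinned language.

    Passing an explicit `language` skips auto-detection, so no compute
    is spent scoring Arabic, Hindi, etc.  `task="transcribe"` keeps the
    output in the spoken language rather than translating to English.
    """
    return model.transcribe(audio, language=language, task="transcribe")
```

Note this pins the decoder's language; it does not shrink the model itself. For a genuinely smaller single-language model you'd pick an `.en` checkpoint such as `small.en`.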
> @salehsoleimani, did you try the 8-bit models? https://huggingface.co/ggerganov/whisper.cpp e.g. large-v3-turbo-q8_0. I guess the currently used model is already a quantized version of Whisper, isn't it? @vilassn
> No, the default model uses float32 or bfloat16 AFAIK. Here are some more quantized models: https://huggingface.co/ctranslate2-4you/distil-whisper-small.en-ct2-bfloat16 Oh, thanks. I'm going to try the 8/5-bit models. Do you know whether these...
> @salehsoleimani, did you check the 8-bit models? Any news? No, actually, I've been told quantization doesn't affect speed much.
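That report is plausible: 8-bit quantization always cuts memory and bandwidth roughly 4x, but whether it speeds up compute depends on the backend's int8 kernels, which is why experiences differ. A toy illustration below (symmetric per-tensor quantization, not whisper.cpp's exact q8_0 scheme) showing the size reduction and the bounded reconstruction error.

```python
import numpy as np

def quantize_q8(w):
    """Symmetric 8-bit quantization of a float32 tensor.

    Maps the largest |weight| to 127 and rounds everything else onto
    the int8 grid; a single float scale recovers approximate values.
    """
    scale = np.abs(w).max() / 127.0
    q = np.clip(np.round(w / scale), -127, 127).astype(np.int8)
    return q, scale

def dequantize(q, scale):
    return q.astype(np.float32) * scale

w = np.random.randn(256, 256).astype(np.float32)
q, s = quantize_q8(w)
err = np.abs(dequantize(q, s) - w).max()  # at most half a quantization step
```

The int8 tensor is a quarter the size of the float32 one, which is the guaranteed win; any latency improvement on top of that is hardware- and kernel-dependent.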
> Also try this and report back, please: https://github.com/huggingface/distil-whisper They claim the code runs 6x faster than Whisper large-v3. Sure, thanks, I'm going to try it out.
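Before committing to a model on the strength of a "6x faster" claim, it's worth timing it on your own hardware. A small, model-agnostic harness below; `transcribe` is any callable, e.g. a whisper or distil-whisper pipeline, and taking the best of several runs damps warm-up and scheduling noise.

```python
import time

def time_transcribe(transcribe, audio, repeats=3):
    """Return the best wall-clock time (seconds) over `repeats` runs
    of transcribe(audio)."""
    best = float("inf")
    for _ in range(repeats):
        t0 = time.perf_counter()
        transcribe(audio)
        best = min(best, time.perf_counter() - t0)
    return best
```

Run it once per candidate model on the same clip and compare the numbers; the ratio you measure locally is the only one that matters for the real-time budget.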
> Also, can someone tell me how to convert these to whisper...