Whisper
transcribe fails occasionally when trying to translate from other languages to english
when i click transcribe i will occasionally get a popup right away saying it failed. it says "transcribe failed, no suitable transform was found to encode or decode the content."
i'm on windows 11 with an intel arc gpu, but if i try using intel integrated graphics i get the same popup. i'm using the ggml-medium.bin model from hugging face.
~transcribing a video in english works just fine, i only encounter this problem sometimes when using the translate feature.~
edit: it also happens with English videos. i just by chance had only tried English videos that weren't using the Opus audio codec, so it made it seem like the translate feature was the issue.
when it fails, the debug console just says Created source reader from the file "M:\files\some-file.mp4"
and there will be an empty text file.
i figured out the issue: the program fails if a video uses the Opus audio codec. If i transcode the audio to aac or something else, then the app is able to transcribe the video.
is not supporting Opus a known limitation? it is an open source codec used in most youtube videos.
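For anyone hitting the same thing, here is a rough sketch of that transcode workaround in Python. It assumes `ffmpeg` and `ffprobe` are installed and on PATH; the helper names and the `_aac` output suffix are made up for illustration and are not part of Whisper. It only re-encodes the audio track and copies the video stream as-is.

```python
import subprocess
from pathlib import Path


def build_probe_cmd(src):
    """ffprobe command that prints the codec name of the first audio stream."""
    return [
        "ffprobe", "-v", "error",
        "-select_streams", "a:0",
        "-show_entries", "stream=codec_name",
        "-of", "default=noprint_wrappers=1:nokey=1",
        str(src),
    ]


def build_transcode_cmd(src, dst):
    """ffmpeg command that copies the video stream and re-encodes audio to AAC."""
    return ["ffmpeg", "-y", "-i", str(src), "-c:v", "copy", "-c:a", "aac", str(dst)]


def aac_output_path(src):
    """Hypothetical naming scheme: some-file.mp4 -> some-file_aac.mp4."""
    src = Path(src)
    return src.with_name(src.stem + "_aac" + src.suffix)


def audio_codec(src):
    """Ask ffprobe for the audio codec name, e.g. 'opus' or 'aac'."""
    result = subprocess.run(
        build_probe_cmd(src), capture_output=True, text=True, check=True
    )
    return result.stdout.strip()


def ensure_aac(src):
    """Return a path whose audio is not Opus, transcoding only when needed."""
    src = Path(src)
    if audio_codec(src) != "opus":
        return src  # nothing to do, hand the original file to Whisper
    dst = aac_output_path(src)
    subprocess.run(build_transcode_cmd(src, dst), check=True)
    return dst
```

Calling `ensure_aac` on the video's path would then give back a file that can be opened in the app. Files whose audio is not Opus are returned untouched.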
The readme lists Ogg Vorbis as a notable exception (it doesn't mention Opus by name), though I have no idea why, I guess maybe because Media Foundation doesn't support it:
Media Foundation for audio handling, supports most audio and video formats (with the notable exception of Ogg Vorbis), and most audio capture devices which work on Windows (except some professional ones, which only implementing ASIO API).
It's been suggested that ffmpeg should be used to allow more formats:
https://github.com/Const-me/Whisper/issues/84
you can try Whisperer (https://github.com/tigros/Whisperer), which uses ffmpeg.
This model path is correct:
Any idea why I’m getting this “not found” message?
no quotes please.
For me, Whisperer is just generating a .wav file, not a .vtt file like Whisper does. What am I doing wrong?
if the Go button reads Cancel it's not done yet. you'll get .srt not .vtt
also you don't really need to use the large model, base or small will be much faster.
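If a .vtt file is what you need in the end, an .srt can usually be converted with very little work: WebVTT wants a `WEBVTT` header line and a dot instead of a comma before the milliseconds in each timestamp. A minimal sketch, assuming plain SRT input with no styling tags (the function name is made up for illustration and is not part of Whisperer):

```python
import re

# Matches an SRT timestamp such as 00:01:02,345 and captures both halves.
_SRT_TIMESTAMP = re.compile(r"(\d{2}:\d{2}:\d{2}),(\d{3})")


def srt_to_vtt(srt_text):
    """Convert SRT subtitle text to WebVTT text."""
    # Drop a leading byte order mark and normalise Windows line endings.
    text = srt_text.lstrip("\ufeff").replace("\r\n", "\n").strip()
    lines = []
    for line in text.split("\n"):
        # Only touch timing lines, so commas in the spoken text are kept.
        if "-->" in line:
            line = _SRT_TIMESTAMP.sub(r"\1.\2", line)
        lines.append(line)
    return "WEBVTT\n\n" + "\n".join(lines) + "\n"
```

The cue numbers from the .srt are kept, since WebVTT allows them as cue identifiers.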
It seems to have gotten stuck, as I started it hours ago, and with Whisper, it usually takes only 15 or 20 minutes to process a 2-hour interview.
hmm it could be stuck if your GPU doesn't have much memory, i should fix that. but try base or small model.
My GPU has 12 GB, and the large model works fine in Whisper. I prefer accuracy over speed. Maybe I should stick with Whisper.
try taskmgr to see if it's running. it is using Const-me's Whisper.
actually with 12 GB it's running both simultaneously so that's why it's slower.