transcribe fails occasionally when trying to translate from other languages to english

Open jaredanson opened this issue 1 year ago • 11 comments

when i click transcribe, i occasionally get a popup right away saying it failed. it says "transcribe failed, no suitable transform was found to encode or decode the content."

i'm on windows 11 with an intel arc gpu, but if i try using intel integrated graphics i get the same popup. i'm using the ggml-medium.bin model from hugging face.

~transcribing a video in english works just fine, i only encounter this problem sometimes when using the translate feature.~

edit: it also happens with English videos. by chance, the English videos i had tried didn't use the Opus audio codec, so it looked like the translate feature was the issue.

when it fails, the debug console just says: Created source reader from the file "M:\files\some-file.mp4", and an empty text file is created.

jaredanson avatar Jun 08 '23 10:06 jaredanson

i figured out the issue: the program fails if a video uses the Opus audio codec. if i transcode the audio to aac or something else, the app is able to transcribe the video.
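For anyone who wants to script that workaround, here is a minimal sketch, assuming ffmpeg and ffprobe are installed and on PATH. It checks the first audio stream's codec and, if it is Opus, re-encodes only the audio to AAC while copying the video stream. The function names and the "-aac.mp4" output suffix are hypothetical, picked for illustration; nothing here is part of Whisper itself, and copying the video stream into an .mp4 container assumes the video codec is one that MP4 accepts.

```python
import shutil
import subprocess
from pathlib import Path


def probe_audio_codec_cmd(path):
    """Build an ffprobe command that prints the codec name of the first audio stream."""
    return [
        "ffprobe", "-v", "error",
        "-select_streams", "a:0",
        "-show_entries", "stream=codec_name",
        "-of", "default=noprint_wrappers=1:nokey=1",
        str(path),
    ]


def transcode_to_aac_cmd(src, dst):
    """Build an ffmpeg command that copies the video stream and re-encodes audio to AAC."""
    return ["ffmpeg", "-y", "-i", str(src), "-c:v", "copy", "-c:a", "aac", str(dst)]


def ensure_non_opus(src):
    """Return a path to a file whose audio is not Opus, transcoding if needed.

    Hypothetical helper: the output name (original stem + "-aac.mp4") is an assumption.
    """
    src = Path(src)
    if shutil.which("ffprobe") is None or shutil.which("ffmpeg") is None:
        raise RuntimeError("ffmpeg/ffprobe not found on PATH")
    codec = subprocess.run(
        probe_audio_codec_cmd(src), capture_output=True, text=True, check=True
    ).stdout.strip()
    if codec != "opus":
        return src  # nothing to do, the audio is already in another codec
    dst = src.with_name(src.stem + "-aac.mp4")
    subprocess.run(transcode_to_aac_cmd(src, dst), check=True)
    return dst
```

Calling ensure_non_opus on a video path would then give back a file that can be passed to the transcribe dialog. Only the audio is re-encoded, so the extra step should be quick compared to the transcription itself.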

is the lack of Opus support a known limitation? it is an open source codec used in most youtube videos.

jaredanson avatar Jun 08 '23 10:06 jaredanson

The readme does mention a codec limitation (it names Ogg Vorbis, not Opus specifically), though I have no idea why. I guess it's because Media Foundation doesn't support it:

Media Foundation for audio handling, supports most audio and video formats (with the notable exception of Ogg Vorbis), and most audio capture devices which work on Windows (except some professional ones, which only implementing ASIO API).

It's been suggested that ffmpeg should be used to allow more formats:

https://github.com/Const-me/Whisper/issues/84

albino1 avatar Jun 08 '23 16:06 albino1

you can try Whisperer (https://github.com/tigros/Whisperer), which uses ffmpeg.

tigros avatar Jun 08 '23 21:06 tigros

This model path is correct:

image

Any idea why I’m getting this “not found” message?

RickArcher108 avatar Jun 08 '23 21:06 RickArcher108

no quotes please.

tigros avatar Jun 08 '23 22:06 tigros

For me, Whisperer is just generating a .wav file, not a .vtt file like Whisper does. What am I doing wrong?

RickArcher108 avatar Jun 09 '23 01:06 RickArcher108

if the Go button reads Cancel it's not done yet. you'll get .srt not .vtt

also you don't really need to use the large model, base or small will be much faster.

tigros avatar Jun 09 '23 01:06 tigros

It seems to have gotten stuck, as I started it hours ago, and with Whisper, it usually takes only 15 or 20 minutes to process a 2-hour interview.

RickArcher108 avatar Jun 09 '23 02:06 RickArcher108

hmm it could be stuck if your GPU doesn't have much memory, i should fix that. but try the base or small model.

tigros avatar Jun 09 '23 02:06 tigros

My GPU has 12 GB, and the large model works fine in Whisper. I prefer accuracy over speed. Maybe I should stick with Whisper.

RickArcher108 avatar Jun 09 '23 02:06 RickArcher108

try taskmgr to see if it's running. it uses Const-me's Whisper.

actually with 12 GB it's running both simultaneously so that's why it's slower.

tigros avatar Jun 09 '23 02:06 tigros