whisper-diarization

Weird word repetitions in zh output

Open terryops opened this issue 2 years ago • 4 comments

I've tested this project with English (default model) and it worked as expected. But when I ran the same audio with the large model, I got this error: RuntimeError: Calculated padded input size per channel: (1). Kernel size: (2). Kernel size can't be greater than actual input size. When I switched to a different audio file in Chinese (using large-v2), it ran without error, but the output was full of weird word repetitions. Translated from the Chinese below, it reads something like:

Speaker 1: Please don't don't hesitate to like ke and subscribe scribe

Speaker 1: 请不 不吝点 点赞 赞 订阅 阅

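Speaker 0: 转发 发 打赏 赏支 支持 持明 明镜 镜与 与点 点点 点栏 栏目 目 不去 去锵锵 锵三人 人行 行了 了 其实 实我 我特 特想 想来 来 几次 次编 编导 导约 约我 我都 都是 是时 时间 间冲 冲突 突嘛 嘛 冲突 突呢 呢 那我 我就 就是 是很 很世 世俗 俗的 的认 认为 为利 利益 益最 最大

(Rough translation, with the repetitions removed: "Share, tip, and support the Mingjing (明镜) and Diandian (点点) programs. [I'm] not going on Qiang Qiang San Ren Xing (锵锵三人行) anymore. Actually, I really wanted to come; the producers invited me several times, but there was always a scheduling conflict. As for the conflict, well, I'm quite worldly and think that the greatest benefit...")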
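The repetition in these samples follows a regular pattern: each chunk tends to start with the last character(s) of the chunk before it (请不 → 不吝点 → 点赞 → 赞). As a hypothetical post-processing sketch (not part of whisper-diarization, and not a fix for whatever causes it), the function below drops the longest prefix of each chunk that repeats the end of the previous chunk. `strip_overlaps` is a name made up for illustration. It is only a heuristic: it would also delete a genuinely repeated character that happens to be split across two chunks (e.g. 谢 / 谢).

```python
def strip_overlaps(chunks: list[str]) -> str:
    """Join chunks, removing from each chunk the longest prefix that
    duplicates the end of the previous (original) chunk.

    Hypothetical heuristic: genuine repeats across chunk boundaries
    are removed too.
    """
    out: list[str] = []
    prev = ""
    for chunk in chunks:
        overlap = 0
        # Try the longest possible overlap first, then shorter ones.
        for n in range(min(len(prev), len(chunk)), 0, -1):
            if prev.endswith(chunk[:n]):
                overlap = n
                break
        out.append(chunk[overlap:])
        # Compare the next chunk against this chunk as transcribed,
        # not against its trimmed form.
        prev = chunk
    return "".join(out)


if __name__ == "__main__":
    sample = "请不 不吝点 点赞 赞 订阅 阅".split()
    print(strip_overlaps(sample))  # prints 请不吝点赞订阅
```

On the first sample this yields 请不吝点赞订阅 ("Please don't hesitate to like and subscribe"). Because it works on characters within whitespace-separated chunks, it does not address the English-style repetition ("don't don't").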

Update: Tested on Japanese and got the same result. Tested on French, and it works well, just like English.

terryops · Sep 14 '23

I've been getting the same error with several languages (including English, actually). Any help is much appreciated.
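For context on the RuntimeError in this thread ("Calculated padded input size per channel: (1). Kernel size: (2). Kernel size can't be greater than actual input size"): PyTorch raises it when a convolution receives an input that, even after padding, is shorter than its kernel. So some input reached a convolution layer only one step long (for example, a very short audio segment); the thread does not say which model in the pipeline raises it. A minimal sketch of the condition, assuming a 1-D convolution and using the output-length formula from the PyTorch `Conv1d` documentation. `conv1d_output_length` is a hypothetical helper, not a PyTorch API.

```python
def conv1d_output_length(length: int, kernel_size: int, stride: int = 1,
                         padding: int = 0, dilation: int = 1) -> int:
    """Output length of a 1-D convolution (PyTorch Conv1d formula).

    Raises ValueError when the padded input is shorter than the
    effective (dilated) kernel, the situation behind the RuntimeError.
    """
    padded = length + 2 * padding
    effective_kernel = dilation * (kernel_size - 1) + 1
    if padded < effective_kernel:
        raise ValueError(
            f"padded input size {padded} < kernel size {effective_kernel}"
        )
    # Equivalent to floor((L + 2p - d*(k-1) - 1) / s + 1).
    return (padded - effective_kernel) // stride + 1


if __name__ == "__main__":
    print(conv1d_output_length(10, 3, stride=2, padding=1))  # prints 5
    try:
        conv1d_output_length(1, 2)  # input of length 1, kernel of size 2
    except ValueError as err:
        print("error:", err)
```

With `length=1` and `kernel_size=2` (the numbers in the error message) the check fails; with `padding=1` the same input would produce an output of length 2.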

Toby1091 · Sep 17 '23

same error :(

XinyuZhou2000 · Sep 25 '23

I got the same error when using whisper.cpp (https://github.com/ggerganov/whisper.cpp). Do you get the same results with the official Whisper from OpenAI? I went back to that, because anything else gave me results that were too inaccurate.

AlbinGyllander · Sep 25 '23

When I've used the Whisper large model, other languages work as expected. When I use WhisperX or other speed-optimized implementations, I have issues with languages other than English.

PiotrEsse · Dec 04 '23