Weird word repetitions in Chinese (zh) output
I tested this project with English audio (default model) and it worked as expected, but when I ran the same audio with the large model, I got this error: RuntimeError: Calculated padded input size per channel: (1). Kernel size: (2). Kernel size can't be greater than actual input size
When I switched to a different audio file in Chinese (using large-v2), it ran without error, but the output contained a lot of weird word repetitions.
Translated into English, the first Chinese line below reads roughly like this:
Speaker 1: Please don't don't hesitate to like ke and subscribe scribe
Speaker 1: 请不 不吝点 点赞 赞 订阅 阅
Speaker 0: 转发 发 打赏 赏支 支持 持明 明镜 镜与 与点 点点 点栏 栏目 目 不去 去锵锵 锵三人 人行 行了 了 其实 实我 我特 特想 想来 来 几次 次编 编导 导约 约我 我都 都是 是时 时间 间冲 冲突 突嘛 嘛 冲突 突呢 呢 那我 我就 就是 是很 很世 世俗 俗的 的认 认为 为利 利益 益最 最大
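Looking at the output above, each space-separated chunk seems to start with the character the previous chunk ended on. Here is a minimal Python sketch that collapses that pattern so the text can be read, assuming this boundary duplication is the only kind of repetition. `collapse_overlaps` is a hypothetical helper written for this post, not part of WhisperX or Whisper, and it would also drop a legitimately doubled character that happens to fall on a chunk boundary, so it is a way to inspect the output rather than a fix for the root cause.

```python
def collapse_overlaps(text: str) -> str:
    """Join whitespace-separated chunks, dropping a chunk's first character
    when it repeats the last character already emitted.

    Assumption: the duplication only occurs at chunk boundaries, as in the
    sample output in this issue.
    """
    merged = ""
    for chunk in text.split():
        # split() never yields empty strings, so chunk[0] is safe here.
        if merged and chunk[0] == merged[-1]:
            chunk = chunk[1:]
        merged += chunk
    return merged


# First Chinese line from the output in this issue.
print(collapse_overlaps("请不 不吝点 点赞 赞 订阅 阅"))  # -> 请不吝点赞订阅
```

Spaces are removed entirely, which is fine for Chinese or Japanese text but would not suit languages that separate words with spaces.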
Update: Tested on Japanese and got the same result. Tested on French and it works well, just like English.
I've been getting the same error with different languages (including English, actually). Any help is much appreciated.
same error :(
I got the same error when using https://github.com/ggerganov/whisper.cpp. Do you get the same results using the official Whisper from OpenAI? I decided to revert to it because the results I got from anything else were too inaccurate.
When I've played with the Whisper large model, different languages work as expected. When I use WhisperX or other speed-focused implementations, I have issues with languages other than English.